Programming almost always involves data manipulation. There are two main factors involved: data access and data manipulation. That is, you first have to figure out a way to access the data and then change it. Sometimes, the way you change the data depends on how it is accessed. Real-world examples of these principles are SAX and DOM. Taking advantage of the inherent structure of an XML document, these APIs allow programs (regardless of the language) to access XML data and make changes to it. The key is that, unlike typical data formats like CSV, XML has a self-validation component. We agreed that unless the document is well-formed and, where a DTD is given, complies with that DTD, we should not process it further. By doing that, we have created a rule or a constraint that every XML document has to follow. So before the data makes it to the program, there is a presumption that it complies with certain rules.
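To make the two factors concrete, here is a minimal sketch using the DOM API that ships with Java. The class name, the method names and the <book>/<price> document are my own illustrations, not taken from any particular program. The access step parses the text and refuses anything that is not well-formed; the manipulation step only ever sees data that passed that check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the names are illustrative only.
class AccessThenChange {

    // Step 1 (access): parse the text into a DOM tree. A document that is
    // not well-formed makes the parser throw, so it never reaches step 2.
    static Document access(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays quiet otherwise.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document rejected: " + e.getMessage(), e);
        }
    }

    // Step 2 (manipulation): change the data through the DOM API.
    static String raisePrice(String xml) {
        Document doc = access(xml);
        Node price = doc.getElementsByTagName("price").item(0);
        double updated = Double.parseDouble(price.getTextContent()) + 1.0;
        price.setTextContent(String.valueOf(updated));
        return price.getTextContent();
    }

    static boolean isWellFormed(String xml) {
        try {
            access(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(raisePrice("<book><price>9.5</price></book>")); // prints 10.5
        System.out.println(isWellFormed("<book><price>9.5</book>"));       // prints false
    }
}
```

Notice that the program logic in raisePrice still has to know element names and still converts text to a number by hand.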
Certainly, XML is not the only data format and Java is not the first language that has had to deal with external data. Relational databases have stored data for years and have provided a means by which programs can access that data through SQL. If you look at some of these early programs, you can clearly see the two factors that I mentioned earlier. One piece of the program used JDBC, ODBC or some other mechanism to access the data. Another piece manipulated that data based on the program logic. Data binding was an effort to turn this two-step dance into one. If we could map the structure of the data into a program construct (e.g., a class) and then bind the data to this structure, then the program really wouldn’t have to know anything about the data source. Of course, this is not a new concept, and in the context of relational databases, data binding has been happening for years. More recently, object mapping tools have provided a direct mapping of relational data to objects that are treated like any other object by the application. The underlying mapping/binding code is hidden from the application.
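A rough sketch of the difference, with no real database involved: the Map below is a hypothetical stand-in for one row fetched through JDBC or ODBC, and Customer and CustomerBinder are names I made up for illustration. In the two-step style the program logic handles raw column values itself; in the bound style it only touches an ordinary object.

```java
import java.util.HashMap;
import java.util.Map;

// The program construct that mirrors the structure of the data.
class Customer {
    private final String name;
    private int orders;

    Customer(String name, int orders) {
        this.name = name;
        this.orders = orders;
    }

    String getName() { return name; }
    int getOrders() { return orders; }
    void addOrder() { orders++; }
}

// Hypothetical binder: the only place that knows about the data source.
// In a real program this is where the JDBC or ODBC code would live.
class CustomerBinder {
    static Customer bind(Map<String, String> row) {
        return new Customer(row.get("name"), Integer.parseInt(row.get("orders")));
    }
}

class BindingDemo {
    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("name", "Ada");
        row.put("orders", "2");

        // Two-step style: the logic reads and converts raw column values.
        int ordersTwoStep = Integer.parseInt(row.get("orders")) + 1;

        // Bound style: the logic works with an object and nothing else.
        Customer customer = CustomerBinder.bind(row);
        customer.addOrder();

        System.out.println(ordersTwoStep + " " + customer.getOrders()); // prints 3 3
    }
}
```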
With XML, it took a while, but it now appears that there are serious efforts under way to bind XML documents and classes seamlessly (although this concept is really language independent, I have seen a lot of efforts around Java). Two things are necessary to make this successful. First, you need to have a way to describe the data. SQL had well-understood ways to define a table and the columns/data types that go in it. DTDs did the same for XML, but not very well. DTDs really didn’t have a concept of a data type, and to make data binding work, you need to be able to differentiate between data types because the programming language does so. With XML Schema, we now have a standard way of describing the data (XML). The other piece you need is a mechanism to map that description into a Java class. This class becomes the basis for data binding.
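Here is a small sketch of why the data type matters, using the javax.xml.validation package from the standard Java library. The schema and the <order>/<quantity> names are hypothetical; the point is only that the schema can say "this element is an integer", which is exactly what a Java field of type int needs to hear.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical example class; schema and element names are illustrative.
class TypedDescription {

    // The description of the data: an <order> holding an integer <quantity>.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='quantity' type='xs:int'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true only if the document satisfies the schema, types included.
    static boolean conforms(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Covers both malformed documents and schema violations.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(conforms("<order><quantity>3</quantity></order>"));    // prints true
        System.out.println(conforms("<order><quantity>many</quantity></order>")); // prints false
    }
}
```

A DTD could only declare quantity as character data, so "many" would have passed and the failure would have surfaced later, inside the program.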
There is one aspect of Java that is helpful in this process. Remember my earlier point about XML data having certain constraints simply because it is XML? We can extend those constraints via DTDs and Schemas. Brett McLaughlin nicely correlates such constraints to Java interfaces. An interface, in effect, puts a straitjacket around the classes that implement it. Zeus (zeus.enhydra.org) is an effort to implement a two-way data binding mechanism between Java and XML. It takes data binding a step further by separating the way we describe XML data (e.g., DTD, Schema, etc.) from the mapping mechanism. This would allow data binding independent of the underlying constraint.
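To illustrate the idea of an interface as a constraint, here is a hand-written sketch of a two-way binding. This is not the Zeus API and not generated code; Order, OrderImpl and OrderBinding are hypothetical names, and the <order>/<quantity> document is made up. The interface states what every bound object must offer, and the application works against it without caring whether a DTD or a Schema described the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// The interface plays the role of the constraint: any class bound to an
// <order> document has to expose a typed quantity.
interface Order {
    int getQuantity();
    void setQuantity(int quantity);
}

// One implementation that wears the straitjacket.
class OrderImpl implements Order {
    private int quantity;

    public int getQuantity() { return quantity; }
    public void setQuantity(int quantity) { this.quantity = quantity; }
}

// Hypothetical hand-written binding; a binding tool would generate this part.
class OrderBinding {

    // XML to object.
    static Order unmarshal(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String text = doc.getElementsByTagName("quantity").item(0).getTextContent();
            Order order = new OrderImpl();
            order.setQuantity(Integer.parseInt(text.trim()));
            return order;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot bind document", e);
        }
    }

    // Object back to XML.
    static String marshal(Order order) {
        return "<order><quantity>" + order.getQuantity() + "</quantity></order>";
    }

    public static void main(String[] args) {
        Order order = unmarshal("<order><quantity>3</quantity></order>");
        order.setQuantity(order.getQuantity() + 2); // plain Java, no XML in sight
        System.out.println(marshal(order)); // prints <order><quantity>5</quantity></order>
    }
}
```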
The conclusion from this is odd but interesting. If the Java application could treat XML data as just another application object, then would XML just fade away? Would this be another example of a data format being consumed by the programming language? I don’t think so, and I’ll tell you why in a future column, but it will be interesting to see how Java code that accesses XML data is written a year or so from now.