Virtually all significant Web applications need a persistence service. Persistence is simply a means of preserving information from one session to another. It often implies some form of database, but this is not a given. The type and sophistication of persistence required varies widely: some applications need very simple persistence, while others require immense scalability, failover, two-phase commit, and other distributed capabilities. In this article, we’ll briefly review persistence techniques and examine a fairly new Java API designed to bring some uniformity to them. We’ll also look at what this means for Web application development in particular.
Back when dinosaurs and mainframes ruled the earth, the filesystem was the only game in town. Depending on the application, there may indeed be nothing wrong with using the filesystem for persistent storage today.
The need to efficiently retrieve specific data from large data files soon made various indexing methodologies popular, and more complex data relationships gave rise to hierarchical and network databases. These systems, although representing relationships between data well, do not lend themselves particularly well to ad-hoc queries, where the structure of the query is not necessarily known beforehand.
By the time of the introduction of the Java platform, the most common form of complex data storage for applications was the relational database management system (RDBMS). As a storage and retrieval mechanism for row-oriented data, the RDBMS is hard to beat. When you add the fact that thousands of tools, utilities, and applications are available for RDBMS systems, their enormous popularity is further explained. RDBMSes are also very good at handling ad-hoc queries, making them ideal for Web-based executive information systems where the user builds queries, as opposed to just executing pre-defined reports.
Although it is not native Java, the expressive power of SQL is substantial, and the query language portion is at least relatively standardized. An interface to access this query language was required, and the Java Database Connectivity (JDBC) API was created in response to this need. JDBC allowed Java applications to connect in a standard way to many different types of relational databases, and removed the need for a separate API for each type of RDBMS you wanted to talk to. Each database still had its SQL peculiarities, however, and the data definition language (DDL) was even less standardized across databases. Applications still tended to end up supporting only one or at most a few types of database engines.
Java, though, no matter how well integrated with a relational database, is still an object-oriented language. There is always, therefore, a kind of impedance mismatch between the table-oriented design of a relational system and the internal object-oriented structure of a well-designed Java application.
The object-oriented database management system (ODBMS) offers one solution to this problem, essentially by avoiding it entirely. ODBMSes store objects directly, and are in some respects more sophisticated cousins of the older network and hierarchical databases, in that the relationships between data elements are stored explicitly. A Customer object might have a method called “getInvoices” that returns a Set of Invoice objects, for example. This makes an ODBMS work very well with Java, but to some degree reintroduces the issues that led to the development of the RDBMS: query tools must be written to talk to each ODBMS, and changing data structures can sometimes be a problem once a database is established. ODBMSes have a niche where their advantages outweigh their limitations, however, and should be investigated as an option when building an application.
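To make the Customer and Invoice example concrete, here is a minimal sketch of such an object model in plain Java. Only the getInvoices method and the Set return type come from the text; the fields, the addInvoice and getOutstandingTotal methods, and the use of BigDecimal are assumptions made for illustration. The sketch shows the shape of the objects an ODBMS would store, not the API of any particular product.

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical domain classes; names and fields are illustrative only.
final class Invoice {
    private final String number;
    private final BigDecimal total;

    Invoice(String number, BigDecimal total) {
        this.number = number;
        this.total = total;
    }

    String getNumber() { return number; }

    BigDecimal getTotal() { return total; }
}

final class Customer {
    private final String name;
    // The relationship is held as direct object references, not as a foreign key.
    private final Set<Invoice> invoices = new LinkedHashSet<>();

    Customer(String name) {
        this.name = name;
    }

    String getName() { return name; }

    void addInvoice(Invoice invoice) { invoices.add(invoice); }

    // Navigating the relationship is a method call rather than a query.
    Set<Invoice> getInvoices() { return Collections.unmodifiableSet(invoices); }

    // Walks the stored relationship to add up the invoice totals.
    BigDecimal getOutstandingTotal() {
        BigDecimal sum = BigDecimal.ZERO;
        for (Invoice invoice : invoices) {
            sum = sum.add(invoice.getTotal());
        }
        return sum;
    }
}
```

In a relational design the same relationship would typically be a customer key column on an invoice table, reassembled with a join at query time.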
Another choice, perhaps numerically the most popular in Web applications, is the object-relational mapping layer. This is a framework or library that provides a means to “map” objects and relationships between objects to tables and columns in a relational database, then manages this mapping during the execution of the Java application. This frees the developer from dealing with SQL directly, thus avoiding the different “dialects” of SQL, and to a significant degree offers the advantages of an ODBMS while retaining the flexibility and ubiquity of relational database engines. This category includes projects such as the popular Hibernate, Apache’s Torque and Object-Relational Bridge, and many others, both open source and commercial. Each has its strengths and weaknesses, and must be evaluated in light of the needs of the given situation.
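The mapping such a layer manages can be pictured with a small hand-written sketch. The Account class, the ACCOUNT table, and its column names are all hypothetical, and this is not how Hibernate, Torque, or any other product is implemented. It only illustrates the kind of object-to-row translation that a mapping layer derives from its configuration so the developer does not have to write it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical persistent class used only for this illustration.
final class Account {
    final long id;
    final String ownerName;

    Account(long id, String ownerName) {
        this.id = id;
        this.ownerName = ownerName;
    }
}

// Hand-written stand-in for what a mapping layer works out from its metadata.
final class AccountMapping {
    static final String TABLE = "ACCOUNT"; // assumed table name

    // Object to row: each field is assigned to a column.
    static Map<String, Object> toRow(Account account) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", account.id);
        row.put("OWNER_NAME", account.ownerName);
        return row;
    }

    // Row to object: the reverse direction, used when loading.
    static Account fromRow(Map<String, Object> row) {
        return new Account((Long) row.get("ID"), (String) row.get("OWNER_NAME"));
    }

    // Builds a parameterized INSERT statement for the row's columns.
    static String insertSql(Map<String, Object> row) {
        String columns = String.join(", ", row.keySet());
        StringBuilder placeholders = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            placeholders.append(i == 0 ? "?" : ", ?");
        }
        return "INSERT INTO " + TABLE + " (" + columns + ") VALUES (" + placeholders + ")";
    }
}
```

A real mapping layer also handles relationships, caching, and dialect differences, which is where most of its value lies.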
Of course, any Java application, Web-based or otherwise, always has the choice to use a J2EE container that provides Entity bean handling. Entity beans, the persistence mechanism provided by the EJB standard, have come a long way since their initial introduction, when many developers considered them too overhead-intensive, especially for Web applications. They do (depending on the support of the container) offer the promise of high scalability, however, and are often the choice for very large development projects. Many tools make writing the several files required for each Entity EJB easier, reducing the burden on the developer. Often Entity EJBs will use JDBC as the underlying mechanism to communicate with the database.
There are many other choices besides databases, however, no matter what the type or the access method. Prevayler (www.prevayler.org), for example, is an intriguing project that simply extends Java’s serialization mechanism with a logging capability, keeping all data in memory and writing a periodic snapshot to disk. The log provides protection from crashes, as it can be applied to the last snapshot to restore the system state. For Web applications that use relatively small databases (e.g., those that can fit entirely in memory), this is a very attractive approach, and of course its performance is hard to beat. Prevayler also presents no difficulty in Java-to-database mapping: there simply isn’t any, as the Java objects are stored directly. If your application fits this category, and perhaps if your developers are not as familiar with relational databases in any case, a tool such as Prevayler might be the answer.
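The snapshot-plus-log idea can be sketched in a few dozen lines of standard Java. This is not Prevayler’s API: the TinyPrevalence class, its method names, and the file names are hypothetical, and the “database” is reduced to a list of single-line strings to keep the example short. It shows the recovery rule described above: load the last snapshot, then replay the log.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of snapshot-plus-log persistence; NOT Prevayler's API.
final class TinyPrevalence {
    private final Path snapshotFile;
    private final Path logFile;
    private ArrayList<String> state = new ArrayList<>(); // the whole "database", in memory

    TinyPrevalence(Path directory) {
        snapshotFile = directory.resolve("snapshot.ser");
        logFile = directory.resolve("changes.log");
        try {
            Files.createDirectories(directory);
            recover();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Snapshot is not readable", e);
        }
    }

    // Append the change to the log first, then apply it in memory.
    // Assumes entries contain no line breaks.
    void add(String entry) {
        try {
            Files.write(logFile, (entry + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        state.add(entry);
    }

    // Serialize the full state; the log written so far is then no longer needed.
    void snapshot() {
        try {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(snapshotFile))) {
                out.writeObject(state);
            }
            Files.deleteIfExists(logFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<String> entries() {
        return Collections.unmodifiableList(state);
    }

    // Restore the last snapshot, then replay the changes logged after it.
    @SuppressWarnings("unchecked")
    private void recover() throws IOException, ClassNotFoundException {
        if (Files.exists(snapshotFile)) {
            try (ObjectInputStream in =
                     new ObjectInputStream(Files.newInputStream(snapshotFile))) {
                state = (ArrayList<String>) in.readObject();
            }
        }
        if (Files.exists(logFile)) {
            state.addAll(Files.readAllLines(logFile, StandardCharsets.UTF_8));
        }
    }
}
```

Creating a second TinyPrevalence on the same directory simulates a restart. A production-quality tool must also cope with a crash between writing the snapshot and discarding the log, and with concurrent writers, both of which this sketch ignores.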
One API to Rule Them All?
Just as the JDBC API brought a standard way to access relational databases, the Java Data Objects API, JDO, brings a standard means for accessing all data stores, relational or otherwise. It has the great advantage of being a very simple API, with none of the complexity or required overhead of Entity EJBs—it can, however, co-exist with Entity beans or with many other persistence techniques, as we will see.
JDO is a deceptively simple API: an application that requires access to persistent objects acquires a reference to a PersistenceManagerFactory. This factory is in turn used to request PersistenceManager objects. External XML configuration files specify the classes that are to be capable of persistence. The class itself can be written just like any other JavaBean (in fact, it need not even follow the JavaBean method-signature conventions, although doing so is usually a good idea in any case). All of the elements the developer deals with when interacting with JDO are defined as interfaces; the concrete implementation of those interfaces is left to the provider of the JDO implementation. This means that JDO can be used as a “façade” over many existing data storage methods. There is a project underway now, for example, to provide a JDO-compliant API to the Hibernate project.
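The factory-and-interface arrangement described above can be illustrated without a JDO implementation on the classpath. The sketch below deliberately does not use the javax.jdo types: ObjectStoreFactory and ObjectStore are hypothetical stand-ins for the factory and manager roles, and the store.factory.class property name is an assumption. What it shows is the design: application code sees only interfaces, and configuration names the concrete provider.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-ins for the manager and factory roles; NOT the javax.jdo types.
interface ObjectStore {
    void save(String id, Object value);

    Object load(String id);
}

interface ObjectStoreFactory {
    ObjectStore openStore();
}

// One possible provider: keeps everything in a map shared by its stores.
final class InMemoryStoreFactory implements ObjectStoreFactory {
    private final Map<String, Object> data = new HashMap<>();

    public ObjectStore openStore() {
        return new ObjectStore() {
            public void save(String id, Object value) { data.put(id, value); }

            public Object load(String id) { return data.get(id); }
        };
    }
}

// Chooses the provider from configuration, so application code never names it.
final class StoreBootstrap {
    static final String FACTORY_KEY = "store.factory.class"; // assumed property name

    static ObjectStoreFactory fromProperties(Properties props) {
        String className = props.getProperty(FACTORY_KEY);
        if (className == null) {
            throw new IllegalArgumentException("Missing property " + FACTORY_KEY);
        }
        try {
            return (ObjectStoreFactory) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create factory " + className, e);
        }
    }
}
```

Switching to a different storage mechanism then means changing one property value, for example from InMemoryStoreFactory to a file-backed or JDBC-backed provider, while the code that calls openStore, save, and load stays the same.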
JDO also allows implementation-specific meta-data to be provided in its configuration. If dealing with a relational database as the underlying store, this might include things such as table names, index information, and so forth. For other types of storage, the meta-data would be different—but the application would not know or care about these differences.
In the diagram below, for example, you can see where the JDO API fits into the web application scenario we’ve been describing: the application logic may communicate with JDO directly, or with CMP EJBs, which in turn use JDO. The implementation of JDO in use at the time then determines which of the next blocks we will use—and these are not all of the choices, just some of the more common ones.
One thing JDO does not do, however, is provide a low-level connection to the underlying data store. As this data store might be any of a number of different things, there is no practical way to achieve this. This concerns some developers, who worry they will not be able to use the high-performance or unique features of their database platform of choice, even if they are willing to give up the portability that goes along with such a decision. Some JDO implementations, however, provide extensions to the base JDO API that do allow such access. And in a relational environment, JDBC is always there if you really need to get right to the database.
JDO provides full support for transactions, and the underlying implementation may provide for distributed database capabilities—combining a JDBC-based JDO implementation, such as TJDO, for example, and the clustering capabilities of the C-JDBC project can provide a powerful distributed, fail-safe database environment while still adhering to an established and well-documented Java API.
When working in a J2EE/EJB environment, the Entity EJB implementation might well use JDO as the underlying storage mechanism for container-managed persistence (CMP). The developers then work exclusively with the Entity EJB API, and can remain entirely unaware of JDO. Some Web applications, however, have broad scalability requirements: they are deployed in some situations on small servers with a low user load, and in others in a clustered, high-availability environment with a very large user load. For these it may be a better choice to use JDO directly, with a very simple JDO implementation on the low end and a much more complex (and scalable) one on the high end. Again, the benefit is that the application code remains entirely untouched.
Web applications, as we said before, may need many different levels of persistence. Having a single common API to work with means that the application itself need not change, even if the scalability requirements do. Using the JDO standard to communicate with the underlying storage mechanism, whatever it is, gives us this freedom, and allows our developers to concentrate entirely on Java. At the same time, what the JDO standard does not specify is almost as important as what it does. The initial Sun reference implementation, for example, uses the filesystem as the actual storage mechanism. More complex implementations use JDBC, and still others communicate with ODBMS systems. In every case, the application itself is unchanged; only the JDO implementation in use differs.
These advantages, and the existence of a standard for object persistence with such an easily adoptable API, make JDO an item worthy of further study for any Web application developer.
About the Author
Michael Nash is the president of JGlobal Limited, a software development, consulting, training and support company specializing in open source Java technologies. He is also a core developer of the Keel meta-framework, the author of two books and a number of articles and papers about next-generation web-application development with Java, and a member of the JSR-127 (JavaServer Faces) Expert Group.