Web-Application Persistence: JDO & More
Virtually all significant Web applications need a persistence service. Persistence is simply a means of preserving information from one session to another. It often implies some form of database, but this is not a given. The type and sophistication of persistence required vary widely: some applications need only very simple persistence, while others require immense scalability, failover, two-phase commit, and other distributed capabilities. In this article, we'll briefly review persistence techniques and examine a fairly new Java API designed to bring some uniformity to them. We'll also discuss what impact this has on Web application development in particular.
Back when dinosaurs and mainframes ruled the earth, the filesystem was the only game in town. Depending on the application, there may indeed be nothing wrong with using the filesystem for persistent storage today.
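As a minimal sketch of how little is needed when the filesystem is enough, the class below saves and reloads a handful of key/value settings using the standard java.util.Properties format. The class name FileSettingsStore and the idea of storing "settings" are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: persists simple key/value settings in a plain file.
class FileSettingsStore {
    private final Path file;

    FileSettingsStore(Path file) {
        this.file = file;
    }

    // Writes all settings to the file, replacing any previous contents.
    void save(Properties settings) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            settings.store(out, "application settings");
        }
    }

    // Reads the settings back; returns an empty set if the file does not exist yet.
    Properties load() throws IOException {
        Properties settings = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                settings.load(in);
            }
        }
        return settings;
    }
}
```

This approach offers no indexing, querying, or concurrent-update protection, which is roughly where the techniques discussed next come in.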
The need to efficiently retrieve specific data from large data files soon made various indexing methodologies popular, and more complex data relationships gave rise to hierarchical and network databases. These systems, although representing relationships between data well, do not lend themselves particularly well to ad-hoc queries, where the structure of the query is not necessarily known beforehand.
By the time the Java platform was introduced, the most common form of complex data storage for applications was the relational database management system (RDBMS). As a storage and retrieval mechanism for row-oriented data, the RDBMS is hard to beat. Add the thousands of tools, utilities, and applications available for RDBMSes, and their enormous popularity is easy to understand. RDBMSes are also very good at handling ad-hoc queries, making them ideal for Web-based executive information systems where the user builds queries, as opposed to just executing pre-defined reports.
Although SQL is not native to Java, its expressive power is substantial, and the query language portion is at least relatively standardized. An interface to access this query language was required, and the Java Database Connectivity (JDBC) API was created in response to this need. JDBC allowed Java applications to connect in a standard way to many different types of relational databases, and removed the need to have a separate API for each type of RDBMS you wanted to talk to. Each database still had its SQL peculiarities, however, and the data definition language (DDL) was even less standardized across databases. Applications still tended to end up supporting only one or at most a few types of database engines.
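To make the JDBC point concrete, here is a short sketch of a query written purely against the standard java.sql interfaces. The customer table, its name and city columns, and the CustomerQueries class are assumptions invented for this example; only the JDBC calls themselves are part of the standard API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the "customer" table and its "name" and "city" columns are assumptions.
class CustomerQueries {

    // Returns the names of all customers in the given city, using only standard JDBC interfaces.
    static List<String> namesInCity(Connection conn, String city) throws SQLException {
        String sql = "SELECT name FROM customer WHERE city = ? ORDER BY name";
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, city); // bind the parameter instead of concatenating SQL
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```

The Connection would typically come from DriverManager.getConnection with a vendor-specific URL, or from a container-managed DataSource. The method above stays the same whichever driver supplies the connection, although the SQL text itself may still need adjusting for a particular database's dialect.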
Java, though, no matter how well integrated with a relational database, is still an object-oriented language. There is always, therefore, a kind of impedance mismatch between the table-oriented design of a relational system and the internal object-oriented structure of a well-designed Java application.
The object-oriented database (ODBMS) offers one solution to this problem, essentially by avoiding it entirely. ODBMSes store objects directly, and are in some respects more sophisticated cousins of the older network and hierarchical databases, in that the relationships between data elements are stored explicitly. A Customer object might have a method called "getInvoices" that returns a Set of Invoice objects, for example. This makes an ODBMS work very well with Java, but to some degree it reintroduces the issues that led to the development of the RDBMS: query tools must be written to talk to each ODBMS, and changing data structures can sometimes be a problem once a database is established. ODBMSes have a niche where their advantages outweigh their limitations, however, and should be investigated as an option when building an application.
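The Customer and Invoice example can be sketched in plain Java as follows. Only the class names and getInvoices come from the text above; the fields, addInvoice, and totalInCents are assumptions added for illustration. How a particular ODBMS would actually store and retrieve these objects is vendor-specific and is not shown.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical domain class: one invoice with an amount held in cents.
class Invoice {
    private final String number;
    private final long amountInCents;

    Invoice(String number, long amountInCents) {
        this.number = number;
        this.amountInCents = amountInCents;
    }

    String getNumber() { return number; }
    long getAmountInCents() { return amountInCents; }
}

// The relationship to invoices is an ordinary object reference, not a foreign key.
class Customer {
    private final String name;
    private final Set<Invoice> invoices = new LinkedHashSet<>();

    Customer(String name) {
        this.name = name;
    }

    String getName() { return name; }

    void addInvoice(Invoice invoice) {
        invoices.add(invoice);
    }

    // Callers navigate the relationship directly instead of issuing a join.
    Set<Invoice> getInvoices() {
        return Collections.unmodifiableSet(invoices);
    }

    // Example of working with related objects purely through navigation.
    long totalInCents() {
        long total = 0;
        for (Invoice invoice : invoices) {
            total += invoice.getAmountInCents();
        }
        return total;
    }
}
```

In a relational design the same relationship would be a customer_id column on an invoice table, reassembled with a join at query time; that gap is the mismatch described earlier.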
Another choice, perhaps numerically the most popular in Web applications, is the object-relational mapping layer. This is a framework or library that provides a means to "map" objects and relationships between objects to tables and columns in a relational database, then manages this mapping during the execution of the Java application. This frees the developer from dealing with SQL directly, thus avoiding the different "dialects" of SQL, and to a significant degree offers the advantages of an ODBMS while retaining the flexibility and ubiquity of relational database engines. This category includes projects such as the popular Hibernate, Apache's Torque and Object-Relational Bridge, and many others, both open source and commercial. Each has its own strengths and weaknesses, and must be evaluated in light of the needs of the given situation.
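The core idea of such a mapping can be illustrated with a deliberately tiny, hand-rolled sketch. This is not the API of Hibernate, Torque, or any other real framework; TableMapping, Product, and the product table are all hypothetical names invented here. It shows only how a description of "which property goes in which column" lets generic code produce the SQL and its parameters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical domain class used only for this illustration.
class Product {
    private final long id;
    private final String title;

    Product(long id, String title) {
        this.id = id;
        this.title = title;
    }

    long getId() { return id; }
    String getTitle() { return title; }
}

// Toy mapping: associates column names with getters for one table.
class TableMapping<T> {
    private final String table;
    private final Map<String, Function<T, Object>> columns = new LinkedHashMap<>();

    TableMapping(String table) {
        this.table = table;
    }

    // Registers one column and the getter that supplies its value.
    TableMapping<T> column(String name, Function<T, Object> getter) {
        columns.put(name, getter);
        return this;
    }

    // Builds a parameterized INSERT statement covering every mapped column.
    String insertSql() {
        String names = String.join(", ", columns.keySet());
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    // Extracts the parameter values from an object, in column order.
    List<Object> insertParameters(T object) {
        List<Object> values = new ArrayList<>();
        for (Function<T, Object> getter : columns.values()) {
            values.add(getter.apply(object));
        }
        return values;
    }
}
```

A mapping built with new TableMapping<Product>("product").column("id", Product::getId).column("title", Product::getTitle) yields the statement text and values that could then be handed to a JDBC PreparedStatement. Real mapping layers go much further, handling relationships, caching, change tracking, and dialect differences.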
Of course, any Java application, Web-based or otherwise, can also use a J2EE container that provides Entity bean handling. Entity beans, the persistence mechanism provided by the EJB standard, have come a long way since their initial introduction, when many developers considered them too overhead-intensive, especially for Web applications. They do (depending on the support of the container) offer the promise of high scalability, however, and are often the choice for very large development projects. Many tools make it easier to write the several files required for each Entity EJB, reducing the burden on the developer. Often Entity EJBs will use JDBC as the underlying mechanism to communicate with the database.
There are also choices beyond databases of any type or access method. Prevayler (www.prevayler.org), for example, is an intriguing project that simply extends Java's serialization mechanism with a logging capability, keeping all data in memory and writing a periodic snapshot to disk. The log provides protection from crashes, as it can be applied to the last snapshot to restore the system state. For Web applications that use relatively small databases (i.e., those that can fit entirely in memory), this is a very attractive approach, and of course its performance is hard to beat. Prevayler also presents no difficulty in Java-to-database mapping: there simply isn't any, as the Java objects are stored directly. If your application fits this category, and perhaps if your developers are not very familiar with relational databases in any case, a tool such as Prevayler might be the answer.
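The snapshot-plus-log idea can be sketched with nothing but standard Java serialization. The code below is not Prevayler's API; TinyPrevalence, Command, PutCommand, and the file names are hypothetical, and the state is assumed to be a simple map of strings. It is a minimal illustration of the technique, not a crash-proof implementation.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical command: a serializable change that can be applied to the in-memory state.
interface Command extends Serializable {
    void applyTo(Map<String, String> state);
}

class PutCommand implements Command {
    private static final long serialVersionUID = 1L;
    private final String key;
    private final String value;

    PutCommand(String key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public void applyTo(Map<String, String> state) {
        state.put(key, value);
    }
}

// Keeps all data in memory; durability comes from a snapshot file plus a command journal.
class TinyPrevalence implements Closeable {
    private final Path snapshotFile;
    private final Path journalFile;
    private HashMap<String, String> state = new HashMap<>();
    private ObjectOutputStream journal;

    TinyPrevalence(Path dir) throws IOException, ClassNotFoundException {
        snapshotFile = dir.resolve("snapshot.ser");
        journalFile = dir.resolve("journal.ser");
        recover();
        takeSnapshot(); // fold any replayed commands into a fresh snapshot
    }

    // Loads the last snapshot, then replays every command logged after it.
    @SuppressWarnings("unchecked")
    private void recover() throws IOException, ClassNotFoundException {
        if (Files.exists(snapshotFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(snapshotFile))) {
                state = (HashMap<String, String>) in.readObject();
            }
        }
        if (Files.exists(journalFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(journalFile))) {
                while (true) {
                    ((Command) in.readObject()).applyTo(state);
                }
            } catch (EOFException endOfJournal) {
                // Normal termination: no more logged commands.
            }
        }
    }

    // Writes the full state to disk and starts an empty journal.
    synchronized void takeSnapshot() throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(snapshotFile))) {
            out.writeObject(state);
        }
        if (journal != null) {
            journal.close();
        }
        journal = new ObjectOutputStream(Files.newOutputStream(journalFile));
        journal.flush();
    }

    // Logs the command first, then applies it to the in-memory state.
    synchronized void execute(Command command) throws IOException {
        journal.writeObject(command);
        journal.flush();
        command.applyTo(state);
    }

    synchronized String get(String key) {
        return state.get(key);
    }

    @Override
    public synchronized void close() throws IOException {
        journal.close();
    }
}
```

Reads are plain in-memory lookups, which is where the performance advantage comes from. A production-quality system would also need to make the handoff between snapshot and journal atomic and to force writes to physical storage, both of which this sketch omits.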