Java applications are often deployed in an environment where future expansion of the user base is anticipated. As a result, some care is taken in choosing an architecture and component model that can allow scalability to higher user loads, often through the use of additional servers, forming an application server cluster, sometimes also called “horizontal scalability”.
Recent industry surveys have indicated that Java and J2EE applications suffer from a larger percentage of downtime than is acceptable in an enterprise environment. Eliminating single points of failure can be a step towards improving both uptime percentages and scalability of an application, as we will see.
Recently, tools have become available in open source that make the process of building a complete solution capable of clustering, failover, and grid-style distributed processing practical and easy.
In this article we will examine one such tool, the Keel meta-framework (www.keelframework.org), and how it can be used as an enabler with several other open source technologies to build a completely scalable web application. Keel is a project that aims to not re-invent any wheels, but instead to provide the backbone to connect many different kinds of wheels together. It provides a generic component container, based on the powerful Apache Avalon framework, then layers on dozens of different services. Each service has several different available implementations. The business logic classes (called Models in Keel) see only the interface to the service – never the underlying implementation. This allows any given implementation to be “swapped out” much like a hard drive on a SCSI bus, without changing application logic at all. Database access, for example, can use the popular Hibernate (www.hibernate.org) frame!
work to begin with. Then, as needs dictate, perhaps CMP EJBs would be added – the application logic doesn’t know or care. In fact, both implementations can even be used at once. The same is true of the user-interface framework chosen: an application’s front-end could be written using Struts and its powerful tag library for JSP pages, then the exact same application, without a line of code changed, can be deployed with a front-end of Cocoon (or Velocity, WebWorks2, and many others), when and if the needs change. This unprecedented wealth of choice allows a project to change and evolve in a way that was never possible before, as changing environments and requirements demand. Application logic can be created in Keel using ordinary Java classes, Session EJB’s, message-driven EJBs, or web services, as needed. Then the deployment can be mixed and matched for the specific requirements of a given project. Keel can use any one of several different communication layers to enable compo!
nents and services to be distributed, including JMS and web services. In this article, we will examine how this ability to “mix and match” can allow you to build highly scalable and failure resistant web applications using 100% open source technologies while remaining “future proof” at the same time.
It’s always good to look before you leap, and given the uncertainties in today’s business world, it is important to allow for both the best case and worst case. An application can be deployed in a small, single-server environment and yet still be fully operational as a pilot project. If you were tied to a given implementation technology, however, you might find that the future (and so far, theoretical) needs for scalability have you doing more work than you need to in order to get an initial application running. Keel allows you to scale in *both* directions – up and down, as the circumstances require. Let’s see how a project might start off small, and grow into a truly scalable solution.
Keel provides a deployment option whereby the application server/container runs in the same virtual machine as the web application itself (although with an isolated classpath by means of a custom classloader), allowing the entire application to be deployed as a single .war file. The hardware needed to support this is trivial: easily available for a few thousand dollars. The entire solution (like all of our examples) can be built using freely available open source packages and tools.
Single Server Deployment
We’re using in our diagrams, specific examples of open source packages for each of the tiers – in actual fact, a lot of options exist at each of these levels. For example, we might choose Jetty instead of Tomcat, or JBoss instead of both Tomcat and OpenJMS (as JBoss has JBossMQ, an alternate JMS provider). We might select MySQL for our database, instead of PostgreSQL. We might use Web Services (Apache Axis is fully supported) as our communications layer, as opposed to JMS. In every case, our application logic doesn’t change.
In our initial diagram, as see a single server system, running one Java virtual machine. In this virtual machine we’re running Tomcat, the Struts framework for our UI interface, and an “embedded” Keel server/container. The Keel server is called by Struts to perform the application logic as needed, and the application logic uses the same container to get access to the services it requires to operate, such as database persistence, scheduling, email, event handling, workflow, reporting, and authentication and authorization, all via a single simple container interface. We write our application’s business logic by creating classes that implement the Model interface define by Keel, inheriting as needed in whatever way our application dictates.
On the same machine we might choose to run an instance of the PostgreSQL database engine to store our application’s data. We would build and deploy our simple .war file, and our application is up and running, self-contained.
Let’s say our little pilot project goes well, and user load starts to go up. Depending on the application, bottlenecks will begin to appear. We will be able to monitor and see where these bottlenecks are happening using the Instrumentation capability of Keel. This is a simple API provided by the underlying Avalon framework that allows any component to “report” on its load, its number of instances, and other vital statistics. A graphical interface allows us to connect to a running instance of the Keel container and monitor and chart these statistics. We can see, for example, that the database connection pool is getting heavily loaded, and that database calls are taking most of the processing time for our application. A simple first step towards scalability is then to break out the database to a separate server, as we see in our second diagram:
First Breakout: Separate Database Processing
This is usually a smart first step, as the system tuning and overall load on a server running a database is quite different than that of a system running a web server (or web application). This breakout allows us to tune the machine running the database for optimal performance, and will likely allow us to increase our overall user load substantially. The change is simple to make: a few configuration settings to let the persistence service know where the database is now located and we’re up and running again.
So far we’re still talking about a deployment where every node (every server and every process) is a single point of failure, however. Any one of these servers going down will take our entire application with it – in fact, increasing the number of servers in the example so far actually makes it slightly more fragile, as we’re expanding the number of potential points of failure by introducing more hardware and network connections.
Let’s continue a little further then, and see what happens if we increase the user load a bit more, and start to do a bit of thinking about reliability and failover. We’ll assume that breaking out the database to a separate server did part of the job for performance, but that now we’re looking at a bottleneck on processing the actual business logic – the Keel business-logic (Model) classes are taking a lot of time, and we need to add processing power.
Our next step would be to separate the web-application and the business
logic processing. The Keel server always interacts with the user-interface
framework (in this case, Struts) via a communications layer. This layer in a
single-VM deployment, which is what we’ve been using so far, is via a simple
in-memory queue channel. Several other options are available, though – including
JMS, which is what we’ll choose in our next example, shown below:
Second Breakout: Increase Business-logic Processing
In this deployment, we add one more component: An instance of the OpenJMS broker. We choose to run the broker component on the same system as the Keel server, with it’s application logic and services, but it can in fact be anywhere
on the network. We could have chosen several other communications layers
here, including web services, depending on the needs of the deployment.
It is purely a configuration-level change: no code is modified to switch
from one communications layer to another.
We can introduce a second business-logic layer server easily. As we see in our next diagram, the web application server now has the option to communicate with an OpenJMS broker on two different systems. (Again, JMS is just one way to do it).
It will use one normally, but fail over to the second one in the event it loses the connection to the first. The same is true of the business logic layer, the Keel Servers. In our diagram you will see that we are now running two Keel server processes on each of two separate machines.
Third Breakout: More Application Performance and Load Balancing
Each will respond to requests it finds on the JMS queue – if any one Keel server is busy handling requests, the next will pick up the request, and so forth. The queuing nature of JMS gives us a certain amount of built-in load balancing, without any additional effort, but we can go much further. Each Keel server also provides some failure protection – if we shut one of them down, or even both on one server, the other servers simply take up the request load, and no failure is seen by the users (just perhaps a slight increase in response time). We can tune our Keel servers to be aware of their own load factors – the percentage of time they are spending handling requests as a percentage of their total elapsed running time – and to “self-tune” as needed. A server that is getting very busy can be set to ignore new requests for a brief time, as opposed to spawning an ever-increasing number of new threads to handle additional simultaneous requests. This delay allows other servers!
listening to the same queue to pick up the slack. Let’s say that a given server receives a particularly complex request, that eats up a lot of processing time. Its load factor goes up, so it ignores new requests for a few seconds, allowing them to remain on the queue. Another server in the group picks up those new requests until the complex request is complete, the load factor drops, and the original server gets back in the game.
Keel applications are usually written with application logic (model) classes that are single-threaded (like a Stateless session bean). It is possible, however, that a user’s session may require state to be maintained – at the very least their login identity. Keel, through Avalon, provides the concept of a per-user Context, roughly the server-side equivalent of the servlet Session. A separate dedicated JMS queue is used for Keel server instances to communicate with each other, sharing any changes made to the per-user context. As a new server comes online, it requests an initial set of context information from the queue, and one of the running servers provides it. As long as at least one server is always running, context is never lost. This allows such conveniences as upgrading a running application, by stopping one Keel server at a time, then restarting it with a new configuration and set of class files. We inch closer to our fully clustered fail-safe system, but we’re not!
We still have two single points of failure, though: the front-end web server, running our Struts UI, and the PostgreSQL database server. What can we do about this? There are several solutions: First, we can replicate the PostgreSQL database, providing a “live” backup to the operational database that we can switch to quickly – although perhaps not automatically. This reduces the huge dependence on the single database server somewhat. For the web application server, we can use any of a number of load-balancing/failover systems to cluster its operation as well: the Balance project (www.inlab.de/balance.html) is one such solution, which will automatically distribute incoming requests among two or more web application servers. At this point we would also want to configure web application session sharing between the Tomcat servers, so that users that may get a different server from one request to the next will not lose any context being s!
tored at the web application layer.
Fourth Breakout: UI Tier – Failover and Load Balancing
Now we’ve eliminated the single point of failure (although we will likely want at least two Balance servers, so that we don’t simply move the single point of failure from Tomcat to Balance). Note that Balance might in fact be running on the same physical system as Tomcat and Struts – we show them separately just to illustrate the architecture more clearly. There are also, of course, other options to provide the IP address load sharing, including dedicated hardware solutions – these would work equally well in this scenario. Although the diagram shows an increasing number of communication
lines between serves, not all of these are active at once, and network traffic is not excessive between the servers.
Each UI connector is configured to be aware of all available JMS brokers, and to fail over from one to the
next as needed – at any given moment, only one JMS broker is active.
We’ve still got that nagging single point of failure of the database itself. We can replicate, so that we have effectively a “hot backup” ready to go in the even the primary fails, but this does little to increase performance, and is still not a fail-safe solution.
Fortunately, yet another open source project brings an answer to fill this need. Most Java applications use the JDBC API to communicate with their database(s), via a JDBC driver especially written for the database they choose (in our example, PostgreSQL). A new project, called C-JDBC (c-jdbc.objectweb.org), provides a JDBC driver that does not talk to a single database: instead, it talks to a heterogeneous collection of databases, making them appear to the application on the other end as a single large database. Database writes are distributed across the entire group. Database reads, however, are load-balanced to a single database in the group. This does two things for us: it gives us the ability to eliminate the single point of failure of the database. If C-JDBC cannot reach any one database, it is simply marked as offline and the others in the group used instead. It also allows us to reap the performance and scalability benefits of clu!
stering at the database level, especially in read operations. Most applications spend more of their time reading the database than writing to it, so this can be a large boost in such situations. Database failover and clustering like this has been available for a few of the higher-end commercial databases for some time, but usually at a high cost, and always with the requirement that all of the clustered databases be of the same type. C-JDBC has the unique advantage that even different database engines can be mixed in the cluster. A few MySQL servers, a few PostgreSQL and a Hypersonic: all can be combined into a single powerful cluster.
Fifth Breakout: Eliminate Single Point of Failure
As you can see in our last diagram, bringing C-JDBC into the mix gives us a fairly resilient overall architecture. Let’s examine the possibilities: if a web application server fails, Balance will simply shift the load to the remaining servers (each of which has a copy of any user context needed). If an application server (e.g. a Keel server instance) fails, the load again shifts seamlessly to the remaining servers in the cluster, and again the current Context is shared and available to the other servers. With C-JDBC, even a database failure is not a show-stopper: the database that does not respond simply gets “skipped” temporarily, and can be brought back up-to-date when it is back in operation.
The old saying about a fool proof system no doubt holds true: a more inventive fool will come along to find a way to break it. At the same time, it is now possible, using only open source technologies, to provide a highly failure resistant application that can start small and simple, and grow as quickly as demand dictates and in the way that makes best sense for the specifics of the application being run.
About the Author
Michael Nash is the president of JGlobal Limited, a software development, consulting, training and support company specializing in open source Java technologies. He is also a core developer of the Keel meta-framework, the author of two books and a number of articles and papers about next-generation web-application development with Java, and a member of the JSR-127 (JavaServer Faces) Expert Group. He can be reached at firstname.lastname@example.org