http://www.developer.com/

Caching Solutions in Java


September 20, 2007

Introduction

Data caching is a very important consideration for J2EE applications. Data caching limits the number of remote invocations in distributed applications and improves performance of web applications by reducing the number of calls to the persistent data stores. Even though caching improves performance and makes your architecture work, it can, in fact, complicate design and introduce such complexities as concurrent code and cluster-wide synchronization.

Once it has been decided that data caching is an integral part of the architecture, choosing the right caching solution can prove to be difficult. There is always an option to implement a caching solution from scratch. This approach can have its advantages, but will inevitably affect the project's cost and timeline. Another solution is to choose one of the open-source caching products. When choosing a caching solution, the following questions should be considered:

  1. Does the caching solution provide easy integration with an ORM product?
    It should be easy to integrate the caching product with popular ORM products such as Hibernate or TopLink. The domain objects are POJOs that map to RDBMS entities and are cached in memory, thereby reducing network traffic to the RDBMS.
  2. Does the caching solution provide presentation layer caching?
    The cache product should provide HTTP response/JSP caching on the presentation layer.
  3. Does the caching solution allow storage of objects in memory and on disk?
    When memory is full, the cache product should evict objects to a local disk.
  4. Is it easy to use?
    A cache product should expose a minimal API for the client to use.
  5. Does it support distributed cache?
    A cache within each JVM needs to be coordinated in a clustered environment.
  6. Does it allow sharing of objects within a JVM?
    All the application threads within a JVM should be able to access the same instance of an object in a cache.
  7. Is cache invalidation supported?
    The caching product should provide a facility to invalidate a cache or a cache group. Cache invalidation should be coordinated in a distributed cache.
  8. What is the availability level?
    The cache maintains a local copy; some operations can continue even if the original source is unavailable.
  9. Is it scalable?
    In a distributed cache, multiple copies of a cache are available across processes. Thus, the scalability of the server is improved.
  10. Is it easy to maintain?
    The cache product should have proper logging facilities in order to debug the code.
  11. Does it adhere to standards?
    JCache (JSR 107) is the caching standard that came out of the Java Community Process (JCP). If the cache supports the standard, a client can have unified access to the cache.

Available Open-Source Solutions

Against the backdrop of the requirements mentioned above, you will evaluate the various products that cache Java objects. The most important features of the various products are mentioned below.

1. OSCache

1.1 Http Response caching

This feature is useful when dealing with static HTML pages. The page response can be cached indefinitely in memory, thus avoiding reprocessing of the page. OSCache uses the URI and query parameters to form a unique key, which is used to store the page content. HTTP response caching is implemented as a servlet filter, so the cache filter hides the API usage from the client. The cache filter is configured in web.xml. By default, the cache filter holds the page response in 'Application' scope and refreshes the cache every hour. These default values can be changed.
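The key-and-refresh idea described above can be sketched in plain Java. This is a minimal illustration, not OSCache's filter or API: the class name PageCacheSketch, the method keyFor, and the choice to pass the current time in as a parameter are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical page cache keyed by URI plus query string; not OSCache's API. */
class PageCacheSketch {
    private static final class Entry {
        final String body;
        final long storedAtMillis;

        Entry(String body, long storedAtMillis) {
            this.body = body;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> pages = new ConcurrentHashMap<>();
    private final long refreshPeriodMillis;

    PageCacheSketch(long refreshPeriodMillis) {
        this.refreshPeriodMillis = refreshPeriodMillis;
    }

    /** Builds the cache key from the request URI and the query string. */
    static String keyFor(String uri, String queryString) {
        return (queryString == null || queryString.isEmpty()) ? uri : uri + "?" + queryString;
    }

    void put(String key, String body, long nowMillis) {
        pages.put(key, new Entry(body, nowMillis));
    }

    /** Returns the cached body, or null when it is absent or older than the refresh period. */
    String get(String key, long nowMillis) {
        Entry entry = pages.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis - entry.storedAtMillis >= refreshPeriodMillis) {
            pages.remove(key, entry); // stale: drop it so the caller re-renders the page
            return null;
        }
        return entry.body;
    }
}
```

A filter built on such a class would call get before invoking the rest of the chain and put after capturing the rendered response. Passing the clock value explicitly keeps the expiry logic easy to test.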

1.2 JSP Tag library caching

In case of dynamic pages (JSPs), OSCache provides tags that surround the static part in the page. Thus, only the static part of the page is cached.

1.3 Data Access Layer caching

All ORM tools map RDBMS entities to domain objects. OSCache can be used to cache the domain objects returned by the ORM tool. This drastically reduces the number of network trips to the DBMS server and expense associated with object creation. Most ORM tools have a pluggable architecture for caching; in other words, OSCache can be plugged into any ORM tool. The ORM tool manages the caching of domain objects for the client.
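To make the saving concrete, here is a minimal read-through sketch in plain Java. It is not the OSCache or Hibernate API; ReadThroughCache and its loader function are hypothetical stand-ins for a cache sitting in front of a DAO or ORM call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache in front of a slow loader such as a DAO call. */
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a cache miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet(); // counts trips to the backing store
            return loader.apply(k);
        });
    }

    /** Drops one entry, for example after the underlying row has been updated. */
    void invalidate(K key) {
        entries.remove(key);
    }

    int loadCount() {
        return loads.get();
    }
}
```

For example, new ReadThroughCache<Integer, String>(id -> "customer-" + id) calls the loader once for get(7) no matter how many times get(7) is repeated, until invalidate(7) is called.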

OSCache can be configured as a persistent cache. When the memory capacity is reached, objects are evicted from memory and stored on a hard disk. Objects are evicted from memory based on the configured cache algorithm. However, caution should be exercised when dealing with the hard disk cache.
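The overflow behavior can be sketched as a two-tier store. The following is an illustration under stated assumptions, not OSCache's persistence mechanism: the class OverflowToDiskCache is hypothetical, it evicts in LRU order, and it writes evicted values with Java serialization into a temporary directory.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier store: a bounded LRU memory tier that overflows to files. */
class OverflowToDiskCache {
    private final int memoryCapacity;
    private final Path directory;
    private final LinkedHashMap<String, Serializable> memory = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, Path> onDisk = new HashMap<>();
    private int nextFileId;

    OverflowToDiskCache(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
        try {
            this.directory = Files.createTempDirectory("cache-overflow");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized void put(String key, Serializable value) {
        memory.put(key, value);
        Path stale = onDisk.remove(key); // the new value supersedes any disk copy
        if (stale != null) {
            try {
                Files.deleteIfExists(stale);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        while (memory.size() > memoryCapacity) {
            // With access ordering, the first entry is the least recently used one.
            Iterator<Map.Entry<String, Serializable>> it = memory.entrySet().iterator();
            Map.Entry<String, Serializable> eldest = it.next();
            it.remove();
            writeToDisk(eldest.getKey(), eldest.getValue());
        }
    }

    synchronized Serializable get(String key) {
        Serializable value = memory.get(key);
        if (value != null) {
            return value;
        }
        Path file = onDisk.get(key);
        if (file == null) {
            return null;
        }
        value = readFromDisk(file);
        put(key, value); // promote back to memory; may push another entry to disk
        return value;
    }

    synchronized boolean isInMemory(String key) {
        return memory.containsKey(key);
    }

    private void writeToDisk(String key, Serializable value) {
        Path file = directory.resolve("entry-" + (nextFileId++) + ".ser");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        onDisk.put(key, file);
    }

    private Serializable readFromDisk(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Serializable) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every disk hit in this sketch costs a file read plus deserialization, and every promotion may trigger another write. That cost is one reason to size the memory tier carefully before relying on the disk tier.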

Out of the box, OSCache comes with LRU (Least Recently Used) and FIFO (First In First Out) algorithms, and either of the two can be configured. In addition, any third-party algorithm can be configured with OSCache.
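The difference between the two eviction policies is easy to see with the JDK's LinkedHashMap. The sketch below is a generic illustration rather than OSCache's implementation; BoundedCache is a hypothetical name, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded map that evicts in LRU or FIFO order; not thread-safe. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    private BoundedCache(int capacity, boolean accessOrder) {
        // accessOrder = true reorders entries on get(), which yields LRU eviction;
        // accessOrder = false keeps insertion order, which yields FIFO eviction.
        super(16, 0.75f, accessOrder);
        this.capacity = capacity;
    }

    static <K, V> BoundedCache<K, V> lru(int capacity) {
        return new BoundedCache<>(capacity, true);
    }

    static <K, V> BoundedCache<K, V> fifo(int capacity) {
        return new BoundedCache<>(capacity, false);
    }

    /** Called by LinkedHashMap after each insertion; true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of two, inserting "a" and "b", reading "a", and then inserting "c" evicts "b" under LRU but evicts "a" under FIFO.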

The cache API is relatively easy to use. An instance of 'GeneralCacheAdministrator' is created and the cache administrator is used to add, update, and flush entries in the cache.

OSCache supports distributed caching. When an application is deployed in a cluster of application servers, the local cache is kept in sync by communication amongst all the caches in the cluster. However, OSCache doesn't provide sophisticated support for state management in a cluster.

OSCache doesn't conform to the JSR 107 standard.

2. EHCache

2.1 HttpResponse and Page Fragment Caching

EHCache provides 'SimplePageCachingFilter' for caching static pages. SimplePageCachingFilter also gzips the HTTP response sent to the browser; the browser unzips the response and shows it to the client. For dynamic pages such as JSPs, EHCache provides 'SimplePageFragmentCachingFilter' to cache the static part of the JSP. However, unlike OSCache, it doesn't provide a taglib for page fragment caching; page fragment caching is view agnostic.

2.2 Data Access Layer Caching

EHCache provides a feature to cache domain objects that map to database entities. In fact, EHCache is the default cache for Hibernate. EHCache provides support for memory and disk stores. Out of the box, it offers LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In First Out) algorithms for evicting objects from memory.

EHCache offers support for distributed caching. The default implementation supports cache discovery via multicast or manual configuration. Updates are delivered either asynchronously or synchronously via custom RMI connections. Additional discovery or delivery schemes can be plugged in by third parties.

The EHCache API is very simple and easy to use.

An important feature of EHCache is that it is JMX enabled. The following can be monitored:

  • CacheManager
  • Cache
  • CacheConfiguration
  • CacheStatistics

EHCache offers the most complete implementation of JSR107 JCACHE to date.

3. Jofti

Some of the salient features of Jofti are:

  1. Jofti differs from the other cache implementations in that it provides an indexing mechanism. Hence, to retrieve an Object from the cache, it is queried on one of its properties rather than some arbitrary key.
  2. The indexing mechanism provided by Jofti copes well with frequent updates.
  3. Jofti provides an easy-to-use programming interface.
  4. It doesn't provide a cache implementation on its own. However, any cache implementation can be plugged into the indexing mechanism.
  5. The following cache implementations are currently supported by Jofti:
    • EHCache
    • OSCache
    • JBoss Cache
    • Tangosol Coherence

Thus, the above caching products get plugged into Jofti's indexing mechanism. The caching features provided by the individual caches remain available for use with Jofti. Jofti's indexing mechanism can run across multiple caches, and the caches can be local or clustered.
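The idea of querying a cache by a property rather than by key can be sketched with a sorted secondary index. This is a generic illustration, not Jofti's API or its index structure; IndexedCache, findBetween, and the use of a TreeMap are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical cache that keeps a sorted index on one property of its values. */
class IndexedCache<K, V, P extends Comparable<P>> {
    private final Map<K, V> cache = new HashMap<>();
    private final NavigableMap<P, Set<K>> index = new TreeMap<>();
    private final Function<V, P> property;

    IndexedCache(Function<V, P> property) {
        this.property = property;
    }

    /** Stores a non-null value and indexes it under its current property value. */
    void put(K key, V value) {
        remove(key); // drop any stale index entry for a previous value first
        cache.put(key, value);
        index.computeIfAbsent(property.apply(value), p -> new LinkedHashSet<>()).add(key);
    }

    void remove(K key) {
        V old = cache.remove(key);
        if (old == null) {
            return;
        }
        P indexed = property.apply(old);
        Set<K> keys = index.get(indexed);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                index.remove(indexed);
            }
        }
    }

    V get(K key) {
        return cache.get(key);
    }

    /** Returns the values whose indexed property lies in [low, high], in property order. */
    List<V> findBetween(P low, P high) {
        List<V> result = new ArrayList<>();
        for (Set<K> keys : index.subMap(low, true, high, true).values()) {
            for (K key : keys) {
                result.add(cache.get(key));
            }
        }
        return result;
    }
}
```

The sketch assumes values are not mutated after they are stored. A value whose indexed property changes in place would need to be put again to keep the index accurate.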

Because Jofti doesn't have its own cache implementation, it depends on the implementations of the plugged product.

3.1 Querying Jofti's Indexing Mechanism

Jofti provides an exhaustive way to query its index for an Object, in addition to traditional querying based on a 'key'. Jofti supports EJB-QL and SQL (ANSI).

3.2 Configuration

Because Jofti doesn't have a caching mechanism of its own, it needs to be configured to use the plugged-in cache product. Jofti can be configured in two ways:

  • As a wrapper: In this model, Jofti acts as an adapter for the client. Jofti provides its own API that routes each call to the underlying cache that is plugged into Jofti.
  • As a listener: In this model, the client uses the cache's own API for caching. Jofti's index implementation acts as a listener, and the index is updated whenever the cache's API is invoked.
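The listener model can be illustrated with a small observer sketch in plain Java. None of these types come from Jofti or from any of the cache products; CacheListener, ObservableCache, and LengthIndex are hypothetical names, and the index on string length is only a stand-in for an index on an object property.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical callback interface that a cache invokes on every change. */
interface CacheListener<K, V> {
    void onPut(K key, V oldValue, V newValue);

    void onRemove(K key, V oldValue);
}

/** Hypothetical cache whose own API is used directly by the client. */
class ObservableCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final List<CacheListener<K, V>> listeners = new ArrayList<>();

    void addListener(CacheListener<K, V> listener) {
        listeners.add(listener);
    }

    void put(K key, V value) {
        V old = entries.put(key, value);
        for (CacheListener<K, V> listener : listeners) {
            listener.onPut(key, old, value);
        }
    }

    V get(K key) {
        return entries.get(key);
    }

    void remove(K key) {
        V old = entries.remove(key);
        if (old != null) {
            for (CacheListener<K, V> listener : listeners) {
                listener.onRemove(key, old);
            }
        }
    }
}

/** Index maintained purely from listener callbacks: keys grouped by value length. */
class LengthIndex implements CacheListener<String, String> {
    private final Map<Integer, Set<String>> keysByLength = new HashMap<>();

    @Override
    public void onPut(String key, String oldValue, String newValue) {
        if (oldValue != null) {
            unindex(key, oldValue);
        }
        keysByLength.computeIfAbsent(newValue.length(), n -> new HashSet<>()).add(key);
    }

    @Override
    public void onRemove(String key, String oldValue) {
        unindex(key, oldValue);
    }

    Set<String> keysWithLength(int length) {
        return keysByLength.getOrDefault(length, Collections.emptySet());
    }

    private void unindex(String key, String value) {
        Set<String> keys = keysByLength.get(value.length());
        if (keys != null) {
            keys.remove(key);
        }
    }
}
```

In the wrapper model, by contrast, the client would call the index class, which would update itself and then forward the call to the cache. The listener model leaves existing client code untouched.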

Jofti provides transaction support through a JTA implementation. If the cache doesn't support transactional updates, Jofti uses the 'javax.transaction.Synchronization' interface.

4. ShiftOne

  • ShiftOne is a lightweight caching framework.
  • ShiftOne comes with a set of cache algorithm implementations, namely LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In First Out). These algorithms are referred to as 'policy caches'. It also provides a means of configuring a third-party caching mechanism into ShiftOne.
  • ShiftOne provides a set of decorators that eventually use the underlying caching product to maintain the cache.

The following caching products can be plugged into ShiftOne:

  1. EHCache
  2. SwarmCache
  3. JCS Cache
  4. Oro Cache

  • ShiftOne supports JMX for collecting statistics.
  • Like most of its counterparts, ShiftOne provides an easy programming interface.
  • It doesn't support a transactional cache.
  • ShiftOne supports a distributed cache.

5. WhirlyCache

WhirlyCache provides an in-memory cache; all of its data is held in memory and there is no disk overflow feature. WhirlyCache runs a separate background thread, called the tuner thread, to prune the cache; in other words, cache maintenance is not performed by the application thread that the client uses, which reduces the burden on the application thread. There is one tuner thread per cache, and it can be configured to run every n seconds. WhirlyCache tries to use the JVM heap to the fullest, so the capacity of the cache has to be set properly. To address memory problems, WhirlyCache uses soft references, and the tuner thread scans through its cache to remove any unused references. Underneath, WhirlyCache uses FastHashMap and ConcurrentHashMap to maintain soft references to the stored objects.
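The combination of soft references and a background pruning thread can be sketched with the JDK alone. This is not WhirlyCache's code: SoftValueCache, startTuner, and prune are hypothetical names, and a ScheduledExecutorService is assumed here in place of whatever thread management the product uses.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical cache holding values through soft references, pruned by a tuner thread. */
class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private ScheduledExecutorService tuner;

    void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    /** Returns the value, or null if it is absent or was reclaimed under memory pressure. */
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        return ref == null ? null : ref.get();
    }

    /** Removes entries whose referent is gone; returns how many were removed. */
    int prune() {
        int removed = 0;
        for (Map.Entry<K, SoftReference<V>> entry : entries.entrySet()) {
            if (entry.getValue().get() == null && entries.remove(entry.getKey(), entry.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    /** Starts one daemon tuner thread that prunes every periodSeconds seconds. */
    synchronized void startTuner(long periodSeconds) {
        if (tuner != null) {
            return;
        }
        tuner = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "cache-tuner");
            thread.setDaemon(true);
            return thread;
        });
        tuner.scheduleAtFixedRate(this::prune, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    synchronized void stopTuner() {
        if (tuner != null) {
            tuner.shutdownNow();
            tuner = null;
        }
    }

    int size() {
        return entries.size();
    }
}
```

Because the garbage collector decides when a soft reference is cleared, callers of get must always be prepared for a null result and reload the value from its source.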

5.1 Presentation Layer Caching

WhirlyCache doesn't provide any JSP tag library or PageFragmentFilter to cache part of a JSP. It doesn't provide any HttpResponse cache on the presentation layer.

5.2 Data Layer Caching

WhirlyCache provides caching of domain objects at the Data Access layer. WhirlyCache can be plugged into Hibernate, a popular ORM framework.

Configuration of WhirlyCache can be done in an XML file named 'whirlycache.xml' that comes with a set of default values.

WhirlyCache provides a programmatic API to access its cache.

6. SwarmCache

SwarmCache is an in-memory cache intended mainly for caching domain objects on the data access layer. It offers support for a distributed cache in a clustered environment. When a domain object in the cache is updated or deleted, the cache manager of the affected cache communicates with all the other managers in the cluster so that they update their caches. As the number of updates to a cache in the cluster increases, performance takes a hit, because every update to one cache results in all the cache managers updating their local caches.
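The cost of that coordination can be seen in a small in-process simulation. This is not SwarmCache's API or wire protocol: ClusterBus and PeerCache are hypothetical, direct method calls stand in for network messages, and remote copies are modeled as being invalidated rather than refreshed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical in-process stand-in for the cluster's messaging layer. */
class ClusterBus {
    private final List<PeerCache> members = new CopyOnWriteArrayList<>();
    private int messagesSent;

    void join(PeerCache cache) {
        members.add(cache);
    }

    /** Tells every member except the origin to drop its copy of the key. */
    void broadcastInvalidation(PeerCache origin, String key) {
        for (PeerCache member : members) {
            if (member != origin) {
                messagesSent++;
                member.invalidateLocally(key);
            }
        }
    }

    int messagesSent() {
        return messagesSent;
    }
}

/** Hypothetical local cache that notifies its peers whenever an entry changes. */
class PeerCache {
    private final Map<String, String> local = new HashMap<>();
    private final ClusterBus bus;

    PeerCache(ClusterBus bus) {
        this.bus = bus;
        bus.join(this);
    }

    String get(String key) {
        return local.get(key);
    }

    /** Caches a value just read from the data store; causes no cluster traffic. */
    void cacheLocally(String key, String value) {
        local.put(key, value);
    }

    /** Updates the entry here and makes every other member drop its stale copy. */
    void update(String key, String value) {
        local.put(key, value);
        bus.broadcastInvalidation(this, key);
    }

    void invalidateLocally(String key) {
        local.remove(key);
    }
}
```

With n members, each update in this sketch produces n - 1 messages, which is why such a scheme suits read-mostly data better than frequently updated data.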

6.1 Presentation layer

SwarmCache has no support for HttpResponse caching. It also doesn't support caching of a page fragment in a dynamic JSP page.

6.2 Data Access layer caching

As mentioned above, SwarmCache supports caching of domain objects in this layer.

There is no mention of SwarmCache getting plugged into any of the popular ORM tools. Hence, it is assumed that caching in the data access layer needs to be done specifically using SwarmCache's API.

SwarmCache supports the LRU caching algorithm. However, SwarmCache is essentially an in-memory cache. When LRU is set as the caching algorithm and the memory capacity is reached, SwarmCache evicts the memory objects as per LRU logic from its memory.

SwarmCache uses soft references to the cached objects. So, if LRU is not set as the caching algorithm, it relies on the garbage collector to sweep through its memory and reclaim objects that are least frequently accessed. However, SwarmCache recommends a combination of the two as the caching algorithm.

It provides an API for clearing the local cache.

7. Java Caching System (JCS)

  • JCS is a cache that supports caching data in memory, on disk, or on a remote server using RMI. JCS is more suitable for caching data on the Data Access layer.
  • JCS doesn't support caching of HttpResponse and Page Fragment caching on the presentation layer.
  • JCS supports distributed cache. All updates and invalidations to the local cache are broadcast to all the caches involved in the cluster. Hence, it can be inferred that JCS is more suitable for applications that have frequent reads and infrequent updates.
  • The JCS cache area can be in memory, indexed disk space, remote cache, and lateral cache. A combination of caches also can be configured. If the area in memory is full, objects are evicted to disk.
  • In JCS, the data in the disk is indexed to facilitate easy retrieval from disk. A remote cache is more suitable when you have multiple web server JVMs running on the same node.
  • The configuration of JCS is set in a properties file, named cache.ccf by default.
  • JCS provides an API for accessing its cache from a Java class.

8. Cache4j

  • Cache4j is a cache for Java objects that stores objects only in memory. It is mainly useful for caching POJO objects on the data access layer.
  • It supports LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In First Out) caching algorithms. For storing objects in its cache, cache4j offers hard and soft references. Cache4j is implemented in a way that multiple application threads can access the cache simultaneously.
  • Cache4j provides easy-to-use programming APIs.

9. JBossCache

JBossCache offers two cache flavors, namely 'TreeCache' and 'PojoCache'. Each of the two models is described in detail below.

9.1 TreeCache

JBoss Tree Cache is a cache for storing POJOs (Plain Old Java Objects). However, in a distributed tree cache, every object stored in the cache has to be serializable; in other words, the object has to implement the java.io.Serializable interface. The cache is structured as a tree, and each node in the tree is a map.
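The "tree of maps" structure can be sketched as follows. This is an illustration of the data structure only, not JBoss TreeCache's API: TreeNodeCache, the slash-separated path syntax, and removeSubtree are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tree-structured cache: each node holds child nodes and a key/value map. */
class TreeNodeCache {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final Map<String, Object> data = new HashMap<>();
    }

    private final Node root = new Node();

    /** Stores a key/value pair in the node at the given path, creating nodes as needed. */
    void put(String path, String key, Object value) {
        find(path, true).data.put(key, value);
    }

    Object get(String path, String key) {
        Node node = find(path, false);
        return node == null ? null : node.data.get(key);
    }

    /** Removes the node at the given path together with everything beneath it. */
    boolean removeSubtree(String path) {
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        int slash = trimmed.lastIndexOf('/');
        Node parent = find(trimmed.substring(0, Math.max(slash, 0)), false);
        return parent != null && parent.children.remove(trimmed.substring(slash + 1)) != null;
    }

    /** Walks the path segment by segment; returns null if a node is missing and create is false. */
    private Node find(String path, boolean create) {
        Node node = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            Node child = node.children.get(part);
            if (child == null) {
                if (!create) {
                    return null;
                }
                child = new Node();
                node.children.put(part, child);
            }
            node = child;
        }
        return node;
    }
}
```

One practical consequence of the tree layout is that a whole group of related entries, such as everything under "/customers", can be invalidated with a single call.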

The tree cache can be local or distributed.

If a change is made to a field in a POJO, the tree cache serializes the entire POJO. This can be expensive if the object is large.

JBoss Tree cache offers caching in memory. If the memory reaches a limit, objects are passivated to a disk.

Tree cache is JMX-enabled; it can provide statistics about the cache to an MBean server.

It provides hooks for client code to attach listeners that are notified when a cache event occurs.

Tree cache is transactional in nature. Any updates or invalidations to an object in the cache are replicated to the other tree caches in the cluster only after the transaction successfully commits. If the transaction fails, no communication happens amongst the tree caches.
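The commit-then-replicate behavior can be sketched with a buffer of pending changes. This is a simplified, single-threaded illustration and not JBoss TreeCache's transaction integration: ReplicatedCache and its begin, commit, and rollback methods are hypothetical, and peers are plain in-process objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cache that replicates changes to its peers only when a transaction commits. */
class ReplicatedCache {
    private final Map<String, String> committed = new HashMap<>();
    private final List<ReplicatedCache> peers = new ArrayList<>();
    private Map<String, String> pending; // non-null only while a transaction is open

    void addPeer(ReplicatedCache peer) {
        peers.add(peer);
    }

    void begin() {
        pending = new LinkedHashMap<>();
    }

    /** Buffers the change; nothing is visible to peers yet. */
    void put(String key, String value) {
        requireOpenTransaction();
        pending.put(key, value);
    }

    /** Reads see this transaction's own pending changes first. */
    String get(String key) {
        if (pending != null && pending.containsKey(key)) {
            return pending.get(key);
        }
        return committed.get(key);
    }

    /** Applies the buffered changes locally, then sends them to every peer. */
    void commit() {
        requireOpenTransaction();
        committed.putAll(pending);
        for (ReplicatedCache peer : peers) {
            peer.applyReplicated(pending);
        }
        pending = null;
    }

    /** Discards the buffered changes; peers are never contacted. */
    void rollback() {
        pending = null;
    }

    private void applyReplicated(Map<String, String> changes) {
        committed.putAll(changes);
    }

    private void requireOpenTransaction() {
        if (pending == null) {
            throw new IllegalStateException("no transaction is open");
        }
    }
}
```

Buffering until commit means a rolled-back transaction costs no network traffic at all, at the price of peers seeing changes slightly later.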

JBoss Tree cache can be plugged into popular application servers such as IBM WebSphere and BEA WebLogic, and into the Hibernate ORM tool. It can also be used in a standalone Java application that isn't run in the context of an application server.

Note: JBoss Tree cache is more for caching domain objects and doesn't support HttpResponse caching and Fragment caching in the case of a dynamic page like a JSP.

9.2 POJO cache

At the outset, POJO cache differs from TreeCache in that the objects in the cache needn't implement the java.io.Serializable interface.

POJO cache supports fine-grained replication; only the changes made to the POJO are serialized. Also, any changes to the object are replicated across the cluster automatically; no API call is needed to trigger this.
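The saving from fine-grained replication can be illustrated by computing a field-level delta. This sketch does not show how POJO cache detects changes, which the article does not describe; FieldDelta is a hypothetical helper that represents an object's state as a map of field names to values and compares two snapshots.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper that ships only changed fields instead of a whole object. */
class FieldDelta {
    /** Returns the fields that are new or whose value differs between the two snapshots. */
    static Map<String, Object> changedFields(Map<String, Object> before, Map<String, Object> after) {
        Map<String, Object> delta = new HashMap<>();
        for (Map.Entry<String, Object> field : after.entrySet()) {
            boolean isNew = !before.containsKey(field.getKey());
            if (isNew || !Objects.equals(before.get(field.getKey()), field.getValue())) {
                delta.put(field.getKey(), field.getValue());
            }
        }
        return delta;
    }

    /** Applies a delta received from another node to the local replica. */
    static void apply(Map<String, Object> replica, Map<String, Object> delta) {
        replica.putAll(delta);
    }
}
```

If one field out of many changes, only that field travels across the cluster. The sketch ignores removed fields, which a fuller version would have to encode separately.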

POJO cache also supports object eviction to disk. If the memory gets full, objects are passivated to disk.

POJO cache is transactional in nature and supports a distributed cache.

As in Tree Cache, the POJO cache tool is more for storing domain objects on the data access layer.

POJO cache can be used both in the context of an application server and in a standalone Java application.

10. Open Terracotta

Terracotta uses a hub-and-spoke architecture and is used in a clustered environment.

Each application server in the cluster acts as a client node (spoke). Terracotta libraries are installed in each JVM and are loaded when that JVM in the cluster is brought up.

Then, there is the terracotta server which acts as the hub. It can be backed up by another terracotta server.

The terracotta server is implemented in Java.

The Terracotta server stores objects that are evicted from each client node when the client nodes run low on memory. If the server runs low on memory, objects are passed to disk space on the server.

Terracotta supports distributed cache. It also supports locks on objects in a distributed cache.

Fine-grained replication: Terracotta doesn't require the objects in its cache to be serializable, and it replicates only the changes made to the cache across the cluster.

Terracotta provides a view of the statistics of the memory heap of the client/server nodes in the cluster.

Terracotta supports:

  • HttpSession Replication
  • Distributed Cache
  • POJO clustering

Comparisons of the various Cache Solutions

Every application is different, but consider a cache product for the following application:

  1. UI intensive web-application
  2. The backend data store is RDBMS
  3. The web-app is deployed in a clustered environment

Conclusion

Looking at the comparison sheet, the following two options are recommended:

  • Jofti
  • JBossCache

For a traditional web application that doesn't require HttpSession replication and has traditional serialized objects as data carriers, Jofti appears to be a good solution.

However, for a very high-end application where importance is given to a sophisticated distributed cache with vendor support, JBossCache fits the bill.

About the Author

Aleksey Shevchenko has been working with object-oriented languages for over seven years. For the past four years, he has served as a technical lead and a project manager. Aleksey has been implementing Enterprise IT Solutions for Wall Street and the manufacturing and publishing industries.
