http://www.developer.com/open/article.php/3700661/Caching-Solutions-in-Java.htm
Data caching is an important consideration for J2EE applications. Caching limits the number of remote invocations in distributed applications and improves the performance of web applications by reducing the number of calls to persistent data stores. Although caching improves performance, it can also complicate a design, introducing concerns such as concurrent code and cluster-wide synchronization. Once it has been decided that data caching is an integral part of the architecture, choosing the right caching solution can prove difficult. There is always the option of implementing a caching solution from scratch; this approach has its advantages, but it inevitably affects the project's cost and timeline. Another option is to choose one of the open-source caching products. When choosing a caching solution, the following questions should be considered:

Against the backdrop of these requirements, this article evaluates various products that cache Java objects; the most important features of each are described below.

HTTP response caching is useful when dealing with static HTML pages: the page response can be cached in memory indefinitely, avoiding reprocessing of the page. OSCache forms a unique key from the URI and query parameters and uses that key to store the page content. HttpResponse caching is implemented as a servlet filter, so the cache filter abstracts the API usage from the client. The filter is configured in web.xml; by default, it holds the page response in 'application' scope and refreshes the cache every hour, but these defaults can be changed. For dynamic pages (JSPs), OSCache provides tags that surround the static parts of the page, so that only those parts are cached.

All ORM tools map RDBMS entities to domain objects, and OSCache can be used to cache the domain objects returned by the ORM tool.
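OSCache's programmatic API centers on the 'GeneralCacheAdministrator' class. The following sketch shows the usual put/get/flush cycle; it assumes the OSCache jar (with a default oscache.properties) is on the classpath, and the key and value are purely illustrative:

```java
import com.opensymphony.oscache.base.NeedsRefreshException;
import com.opensymphony.oscache.general.GeneralCacheAdministrator;

public class OSCacheSketch {
    public static void main(String[] args) {
        GeneralCacheAdministrator admin = new GeneralCacheAdministrator();
        // Add an entry to the cache.
        admin.putInCache("customer:42", "Acme Corp");
        try {
            // Fetch the entry; it is considered stale after 3600 seconds.
            String name = (String) admin.getFromCache("customer:42", 3600);
            System.out.println(name);
        } catch (NeedsRefreshException nre) {
            // The entry is stale or missing: rebuild it, or cancel the update
            // so that other threads blocked on this key are released.
            admin.cancelUpdate("customer:42");
        }
        // Invalidate the entry explicitly.
        admin.flushEntry("customer:42");
        admin.destroy();
    }
}
```

Note the NeedsRefreshException path: after catching it, the caller should either put a fresh entry in the cache or call cancelUpdate, otherwise other threads waiting on the same key may block.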
Caching domain objects drastically reduces the number of network trips to the DBMS server and the expense associated with object creation. Most ORM tools have a pluggable caching architecture; in other words, OSCache can be plugged into virtually any ORM tool, and the ORM tool then manages the caching of domain objects for the client.

OSCache can be configured as a persistent cache: when memory capacity is reached, objects are evicted from memory, according to the configured cache algorithm, and stored on a hard disk. Caution should be exercised when dealing with the hard-disk cache, however. Out of the box, OSCache comes with LRU (Least Recently Used) and FIFO (First In, First Out) algorithms, and either can be configured; a third-party algorithm can be configured as well.

The cache API is relatively easy to use: an instance of 'GeneralCacheAdministrator' is created, and the cache administrator is used to add, update, and flush entries in the cache.

OSCache supports distributed caching. When an application is deployed in a cluster of application servers, the local caches are kept in sync by communication among all the caches in the cluster. OSCache doesn't, however, provide sophisticated support for state management in a cluster, and it doesn't conform to the JSR 107 standard.

EHCache provides 'SimplePageCachingFilter' for caching static pages. SimplePageCachingFilter also gzips the HTTP response; the browser unzips the response and renders it for the client. For dynamic pages such as JSPs, EHCache provides 'SimplePageFragmentCachingFilter' to cache the static parts of the JSP. Unlike OSCache, however, it provides no tag library for page-fragment caching; the page-fragment cache is view agnostic.

EHCache provides a feature to cache domain objects that map to database entities; in fact, EHCache is the default cache for Hibernate. EHCache provides support for both memory and disk stores.
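The LRU policy that both OSCache and EHCache ship can be illustrated with nothing but the JDK: java.util.LinkedHashMap in access-order mode evicts the least recently used entry once a capacity (here an illustrative value of 2) is exceeded:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruSketch {
    public static void main(String[] args) {
        final int capacity = 2;
        // accessOrder = true: iteration order runs from least to most recently used.
        Map<String, String> cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict the LRU entry once over capacity
            }
        };
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // touch "a", so "b" becomes the least recently used
        cache.put("c", "3"); // exceeds capacity: "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```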
EHCache provides LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In, First Out) algorithms out of the box for evicting objects from memory.

EHCache offers support for distributed caching. The default implementation supports cache discovery via multicast or manual configuration; updates are delivered either asynchronously or synchronously via custom RMI connections, and additional discovery or delivery schemes can be plugged in by third parties. The EHCache API is very simple and easy to use. EHCache is also JMX-enabled, so its runtime statistics can be monitored. EHCache offers the most complete implementation of the JSR 107 JCache specification to date.

Caching products are plugged into Jofti's indexing mechanism, and the caching features provided by the individual caches remain available for use with Jofti. Jofti's indexing mechanism can run across multiple caches, which can be local or clustered. Because Jofti doesn't have its own cache implementation, it depends on the implementation of the plugged-in product.

Apart from traditional key-based lookups, Jofti provides an exhaustive way to query its index for an object; it supports EJB-QL and ANSI SQL. Because Jofti has no caching mechanism of its own, it needs to be configured to use the plugged-in cache product, and it can be configured in two ways. Jofti provides transaction support through a JTA implementation; if the underlying cache doesn't support transactional updates, Jofti uses the 'javax.transaction.Synchronization' interface.

A number of third-party caching products can be plugged into ShiftOne.

WhirlyCache provides an in-memory cache. It runs a separate thread to prune the cache; in other words, cache maintenance is not performed on the same application thread that the client uses, which places less burden on the application thread. WhirlyCache keeps all of its data in memory.
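WhirlyCache's pattern, an in-memory map maintained by a separate background thread so that the application thread never pays the pruning cost, can be sketched with JDK classes alone. Everything below (the class, its methods, and the TTL-based eviction) is illustrative and is not WhirlyCache's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A toy in-memory cache whose expired entries are pruned by a scheduled
// background task, analogous to WhirlyCache's per-cache tuner thread.
public class TunedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final ScheduledExecutorService tuner = Executors.newSingleThreadScheduledExecutor();
    private final long ttlMillis;

    public TunedCache(long ttlMillis, long tunerPeriodMillis) {
        this.ttlMillis = ttlMillis;
        tuner.scheduleAtFixedRate(this::prune, tunerPeriodMillis, tunerPeriodMillis, TimeUnit.MILLISECONDS);
    }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> e = store.get(key);
        return (e == null || e.expiresAt < System.currentTimeMillis()) ? null : e.value;
    }

    private void prune() { // runs on the background "tuner" thread, not the caller's
        long now = System.currentTimeMillis();
        store.values().removeIf(e -> e.expiresAt < now);
    }

    public void shutdown() { tuner.shutdownNow(); }
}
```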
The background thread is called the tuner thread; there is one tuner thread per cache, and it can be configured to run every n seconds. WhirlyCache tries to use the JVM heap to the fullest, so the capacity of the cache memory has to be set properly; it provides no disk-overflow feature. To cope with memory pressure, WhirlyCache uses soft references, and the tuner thread scans the cache to remove any cleared references. Underneath, WhirlyCache uses FastHashMap and ConcurrentHashMap to maintain soft references to the objects that are stored.

WhirlyCache provides no JSP tag library or page-fragment filter for caching part of a JSP, and no HttpResponse cache on the presentation layer.

WhirlyCache provides caching of domain objects at the data access layer and can be plugged into Hibernate, a popular ORM framework. WhirlyCache is configured in an XML file named 'whirlycache.xml', which comes with a set of default values, and it provides a programmatic API for accessing its cache.

SwarmCache is an in-memory cache intended more for caching domain objects at the data access layer. It offers support for a distributed cache in a clustered environment: when an update or delete happens to a domain object in one cache, that cache's manager communicates with all the other managers in the cluster so they can update their caches. As the number of updates to a cache in the cluster increases, performance takes a hit, because every update results in all the cache managers updating their local caches.

SwarmCache has no support for HttpResponse caching, nor does it support caching a page fragment of a dynamic JSP page.

As mentioned above, SwarmCache supports caching of domain objects at the data access layer. There is no mention of SwarmCache being pluggable into any of the popular ORM tools, so it is assumed that caching at the data access layer must be done directly through SwarmCache's API.
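Soft references are the JVM's mechanism for memory-sensitive caching: the garbage collector may clear them rather than let the heap overflow, which is why both WhirlyCache and SwarmCache rely on them. A minimal sketch of the idea (the class below is illustrative, not either product's API):

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A toy memory-sensitive cache: values are held through SoftReferences, so
// the JVM may reclaim them under memory pressure instead of throwing
// OutOfMemoryError. A get() after such a clearing is simply a cache miss.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> store = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        store.put(key, new SoftReference<>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = store.get(key);
        if (ref == null) return null;
        V value = ref.get();                   // null if the GC cleared it
        if (value == null) store.remove(key);  // drop the stale reference
        return value;
    }
}
```

Note that eviction here is entirely at the garbage collector's discretion, which is exactly why SwarmCache recommends combining soft references with an explicit LRU policy.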
SwarmCache supports the LRU caching algorithm. However, SwarmCache is essentially an in-memory cache: when LRU is set as the caching algorithm and the memory capacity is reached, SwarmCache evicts objects from its memory per LRU logic. SwarmCache holds soft references to the cached objects, so if LRU is not set as the caching algorithm, it relies on the garbage collector to sweep through its memory and clean up objects that are least frequently accessed. SwarmCache recommends setting a combination of the two as the caching algorithm. It provides an API for clearing the local cache.

JBoss offers two cache flavors, namely 'TreeCache' and 'PojoCache'. Look at each of the two in detail.

JBoss TreeCache is a cache for storing POJOs (Plain Old Java Objects). In a distributed tree cache, however, every object stored in the cache has to be serializable; in other words, the object has to implement the java.io.Serializable interface. The cache is structured as a tree, and each node in the tree is a map. The tree cache can be local or distributed. If a change is made to a field of a POJO, the tree cache serializes the entire POJO, which can be expensive if the object is huge.

JBoss TreeCache offers caching in memory; if a memory limit is reached, objects are passivated to disk. TreeCache is JMX-enabled and can report cache statistics to an MBean server, and it provides hooks for client code to attach listeners to cache events. TreeCache is transactional in nature, so any updates or invalidations of an object in the cache are replicated to all the tree caches only after the transaction commits successfully; if the transaction fails, no communication happens among the tree caches. JBoss TreeCache can be plugged into popular application servers such as IBM WebSphere and BEA WebLogic, as well as into the Hibernate ORM tool.
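TreeCache's data model, a tree of named nodes where each node is a map, can be pictured with plain JDK collections. This sketch flattens the tree into a path-keyed map for brevity and is not the JBossCache API (the real TreeCache maintains an actual parent/child node hierarchy):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A toy tree cache: entries live under slash-separated paths (the "fully
// qualified name" of a node), and each node is a map of keys to values.
public class TreeSketch {
    private final Map<String, Map<String, Object>> nodes = new ConcurrentHashMap<>();

    public void put(String fqn, String key, Object value) {
        nodes.computeIfAbsent(fqn, f -> new ConcurrentHashMap<>()).put(key, value);
    }

    public Object get(String fqn, String key) {
        Map<String, Object> node = nodes.get(fqn);
        return node == null ? null : node.get(key);
    }

    public static void main(String[] args) {
        TreeSketch tree = new TreeSketch();
        tree.put("/org/acme/customers", "42", "Acme Corp");
        System.out.println(tree.get("/org/acme/customers", "42")); // prints Acme Corp
    }
}
```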
It can also be used in a standalone Java application that isn't run in the context of an application server.

At the outset, PojoCache differs from TreeCache in that the objects in the cache needn't implement the java.io.Serializable interface. PojoCache supports fine-grained replication: only the changes made to a POJO are serialized. Also, any changes to an object are replicated across the cluster automatically; no API call is needed for this. PojoCache, too, supports object eviction to disk: if memory fills up, objects are passivated to disk. PojoCache is transactional in nature and supports a distributed cache. As with TreeCache, PojoCache is intended more for storing domain objects at the data access layer, and it can be used both in the context of an application server and in a standalone Java application.
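Whichever product is chosen, caching domain objects at the data access layer usually follows the same read-through pattern: consult the cache first and call the real loader (for example, a DAO hitting the database) only on a miss. A generic JDK-only sketch, with all names illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A generic read-through wrapper: look in the cache first, and only on a
// miss invoke the loader, caching its result for subsequent calls.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent calls the loader only when the key is missing.
        return cache.computeIfAbsent(key, loader);
    }
}
```

Plugging a cache into an ORM tool, as OSCache, EHCache, and WhirlyCache do with Hibernate, amounts to the ORM tool applying this pattern on the application's behalf.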
Terracotta works on a hub-and-spoke architecture and is used in a clustered environment. Each application server in the cluster acts as a client node (a spoke); the Terracotta libraries are loaded into each JVM in the cluster when it is brought up. The Terracotta server, which is implemented in Java, acts as the hub, and it can be backed up by another Terracotta server. The Terracotta server stores objects that are evicted from the client nodes when those nodes run low on memory; if the server itself runs low on memory, objects are passed to disk space on the server.

Terracotta supports a distributed cache and also supports locks on objects in the distributed cache. It offers fine-grained replication: Terracotta doesn't require the objects in its cache to be serializable, and it replicates only the changes made to the cache across the cluster. Terracotta also provides a view of the memory-heap statistics of the client and server nodes in the cluster.

Every application is different, but consider a cache product for the kind of application described here. Looking at the comparison sheet, the following two options are recommended: for a traditional web application that doesn't require HttpSession replication and has traditional serialized objects as data carriers, Jofti appears to be a good solution. However, for a very high-end application where importance is given to a sophisticated distributed cache with vendor support, JBossCache fits the bill.

Aleksey Shevchenko has been working with object-oriented languages for over seven years. For the past four years, he has served as a technical lead and a project manager. Aleksey has been implementing enterprise IT solutions for Wall Street and the manufacturing and publishing industries.
Caching Solutions in Java
September 20, 2007
Introduction
It should be easy to integrate the caching product with popular ORM products such as Hibernate or TopLink. The domain objects are POJOs that map to RDBMS entities and are cached in memory, thereby reducing network traffic to the RDBMS.
The cache product should provide HTTP response/JSP caching on the presentation layer.
When memory capacity is reached, the cache product should evict objects to a local disk.
A cache product should expose a minimal API for the client to use.
A cache within each JVM needs to be coordinated in a clustered environment.
All the application threads within a JVM should be able to access the same instance of an object in a cache.
The caching product should provide a facility to invalidate a cache or a cache group. Cache invalidation should be coordinated in a distributed cache.
The cache maintains a local copy; some operations can continue even if the original source is unavailable.
In a distributed cache, multiple copies of a cache are available across processes. Thus, the scalability of the server is improved.
The cache product should have proper logging facilities in order to debug the code.
JCache is the standard; in other words, the JSR 107 caching service was born out of the JCP process. If the cache supports the standard, a client can have unified access to the cache.

Available Open-Source Solutions
1. OSCache
1.1 Http Response caching
1.2 JSP Tag library caching
1.3 Data Access Layer caching
2. EHCache
2.1 HttpResponse and Page Fragment Caching
2.2 Data Access Layer Caching
3. Jofti
3.1 Querying Jofti's Indexing Mechanism
3.2 Configuration
4. ShiftOne
5. WhirlyCache
5.1 Presentation Layer Caching
5.2 Data Layer Caching
6. SwarmCache
6.1 Presentation layer
6.2 Data Access layer caching
7. Java Caching System (JCS)
8. Cache4j
9. JBossCache
9.1 TreeCache
Note: JBoss TreeCache is intended more for caching domain objects; it doesn't support HttpResponse caching or fragment caching for a dynamic page such as a JSP.
9.2 POJO cache
10. Open Terracotta
Comparisons of the various Cache Solutions
Conclusion
References
About the Author