The Google Collections Library
One of the things that first attracted me to Java many years ago was the inclusion of a standard collections library in the platform. At the time, the STL (Standard Template Library) had yet to catch on in the C++ world, and developers were left either to buy a third-party collections library (Rogue Wave became very popular) or, more often, to write their own. I have lost count of how many times I implemented a linked list of something, with different primitives or objects for different purposes. Then there were the more complex collections, such as self-balancing binary trees and hash tables. Although this might have been good for staying in touch with the software engineering basics, it was not so good for productivity.
Java changed all that. Even the 1.0 and 1.1 collection classes were a huge improvement, but the introduction of the Java Collections Framework in Java 1.2 was a quantum leap forward in productivity. Since then, the standard collections have been regularly enhanced, and when Generics arrived in Java 5 the collections were updated to use them, providing (at least compile-time) type checking. Doug Lea's concurrent collections (part of java.util.concurrent since Java 5) are a welcome addition as well, giving us types such as BlockingQueue and ConcurrentMap, which are ideal for use in concurrent systems.
Despite all of this, there are holes in the standard collections: features that developers often re-implement themselves, sometimes in a less-than-optimal way. There are pain points too. The cost/benefit analysis of Generics (or at least of their implementation in Java) is an ongoing discussion, but whether you like them or not, they are verbose. For example, looking back at collections in 2003, you would have seen something like:
Map mapOfLists = new HashMap();
Now, with Generics, a map of strings to lists of strings looks more like this:
Map<String, List<String>> mapOfLists = new HashMap<String, List<String>>();
This is not exactly a fair comparison; the second definition carries a lot more compile-time information, allowing the Java compiler to ensure that only Strings are used as keys and only Lists of Strings as values in the HashMap. However, you will probably notice a lot of repeated information: the type signature is duplicated in the declaration and in the initialization, and, let's face it, it's not very pretty.
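The library's answer to this repetition is its convenience creators: static generic factory methods, such as Maps.newHashMap() in the com.google.common.collect package, that let the compiler infer the type arguments from the variable being assigned, so they are written only once. The sketch below illustrates the technique without needing the jar; newHashMap() and newArrayList() here are hypothetical helpers defined for the example, not the library's own code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConvenienceCreators {

    // Hypothetical convenience creator: K and V are inferred by the compiler
    // from the assignment target, so the caller never repeats them.
    public static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    // Same idea for lists (also hypothetical, for illustration only).
    public static <E> ArrayList<E> newArrayList() {
        return new ArrayList<E>();
    }

    public static void main(String[] args) {
        // The full type signature appears once, on the left-hand side.
        Map<String, List<String>> mapOfLists = newHashMap();

        List<String> fruits = newArrayList();
        fruits.add("apple");
        mapOfLists.put("fruit", fruits);

        System.out.println(mapOfLists); // {fruit=[apple]}
    }
}
```

Compile-time checking is unchanged: the map is still a Map<String, List<String>>, and putting anything else into it is rejected by the compiler. Only the redundant second copy of the type arguments has gone.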
The Google Collections Library is a newly open-sourced library donated to the Java community by Google. It is intended to make some incremental improvements in usability for the existing Java collections, and to add some new collections and features of its own. In this it is not alone; comparisons with the Apache Commons Collections are inevitable, and the selection of a collection augmentation library is largely a matter of taste. This article will concentrate on the Google Collections Library, a library that I have used very successfully (albeit in an internal form) for over a year on a number of projects at Google. It feels like a natural extension to the Java collections framework, has been extremely reliable and performant, and frankly, the prospect of working without it on future projects is not appealing. Fortunately, because it is now available as an open source project, that should never be an issue.
Why Use the Google Collections Library?
Of course, the reasons I will give here are subjective. That said, I believe there are a number of good reasons to select the Google collections library to augment the Java collections framework:
- Readiness: Although the library is (currently) at version 0.5 alpha, that label exists mainly so that the APIs can change, if necessary, as more is learned about how the library is used outside Google. It would be wrong to infer from the version number and status that the library is excessively buggy or not yet ready for use; the same code has been tried and tested on many large Google projects for some time, and most of the edge cases have likely been found. It also boasts 85% test coverage. Of course, it does mean that the API could change. That is a valid concern, and although any changes are likely to be minor and easily handled, there is no guarantee. If this is a showstopper, keep an eye on the project status to see when the API becomes more stable.
- Consistency: In use, both the new collections and the new ease-of-use features feel like a natural extension to the Java collections framework. This is not accidental: a great deal of work has gone into making the collections consistent with the behavior of the Java collections, overseen by engineers who actually worked on the Java collections when Sun implemented them (for example, Josh Bloch). In particular, Generics are handled exactly as they are in the Java collections framework.
- Size: The jar file containing all of the new functionality is currently around 350 KB. This should not be a deal-breaker for most projects.
- Documentation: The javadocs are pretty thorough by any standards, and especially so for a library of this kind.
- Performance: These same collections are used in projects at Google where performance is a priority. Lots of work has gone into optimization.
- New functionality: The ease-of-use improvements and functional concepts included in the library are particularly attractive for systems that use collections extensively, for example for filtering results out of collections or for applying constraints to them.
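Filtering and constraints both rest on the same functional idea: a small object that answers yes or no for each element. The standalone sketch below shows that idea under stated assumptions; the Predicate interface, the filter() method and the ConstrainedList class are hypothetical stand-ins written for this example, not the library's API, whose own classes differ in detail.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {

    // Hypothetical predicate type, defined here so the example runs without the jar.
    public interface Predicate<T> {
        boolean apply(T input);
    }

    // Filtering: copy only the elements that satisfy the predicate into a new list.
    public static <T> List<T> filter(Iterable<? extends T> source, Predicate<? super T> predicate) {
        List<T> result = new ArrayList<T>();
        for (T element : source) {
            if (predicate.apply(element)) {
                result.add(element);
            }
        }
        return result;
    }

    // Constraint: a list view that rejects elements failing the predicate when they
    // arrive via add(), addAll() or set(); reads and removals go straight to the delegate.
    public static final class ConstrainedList<T> extends AbstractList<T> {
        private final List<T> delegate;
        private final Predicate<? super T> constraint;

        public ConstrainedList(List<T> delegate, Predicate<? super T> constraint) {
            this.delegate = delegate;
            this.constraint = constraint;
        }

        @Override public T get(int index) { return delegate.get(index); }
        @Override public int size() { return delegate.size(); }

        @Override public T set(int index, T element) {
            check(element);
            return delegate.set(index, element);
        }

        @Override public void add(int index, T element) {
            check(element);
            delegate.add(index, element);
            modCount++; // keep AbstractList's fail-fast iterators accurate
        }

        @Override public T remove(int index) {
            modCount++;
            return delegate.remove(index);
        }

        private void check(T element) {
            if (!constraint.apply(element)) {
                throw new IllegalArgumentException("constraint violated: " + element);
            }
        }
    }

    public static void main(String[] args) {
        Predicate<String> nonEmpty = new Predicate<String>() {
            public boolean apply(String input) {
                return input != null && input.length() > 0;
            }
        };

        List<String> words = Arrays.asList("apple", "", "banana", "", "cherry");
        System.out.println(filter(words, nonEmpty)); // [apple, banana, cherry]

        List<String> checked = new ConstrainedList<String>(new ArrayList<String>(), nonEmpty);
        checked.add("durian");
        try {
            checked.add("");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected an empty string");
        }
        System.out.println(checked); // [durian]
    }
}
```

One design choice to note: filter() here returns an independent copy, while ConstrainedList is a view that checks elements on the way in. A library is free to choose differently, for instance by returning live filtered views instead of copies.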
To be fair, I should point out some potential concerns when using the library:
- The convenience creators (static factory methods that infer generic types for you) are at odds with a type-inference proposal for Java 7, which could mean that developers will later have to convert some of their initialization code or live with two styles side by side.
- Furthermore, although some of these collections, or ideas from them, may make it into a JSR or two in the Java 7 or Java 8 timeframe, others may not. In other words, by using these collections now, you may have more work to do in the future to move to whatever becomes standard.
- As mentioned above, the API could still change.
How to Get It
The Google Collections Library can be downloaded from http://code.google.com/p/google-collections/ and, at the time of writing, is at version 0.5. Given the lack of guarantees about API stability at this stage, changes to the API might make the examples given here out of date, but the differences are expected to be small and, hopefully, obvious to fix.
To use the library in your own projects, simply add to your classpath the jar file found in the unpacked download archive. Javadocs and a source zip are also included; most IDEs can attach these to the jar when you define the library, which makes the library easier to use and also helps when debugging your projects.