http://www.developer.com/

Back to article

Selecting the Best Java Collection Class for Your Application


July 15, 2009

Java provides classes that implement the Collection interface to contain a collection of objects. There are over 20 of these collection classes, each having different performance and organizing properties.

This article explains how to choose the most appropriate of these classes for a particular use.

Collection Classes

The Java interface java.util.Collection is implemented by classes and can contain or manage a collection of objects. Java provides over twenty concrete classes that implement the Collection interface. These classes differ in performance and how they organize objects. Some are for specialized purposes. If you are not familiar with the capabilities of these classes, deciding which one is most appropriate for a particular purpose can be a challenge.

Table 1 shows a list of the concrete classes that Java 1.6 provides.

Table 1: Concrete Java 1.6 Classes That Implement Collection
Package Class
java.beans.beancontext BeanContextServicesSupport
BeanContextSupport
java.util ArrayDeque
ArrayList
EnumSet
HashSet
LinkedHashSet
LinkedList
PriorityQueue
Stack
TreeSet
Vector
java.util.concurrent ArrayBlockingQueue
ConcurrentLinkedQueue
ConcurrentSkipListSet
CopyOnWriteArrayList
CopyOnWriteArraySet
DelayQueue
LinkedBlockingDeque
LinkedBlockingQueue
PriorityBlockingQueue
SynchronousQueue
javax.management AttributeList
javax.management.relation RoleList
RoleUnresolvedList
javax.print.attribute.standard JobStateReasons

The classes in the java.util and java.util.concurrent packages are general purpose and have a wide range of uses. In contrast, the classes in the other packages are very specialized and used only for specific purposes with other classes in their individual packages.

  • The classes in the java.beans.beancontext package support bean contexts for JavaBeans development.
  • The classes in the javax.management and javax.management.relation packages are management beans that support remote administrative interfaces for programs.
  • The class in the javax.print.attribute.standard package is used by that package to help describe the current state of a print job.

That leaves 20 general-purpose classes to sort through. The most useful way to classify these is into two categories: how they organize objects, and how long they take to perform various operations. For most uses, how a Collection class organizes objects is more important than how quickly it can add, find, or remove them. Knowing that, let's focus on how these classes organize objects.

How Collections Organize Objects

Table 2 shows the objects' supported organizations and which classes support each organization. Notice that some of the Collection classes appear multiple times in the table. This is because some classes can support more than one organization.

Descriptions of the organizations follow Table 2.

Table 2: Collection Classes by Organization
Organization Class
List ArrayList
CopyOnWriteArrayList
LinkedList
Vector
Set HashSet
TreeSet
EnumSet
LinkedHashSet
ConcurrentSkipListSet
CopyOnWriteArraySet
Queue LinkedList
PriorityQueue
ArrayBlockingQueue
ConcurrentLinkedQueue
DelayQueue
LinkedBlockingQueue
PriorityBlockingQueue
SynchronousQueue
Deque ArrayDeque
LinkedList
LinkedBlockingDeque
Stack ArrayDeque
LinkedList
LinkedBlockingDeque
Stack

List Classes

Lists are collections whose contents have an explicit position within the collection. The position of each object in a List may be determined by the order in which objects are added. Collection classes that organize objects into lists implement the List interface.

If you have decided that a list is the organization that you need, here are suggestions regarding which class to use:

  • Vector: Vector is the oldest List class. It suffers from some inconsistencies with other List classes. All of its methods are synchronized, which can introduce unneeded overhead. Generally, you should not use the Vector class in new code; use ArrayList instead.
  • ArrayList: The ArrayList class is the most widely used List class. Most of the ArrayList class's operations are fast and the time they take is independent of how many objects are in the list. These fast operations include adding an object to and removing an object from the end of the list. Other fast operations are getting the size of the list, getting the object at a specified position in the list, and setting the object at a specified position in the list.
    Other operations take time proportional to the number of objects in the list.
  • LinkedList: The LinkedList class takes more time to perform most operations than the ArrayList class. The exception is that the LinkedList class can quickly add objects to the middle of a list or quickly remove objects from the middle of a list in constant time. The time that the ArrayList class takes to perform these operations is proportionate to the number of objects in the list after the position to which objects were being added or from which they were being removed. Unless you are going to perform a lot of these operations, the ArrayList class is usually the better choice.
    The LinkedList class is also useful for organizing queues, deques (double-ended queues), and stacks. In the rare situations where you need this flexibility, LinkedList may be a good choice.
  • CopyOnWriteArrayList: The CopyOnWriteArrayList class is used to solve a specific concurrency problem without having to resort to locks. To ensure that an iterator sees a consistent view of an ArrayList object when it is iterating over the object, the contents of the ArrayList object must not change. If they do, then the next time the iterator's next or previous method is called, the method will throw a ConcurrentModificationException exception.
    To avoid this situation, you could create a lock to prevent methods that would modify the ArrayList object from being called while an iterator is active. If you rarely iterate an ArrayList object while it is being modified, then using a lock is a good solution. However, if iterating over a ArrayList object is common, then solving the problem with a lock can create another problem: the amount of time threads spend waiting to modify the ArrayList object may be unacceptably high.
    The CopyOnWriteArrayList class is functionally similar to the ArrayList class, with one important difference. All operations that change the contents of a CopyOnWriteArrayList collection cause the underlying array to be replaced with a copy of itself before the contents of the array are changed. Any active iterators will continue to see the unmodified array, so there is no need for locks.
    Iterators created by a CopyOnWriteArrayList object cannot change the underlying array. Though these iterators do have methods for changing the underlying collection, their implementations throw an UnsupportedOperationException instead of modifying the underlying collection.

Set Classes

Sets are collections that contain no duplicates. Attempting to add an object to a set that already contains that object will have no effect. Collection classes that organize objects into sets implement the Set interface.

Classes that implement the Set interface are not required to keep their elements in any particular order. If all that you know about a collection object is that its class implements the Set interface, then you cannot predict the order in which an iterator on the collection will traverse the contents of the collection.

There is a sub-interface of Set named SortedSet. Classes that implement the SortedSet interface keep the elements of a set in a sorted order.

If you have decided that a set is the organization that you need, here are suggestions regarding which class to use:

  • HashSet: HashSet is the most commonly used class for managing a set of objects. Objects can be added to, removed from, and found in a HashSet quickly. The amount of time required is unrelated to the number of objects in the set, assuming that the parameters of the HashSet have been properly tuned.
    A HashSet object does not keep the objects in a set in any particular order. If you need a set that keeps its elements sorted in some order, then consider using the TreeSet class.
  • TreeSet: The TreeSet class keeps the elements of a set sorted in a specified order. However, the time that it takes to add, remove, or find elements in a set is generally longer than that required for a set managed by a HashSet. The time required for these operations is proportionate to the log of the number of elements in the set.
  • EnumSet: The EnumSet class provides a fast and compact way of representing a set of enum objects. All elements of an EnumSet set must come from the same enum type, and they are always sorted in their natural order.
  • LinkedHashSet: The LinkedHashSet class is similar to the HashSet class, except that it keeps its elements in the same order in which they were inserted into the set. Adding and removing elements in a LinkedHashSet object is a little bit slower than for a HashSet object due to the additional overhead of maintaining the order of its elements.
  • ConcurrentSkipListSet: Sets managed by the ConcurrentSkipListSet class can be safely modified by multiple threads at the same time, with no need to use locks. If a set managed by one of the other set classes may be updated by more than one thread, you will need locks to prevent multiple threads from updating the set at the same time. You can use the ConcurrentSkipListSet class to avoid threads having to wait for locks.
    A ConcurrentSkipListSet object keeps the elements of its set sorted in a specified order.
    Like TreeSet, the time that it takes to add, remove, or find elements in a ConcurrentSkipListSet is proportionate to the log of the number of elements in the set. However, the ConcurrentSkipListSet takes longer to do these things. One operation that takes a particularly long time is size, which takes time proportionate to the number of elements in the set.
  • CopyOnWriteArraySet: The CopyOnWriteArraySet class is used to solve a specific concurrency problem without having to resort to locks. While an iterator is iterating over a HashSet or TreeSet object, to ensure that the iterator sees a consistent view of the set, the contents of the set must not change. If they do, then next time the iterator's next or previous method is called, the method will throw a ConcurrentModificationException exception.
    To avoid this situation, you could create a lock to prevent methods that would modify the HashSet or TreeSet object while an iterator is active. If you rarely iterate over a set at the same time it is being modified, then using a lock is a good solution. However, if iterating over a set is common, then solving the problem with a lock can create another problem: the amount of time threads spend waiting to modify the set may be unacceptably high.
    The CopyOnWriteArraySet class has one feature that distinguishes it from the other classes. All operations that change the contents of a CopyOnWriteArraySet collection cause the underlying array to be replaced with a copy of itself before the contents of the array are changed. Any active iterators will continue to see the unmodified array, so there is no need for locks.
    Iterators created by a CopyOnWriteArraySet object cannot change the underlying array. Though these iterators do have methods for changing the underlying collection, their implementations throw an UnsupportedOperationException instead of modifying the underlying collection.

Queue Classes

Queues impose an order on the objects they contain. They have operations to add objects at the end of the order and remove objects from the start of the order. Most commonly, queues use the order in which objects are added as the ordering for their contents. Queues that use the order in which objects are added are sometimes referred to as FIFO (first in, first out) structures.

Classes that organize a collection as a queue implement the Queue interface. The methods defined in the Queue interface return immediately. This means that an attempt to remove an object from an empty queue fails either by returning null or throwing an exception, depending on which method is called. An attempt to add an object to a size-limited queue that is full fails either by returning false or throwing an exception.

Some classes that organize a collection as a queue implement a sub-interface of Queue named BlockingQueue. The BlockingQueue interface adds methods to Queue that may not return immediately if a queue is empty or full. The put method will wait to return or "block" until the queue is no longer full. The take method will block until the queue is no longer empty. There are also similar methods named offer and poll that will time out if they block for more than a specified amount of time.

If you have decided that a queue is the organization to use, here are suggestions regarding which class to use:

  • LinkedList: The LinkedList class can be used to organize objects into a queue that keeps objects in the order they are added. The add and remove methods take a short amount of time, which is independent of how many objects are in the queue. Queues based on the LinkedList class do not block.
  • PriorityQueue: A PriorityQueue object keeps objects in a queue sorted in a specified order. Its add and remove methods take time that is proportionate to the number of objects in the queue. Queues based on the PriorityQueue class do not block.
  • PriorityBlockingQueue: A PriorityBlockingQueue object keeps objects in a queue sorted in a specified order. Its add and remove methods take time that is proportionate to the number of objects in the queue. Queues based on the PriorityQueue include methods that block.
  • ArrayBlockingQueue: The ArrayBlockingQueue class can be used to organize objects into a queue that keeps objects in the order they are added. The add and remove methods take a short amount of time that is independent of how many objects are in the queue.
    Queues based on the ArrayBlockingQueue class can block. For situations when multiple threads are waiting to perform operations on a queue, the behavior of an ArrayBlockingQueue object will be more consistent and predictable than for a LinkedBlockingQueue.
    When multiple threads are waiting for blocked method calls of an object to return, there is normally no guarantee in which order the method calls will return. The ArrayBlockingQueue class's constructor takes an optional argument that can specify that the ArrayBlockingQueue object will force its blocked method calls to return in the same order they were called.
    A restriction of the ArrayBlockingQueue class is that the capacity of ArrayBlockingQueue objects must be specified as an argument to their constructor. The capacity of a ArrayBlockingQueue object cannot be changed after it is constructed.
  • LinkedBlockingQueue: The LinkedBlockingQueue class can be used to organize objects into a queue that keeps objects in the order they are added. The add and remove methods take a short amount of time that is independent of how many objects are in the queue.
    Queues based on the LinkedBlockingQueue class can block. When multiple threads are waiting to perform operations on a queue, the behavior of a ArrayBlockingQueue object will be more consistent and predictable than that of a LinkedBlockingQueue. However, the LinkedBlockingQueue class may provide better throughput.
  • ConcurrentLinkedQueue: The ConcurrentLinkedQueue class can be used to organize objects into a queue that keeps objects in the order they are added. The ConcurrentLinkedQueue class is a good implementation choice when many threads will seek to get objects into and out of a queue. This is because ConcurrentLinkedQueue can safely be modified by multiple threads at the same time. No locks are needed.
    The methods to add objects to or remove objects from a ConcurrentLinkedQueue take a constant amount of time that is not related to the number of objects in the queue. However, the size operation takes time proportionate to the number of objects in the queue.
  • DelayQueue: A DelayQueue is a specialized blocking queue that consults the objects it contains when they can be removed from the queue. Objects in a DelayQueue must implement the Delayed interface. The Delayed interface defines a getDelay method that returns a delay expressed in a specified unit of time. Objects cannot be removed from a DelayQueue while their delay is a positive number.
  • SynchronousQueue: The SynchronousQueue class is a specialized class for coordinating producers and consumers of objects. The SynchronousQueue class blocks. A SynchronousQueue object is always empty. A SynchronousQueue object's put method blocks unless another thread is waiting for the object's take method to return. A SynchronousQueue object's take method blocks unless another thread is waiting for the object's put method to return.

Deque Classes

A deque (pronounced "deck") is a double-ended queue for which elements can be added to or removed from either end.

Classes that organize a collection as a deque implement the Deque interface. The methods defined in the Deque interface return immediately. This means that an attempt to remove an object from an empty deque fails either by returning null or throwing an exception, depending on which method is called. An attempt to add an object to a size-limited deque that is full fails either by returning false or throwing an exception.

Some classes that organize a collection as a deque implement a sub-interface of Deque named BlockingDeque. The BlockingDeque interface adds methods to Deque that may not return immediately if a queue is empty or full. The putFirst and putLast methods will wait to return or block until the queue is no longer full. The takeFirst and takeLast methods will block until the queue is no longer empty. There are also similar methods that will time out if they block for more than a specified amount of time.

If you have decided that a deque is the organization to use, here are suggestions regarding which class to use:

  • ArrayDeque: The ArrayDeque class can be used to organize objects into a deque. The add and remove operations take a short amount of time that is independent of how many objects are in the deque. Deques based on the ArrayDeque class do not block.
    The ArrayDeque class is often the best choice for managing a deque. However, in some cases, a LinkedList will be faster. If performance is important, experiment to see if LinkedList performs better in your application.
  • LinkedList: The LinkedList class can be used to organize objects into a deque. The add and remove methods take a short amount of time that is independent of how many objects are in the queue. Deques based on the LinkedList class do not block.
  • LinkedBlockingDeque: The LinkedBlockingDeque class can be used to organize objects into a queue. The LinkedBlockingDeque class has methods that block.

Stack Classes

Stacks impose an order on the objects they contain so that the most recently added object will be the next object removed. Stacks are sometimes referred to as LIFO (last in, first out) structures.

Unlike the other organizations supported for collections, there is no interface common to classes the are used to organize stacks.

If you have decided that a stack is the organization to use, here are suggestions regarding which class to use:

  • Firstly, do not use the Stack class for new code. The Stack class is poorly designed. All of its methods are synchronized, which can introduce unneeded overhead.
  • Since Java provides no Stack interface, use one of the Deque classes to implement a stack. The operations needed to support a stack organization are a subset of the operations for a deque.

Performance of Collection Classes

If you are not interested in how objects will be organized in a collection, then the only other consideration is performance. In that case, use the ArrayList class. It is fast and makes efficient use of memory.

What Have You Learned?

This article has explained how to choose a collection based on object organization and performance constraints.

About the Author

Mark Grand is a software architect and book author with over 30 years of experience. Mark specializes in distributed systems, object oriented design, and Java, and he was the architect of the first commercial business-to-business e-commerce product to use the Internet.

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date