Java Ordered Collections: Trees and Skip Lists
The Java collections framework includes classes you use to maintain collections of other objects. These collection classes implement interfaces that support different organizations of the objects they contain. For example, classes that implement the List interface keep objects in the order that they are added to the collection and can take a long time to search (proportionate to the number of objects in the collection). Classes that implement the Map interface keep objects in no particular order but are very fast to search (search time is independent of the number of objects in the collection). Classes that implement the SortedSet interface keep a set of objects in a guaranteed order, independent of the order they are added to the collection; this makes them fast to search.
Collection classes that implement the SortedSet interface impose a total ordering on the objects that they contain. There are two kinds of orderings that can be used with a SortedSet.
Instances of SortedSet classes can use the natural ordering of objects in the collection if the objects in the collection implement the Comparable interface. This means that the order of the objects is determined by the objects themselves.
- A SortedSet collection imposes a natural ordering on the objects it contains by calling the compareTo method that is part of the SortedSet interface. An object's compareTo method takes one argument that is the other object it compares the object to. The compareTo method returns a positive integer, 0, or a negative integer depending on whether the object is greater than, equal to, or less than the other object.
- A SortedSet object can use the natural ordering of objects it contains only if
- all of the objects implement the a SortedSet interface
- all of the objects can be passed to each other's compareTo method
- and the natural order is the order that is wanted.
- The alternative to using natural ordering is to use an object that implements the Comparator interface. The Comparator interface includes a compare method that takes two objects as its arguments. The compare method returns a positive integer, 0, or a negative integer depending on whether the first object is greater than, equal to, or less than the second object. A Comparator object can be used to impose any order on the contents of a SortedSet, so long as all possible comparisons produce consistent results.
The SortedSet interface does not specify what determines whether a SortedSet object uses the natural order of its contents or uses a Comparator object to impose an ordering. Implementations of SortedSet commonly use different constructors to determine this issue. Constructors that take a Comparator argument create a SortedSet object that uses the Comparator object. Constructors that do not take a Comparator argument create a SortedSet object that uses the natural ordering of its contents.
Java 1.6 comes with two classes that implement the SortedSet interface. These are java.util.TreeSet and java.util.concurrent.ConcurrentSkipListSet. Choosing to use one of these classes has implications for performance and concurrency. To understand these implications, you need to have some understanding of the data structures these classes use to organize a set of objects.
The TreeSet class uses a binary tree to organize a set of objects. Figure 1 shows an example of this data structure that contains Integer objects with the values 1 to 9.
Figure 1: Binary Tree
A TreeSet creates objects, called nodes, for its own internal use. A TreeSet object uses nodes to organize objects added to the TreeSet into a binary tree as shown in Figure 1. A TreeSet object creates a node for each object added to the TreeSet. Each node references the object it is responsible for organizing. A node also refers to zero, one, or two other nodes.
There are a few different roles that a node can play in the binary tree:
- If a node refers to other nodes, it is called the parent of the nodes it refers to.
- The other nodes a parent node refers to are called its children.
- Only one node in a binary tree has no parent. This node is called the root of the tree. In Figure 1, the root is drawn at the top of the diagram.
- Nodes in a binary tree that have no children are called leaves. In Figure 1, the leaf nodes are drawn towards the bottom of the diagram.
- Nodes that are not the root or a leaf node are called interior nodes.
Searching a binary tree is fast (it takes time proportionate to log(n), where n is the number of objects in the tree) because of two rules that the TreeSet enforces on the organization of the binary tree.
The first rule is about the relationship between a parent node and its child nodes. One of a parent node's references is either null or refers to a child node with a lesser value than the parent. The other node reference is either null or refers to a child node with a greater value that the parent. In Figure 1, the child on the left always has a lesser value than its parent; the child on the right always has a greater value than its parent.