April 6, 2020
Hot Topics:

Java Ordered Collections: Trees and Skip Lists

The binary tree shown in Figure 4 does not branch. To find the node for 9 requires searching very other node in the binary tree. A tree like this is not very fast to search. This brings me to the second rule that the TreeSet uses for its binary tree.

The second rule is that the tree must be balanced. Balanced means that the tree branches and the leaf nodes are about the same distance from the root of the tree. By distance, I mean the number of nodes between the root and the leaf. To keep its binary tree balanced, a TreeSet object may need to adjust the organization of its binary tree after each add or remove operation.

If the tree is perfectly balanced, the leaf node farthest from the root node is no more than one node farther from the root node than the nearest leaf node. The binary tree in Figure 1 is perfectly balanced.

On average, it takes less time to search for objects in a perfectly balanced binary tree than to search in a tree that is less well balanced. However, it turns out algorithms for keeping a tree perfectly balanced can require a lot of effort in some cases. Algorithms that keep a binary tree partially balanced within a guaranteed limit require a lot less effort.

The effort to keep a tree perfectly balanced may be justified if it is expected that many more search operations than adds or removes will be made. However, because TreeSet is a general purpose class, it uses an algorithm called red black tree. Red black tree guarantees that the distance from the root to the most distant leaf is no more than twice the distance from the root to the nearest leaf.

Earlier in this article, I mentioned that the choice to use a TreeSet has concurrency implications. None of its methods are synchronized. If two threads attempt to modify a TreeSet at the same time, it will break. To avoid this problem, it is up to the caller to arrange for calls to a TreeSet to be single threaded. If a TreeSet object's methods will have multiple callers, the solution is to have them call the TreeSet object's methods through a wrapper object whose methods are synchronized. This is commonly arranged code similar to this:

`SortedSet set s = Collections.synchronizedSortedSet(new TreeSet());`

The challenge this presents for concurrency is that access to the entire TreeSet object is forced to be single threaded. If there are more than a few threads trying to add objects to or remove object from the same TreeSet object, they will all need to wait for their turn. For situations like this, the better choice is ConcurrentSkipListSet.

ConcurrentSkipListSet

java.util.concurrent.ConcurrentSkipListSet is a thread-safe implementation of the SortedSet interface. Multiple threads can add and remove objects at the same time and work correctly. However, individual search, add, and remove operations take longer for a ConcurrentSkipListSet than for a TreeSet, so you usually should not use ConcurrentSkipListSet unless concurrency is an issue.

One other caution about the ConcurrentSkipListSet is that the size() method can take a long time. To avoid needing locks, the ConcurrentSkipListSet does not keep a running count of the number of objects that are currently in the set. For this reason, the size method works by iterating over the objects in the ConcurrentSkipListSet and counts them.

You now have the information you need to decide when it is appropriate to use a TreeSet or a ConcurrentSkipListSet. However, when using a ConcurrentSkipListSet, it may be helpful to understand the underlying data structure.

The ConcurrentSkipListSet is based on a data structure called a skip list. Unlike a binary tree, the internal organization of a skip list does not need to be adjusted because of the order that data is added or removed. The internal organization of a skip list is based on random numbers and is not dependent on the actual data in the skip list.

Figure 5 shows an example of a skip list.

Figure 5: Skip List

The first node in the skip list is called the header. It refers to other nodes or an object called NIL. NIL is at the end of every skip list. Both the header and NIL are in every skip list, even if the skip list is empty.

The header is the first node in multiple linked lists. These linked lists are called level 1, level 2, … In Figure 5, the level 1 list is shown towards the bottom of the diagram and the level 4 list is shown towards the top of the list.

Nodes that appear in the linked lists between the header and NIL have associated value objects. The order in which nodes appear in the linked lists is based on their value objects and determined by the natural ordering of the value objects or by a Comparator.

All of the nodes in a skip list are in the level 1 linked list. Roughly ½ of the nodes in a skip list are in the level 1 and 2 linked lists. About ½ of those nodes (about ¼ of all the nodes) are also in the level 3 linked list. If there are higher levels of linked lists, each linked list includes about ½ of the nodes in the linked list in the next lower level.

The algorithm for searching a skip list starts at the highest level linked list, looking for a node whose value is greater than or equal to the value being searched for. If the value of a node is equal to what is being searched for, the search is done. Otherwise, the search backs up a node and continues one level of linked list down until either the search is successful or there are no more nodes to search. To clarify the search process, here is an example:

Suppose that you are searching the skip list shown in Figure 6 for an object with the value 5. There are four levels in the skip list, so you start with the level 4 linked list. The first node on the level 4 linked list (after the header) has the value 2. Because 2 is less than 5, you look at the next node on the level 4 linked list.

The next node on the level 4 linked list is NIL. Because NIL is the end of the linked list, the search continues on the level 3 linked list.

The next node on the level 3 linked list after the node with the value 2 has the value 9. Because 9 is greater than 5, the search continues on the level 2 linked list.

The next node on the level 2 linked list after the node with the value 2 has the value 4. Because 4 is less than 5, the search continues with the next node on the level 2 linked list.

The next node on the level 2 linked list after the node with the value 4 has the value 6. Because 6 is greater than 5, the search continues on the level 1 linked list.

The next node on the level 1 linked list after the node with the value 4 has the value 5. Because that is equal to the value we are looking for, the search is done.

Page 3 of 4