Java Hashed Collections
The Map.Entry interface provides only limited access to the contents of a linked list node. It has methods to get the key object, to get the value object, and to set the value object for the linked list node. The Map.Entry interface does not include any methods to access other linked list nodes or objects.
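As a minimal sketch of those three methods, the following iterates over a map's entry set and uses setValue to change each value in place. The class name EntryDemo, the method doubleValues, and the sample data are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class EntryDemo {
    // Doubles every value by writing through each Map.Entry.
    // setValue changes the value stored in the map itself; it is not a
    // structural modification, so it is safe during iteration.
    static void doubleValues(Map<String, Integer> map) {
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            entry.setValue(entry.getValue() * 2);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apples", 3);
        stock.put("pears", 5);
        doubleValues(stock);
        // getKey and getValue read the two halves of each entry.
        for (Map.Entry<String, Integer> entry : stock.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

The order in which the two entries print is not guaranteed, because a HashMap does not keep its contents in any particular order.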
Quality of hashCode()
There are two ways that hash collisions can occur. One way is that the hashCode method returns the same value for objects that have different values. If two objects with distinct values but the same hash code are added to the same HashMap, they collide.
Ideally, a class's hashCode method should return a different value for each distinct value that the class's instances can have. The hashCode methods of some classes are implemented in this way. However, there are some classes for which this is impossible.
The hashCode method returns an int. Classes that contain a long value or complex values can have more distinct values than can be represented by an int. For example, String objects can represent many more values than an int. For such classes, the hashCode method should be implemented in a way that makes it unlikely that two objects in the same HashMap will have the same hash code.
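To illustrate the point, here is a small sketch showing two pairs of distinct values from standard classes that share a hash code. The class and method names are hypothetical; the String and Long values were chosen because their hash codes are easy to work out by hand.

```java
class CollisionDemo {
    // True when the two objects are different values with the same hash code.
    static boolean sameHashDifferentValue(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println(sameHashDifferentValue("Aa", "BB"));   // true
        // Long folds its 64 bits into 32, so 0 and -1 both hash to 0.
        System.out.println(sameHashDifferentValue(0L, -1L));      // true
    }
}
```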
The way to make it unlikely that two instances of the same class with distinct values will have the same hash code varies with the nature of the value. However, there are some general principles:
- If the instances of a class contain a complex value, the result of its hashCode method should be based on all the parts of the complex value.
- Similar values should have different hash code values. For example, the String class's hashCode method considers the characters in the string and their sequence so that "tis" and "its" have different hash codes.
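One common way to follow both principles is to fold every field into the result with a multiplier, so that the order of the parts matters. The PersonName class below is a hypothetical example, not taken from any library, and it assumes neither field is ever null.

```java
import java.util.Objects;

final class PersonName {
    private final String first;
    private final String last;

    PersonName(String first, String last) {
        // Assumption for this sketch: null names are not allowed.
        this.first = Objects.requireNonNull(first);
        this.last = Objects.requireNonNull(last);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PersonName)) return false;
        PersonName that = (PersonName) other;
        return first.equals(that.first) && last.equals(that.last);
    }

    @Override
    public int hashCode() {
        // Uses both fields; multiplying by 31 before adding the second
        // field makes the result depend on which field holds which value.
        int result = first.hashCode();
        result = 31 * result + last.hashCode();
        return result;
    }
}
```

With this implementation, new PersonName("Ann", "Lee") and new PersonName("Lee", "Ann") get different hash codes, much as "tis" and "its" do for String.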
Capacity and Loading Factor
If the length of the hash table array is less than the number of objects in the HashMap, there must be hash collisions because there are not enough linked lists for each key-value pair to be by itself. The larger the hash table array, the less likely that objects with different hash codes will collide. There are two parameters to control this. These parameters are initial capacity and loading factor.
Initial capacity is the initial length of the hash table array. As mentioned previously, there is a constructor that allows you to specify this parameter.
Loading factor is the maximum allowed ratio of the number of objects in the HashMap to the length of the hash table array. If the number of objects in a HashMap exceeds the array length multiplied by the loading factor, the HashMap object makes the hash table larger. For example, if the initial capacity of a HashMap object is 16 and its loading factor is 0.75, the hash table array will be made larger once the number of keys in the HashMap exceeds 12 (16 × 0.75).
When a HashMap object enlarges its hash table array, it creates a new hash table array that is approximately double the length of the old array. It then rehashes all of the objects in the HashMap to put them in the appropriate linked list for the larger array.
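The arithmetic behind this can be sketched as follows. This is a simplified model of the rule described above (a threshold of capacity times loading factor, and exact doubling on each resize), not the actual HashMap source; the class and method names are invented for the example.

```java
class ResizeSketch {
    // Largest number of entries a table of the given length holds before it grows.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Counts how many doublings the model needs to hold the given number of entries.
    static int resizesNeeded(int initialCapacity, float loadFactor, int entries) {
        if (initialCapacity < 1 || loadFactor <= 0) {
            throw new IllegalArgumentException("capacity and load factor must be positive");
        }
        int capacity = initialCapacity;
        int resizes = 0;
        while (entries > threshold(capacity, loadFactor)) {
            capacity *= 2;   // each resize doubles the array, followed by a full rehash
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));            // 12
        System.out.println(resizesNeeded(16, 0.75f, 13));    // 1
        System.out.println(resizesNeeded(16, 0.75f, 1000));  // 7
    }
}
```

Under this model, filling a default-sized map with 1000 entries triggers seven rounds of rehashing along the way.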
Because it involves all the objects in a HashMap, rehashing is an expensive operation. You can postpone or avoid rehashing by setting the initial capacity sufficiently high.
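If you know roughly how many entries a map will hold, one way to choose a sufficiently high initial capacity is to divide the expected entry count by the loading factor and round up. The helper names below are hypothetical, and the sketch assumes the 0.75 loading factor discussed in this article.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest capacity whose threshold (capacity * loadFactor) is at least
    // the expected number of entries.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Creates a map that should hold expectedEntries without resizing.
    static <K, V> Map<K, V> presizedMap(int expectedEntries) {
        float loadFactor = 0.75f;
        return new HashMap<>(capacityFor(expectedEntries, loadFactor), loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(12, 0.75f));    // 16
        System.out.println(capacityFor(1000, 0.75f));  // 1334
        Map<String, Integer> wordCounts = presizedMap(1000);
        wordCounts.put("example", 1);
        System.out.println(wordCounts.size());         // 1
    }
}
```

The trade-off is memory: the array is allocated for the expected size even if the map never fills up.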
The default value for loading factor is 0.75. You can set it to a different value by using this constructor:
HashMap(int initialCapacity, float loadFactor)
The default value for initial capacity is 16. You can set it to a different value by using the constructor shown above or the constructor
If you use an iterator over a HashMap, excessively large values for initial capacity or small values for loading factor can make iterators slow. An iterator looks at everything in the entire hash table. If the hash table array is much longer than needed, the iterator will spend time looking at many empty linked lists.
Hash tables do not keep their contents in any particular order. Due to the possibility of rehashing, the order in which objects are kept in a hash table may change without warning.
Here are a few classes that are good alternatives to HashMap or HashSet in certain specialized circumstances.
- ConcurrentHashMap: The HashMap class is not thread safe. If two different threads want to modify a HashMap at the same time, it is necessary to use synchronization locks to ensure that only one thread at a time accesses the HashMap. When multiple threads want to access a HashMap at the same time, they are forced to wait for their turn. This can be a performance problem if there is a lot of concurrent access to the HashMap.
The ConcurrentHashMap class is a good alternative to HashMap in these cases. The ConcurrentHashMap class is thread safe. Multiple threads can modify a ConcurrentHashMap object at the same time without any external synchronization. The ConcurrentHashMap class does use some internal synchronization locks, but these only lock access to individual linked list nodes. Threads do not need to wait for each other unless multiple threads are concurrently trying to change the value associated with the same key or there are hash collisions.
The ConcurrentHashMap class is not a good alternative to HashMap in other circumstances, because it takes longer to perform the same operations when no concurrency is involved.
- IdentityHashMap: The IdentityHashMap class is similar to HashMap, but it compares objects using == and uses the System.identityHashCode method for each object's hash code. This is useful only for the unusual situation where you are concerned only with how the object is identified by the JVM and not with the object's contents.
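- LinkedHashMap: The LinkedHashMap class combines a HashMap with a doubly linked list so that the keys in the map are iterated in a predictable order. By default this is the order in which the keys were inserted; a constructor option selects access order instead.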
- WeakHashMap: This class differs from HashMap in that it holds its keys through weak references. If there are no other references to a key object, the garbage collector is free to reclaim it, and the corresponding entry is then removed from the WeakHashMap.
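- WeakHashSet: The JDK itself does not ship a class with this name, but a set with the same behavior can be created by calling Collections.newSetFromMap with a new WeakHashMap. Such a set differs from HashSet in that if there are no other references to an object in the set, the object is removed from the set once the garbage collector reclaims it.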
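To make three of these alternatives concrete, here is a small sketch that exercises ConcurrentHashMap, IdentityHashMap, and LinkedHashMap. The class name, method names, keys, and counts are all invented for the example, and it assumes Java 8 or later for the lambda and the merge method. The weak collections are left out because their behavior depends on when the garbage collector runs.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AlternativesDemo {
    // Two threads bump the same counter with no external locking.
    // merge() performs the read-modify-write atomically for the key.
    static int concurrentCount(int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counts.merge("hits", 1, Integer::sum);
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counts.getOrDefault("hits", 0);
    }

    // Two equal but distinct String objects are different keys under ==.
    static int identityEntries() {
        Map<String, String> map = new IdentityHashMap<>();
        map.put(new String("key"), "first");
        map.put(new String("key"), "second");
        return map.size();
    }

    // Keys come back in the order they were inserted.
    static List<String> insertionOrder() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(concurrentCount(1000));  // 2000
        System.out.println(identityEntries());      // 2
        System.out.println(insertionOrder());       // [zebra, apple, mango]
    }
}
```

With a plain HashMap, the same two-thread counter could lose updates or corrupt the table, and the two "key" strings would collapse into a single entry.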
This article has explained how to use the HashSet and HashMap classes. It has also explained the internal organization of these classes and how to use them most efficiently.
About the Author
Mark Grand is a consultant and book author with over 30 years of experience who specializes in Distributed Systems, Object-Oriented Design, and Java. He was the architect of the first commercial business-to-business e-commerce product to use the Internet.
Mark Grand is most widely known for his best selling design pattern books. Mark has taught for U.C. Berkeley, Sun, and other organizations. He is based in the Atlanta area and has been involved with object-oriented programming and design since 1982.