Java Hashed Collections, Page 2
The hash table shown in Figure 1 uses an array of length 7. When an object is added with a hash code of 21991147, the logic for the has table computes the remainder of the hash code divided by 7 (21991147^7), which is 3. Based on this computation, element 3 of the array gets the object. A similar computation is made when the other objects shown in Figure 1 are added to the hash table.
When it is time to find an object in the hash table, the hash table uses the key object's hash code to compute the array index where it expects to find the object. If there is an object at that array index, the hash table uses the object's equals method to determine whether the object at the array index actually is the object that it is searching for.
When a hash table is asked to add an object, it computes an array index from the object's hash code. Sometimes, there is already a different object at the computed array index. A situation where a hash table must deal with different objects that it expects to find at the same array index is called a hash collision.
There are different variations on hash tables that differ in how they decide what to do when adding an object to a hash table creates a hash collision. For example, there is a variation called an open addressing that looks for a different place in the array to put an object when there is a hash collision.
The technique used in a HashMap to resolve a hash collision is called chained hashing. The idea of chained hashing is that more than one object can be associated with the same hash table array element. This is done by associating objects with an array element through a linked list, as shown in Figure 2.
Figure 2: Hash Collision
Figure 2 shows the hash table array connected to objects through linked list nodes. When an object to be added to the chained hash table has a hash collision with objects already in the hash table, it is added to the same linked list as the objects it collides with.
When looking for an object in a chained hash table, it is necessary to look at all of the objects on the linked list associated with the object's hash code. The longer a linked list gets, the longer it takes to search. The more hash collisions occur, the more time it takes to search for objects.
The fundamental strategy for making chained hash map operations fast is to avoid hash collisions. There are two things to do that will keep the number of hash collisions low. One is to make it very unlikely that two different objects in a hash table will have the same hash code. The other is to make the array in the hash table large enough that it becomes unlikely that objects with different hash codes will collide. I will explain how to do these things later in this article. First, you will look at how the HashMap class exposes the underlying hash table data structure.
External Visibility of the Hash Table in HashMap
Figure 3 shows how the HashMap class uses the chained hash table data structure.
Figure 3: HashMap's Use of a Chained Hash Table
The linked list nodes used by HashMap all refer to a key object. They also may refer to a value object and to a next node.
There are a few aspects of the HashMap class's constructors and methods that expose the underlying hash table data structure. The HashMap class has a constructor with the signature public HashMap(int initialCapacity).
The argument to this constructor is the length of the hash table array that the HashMap will use when it is constructed. In the next section of this article, you will learn how to select a value for this parameter for good performance.
The entrySet() method returns a set of Map.Entry objects. Map.Entry is an interface implemented by objects that a HashMap uses as linked list nodes for its hash table. The linked list nodes used by a HashMap contain a reference to a key object, a reference to a value object, and a reference to a next liked list node.