Java Hashed Collections
HashMap and HashSet
The HashMap and HashSet classes provide two different but similar ways to organize a set of objects using the same underlying data structure.
The HashSet class is used to manage a set of objects that have unique values. Passing an object to a HashSet object's add method will add the object to the HashSet unless there is already an object in the HashSet that equals the object passed to the add method. There is also a contains method to find an object in a set and a remove method to remove an object from a set.
The HashMap class has a put method that is passed a key and value pair. If the key is not already in the HashMap, the pair is added to the HashMap. If the key is already present in the HashMap, the given value replaces the previous value associated with the key. There is also a get method that returns the value associated with a given key and a remove method that removes the key-value pair for a given key.
A HashMap is like a HashSet with each object in the set having an associated value. The HashSet class is implemented as a wrapper around HashMap that uses the same object for both the key and value of each object pair.
The internal data structure used to implement a HashMap allows an object pair to be found very quickly by looking for the object pair's key. With exceptions explained later in this article, the amount of time a HashMap takes to find an object pair by its key object is about the same for all object pairs in a HashMap and independent of the number of object pairs in the HashMap.
One reason to use a HashMap or HashSet is because search, add, and remove operations are fast. A possible reason not to use a HashMap or HashSet is that they do not keep objects in any particular order. If you need objects kept in a particular order, you should use another collection class.
Although a HashMap object makes finding an object pair by its key object a fast operation, finding an object pair by its value is not so fast. A HashMap object's containsValue method returns true if it can find an object pair with a value that is equal to the object passed as a parameter to the containsValue method. To find an object pair with a specified value, the containsValue method looks at each of the object pairs in the HashMap until it finds one with the given value. The containsValue method must check every object pair in the HashMap before it can conclude that there is no object pair with the specified value.The internal data structure used in a HashMap that allows a HashMap to quickly find object pairs by their key is called a chained hash table. I will explain this data structure by explaining what a hash table is; then, I will explain the chained part. For the hash table to function properly, the objects in the hash table must be instances of classes whose equals and hashCode methods satisfy certain properties. So, before I explain what a hash table is, I will explain the equals and hashCode methods.
equals and hashCode Methods
The central idea of a hash table is that you can make an accurate guess, based on the value of a key object, where it belongs or where it will be found in the hash table. The guess is based on the value returned by the key object's hashCode method. The hash table uses the key object's equals method to verify the guess.
Every class inherits the equals and hashCode methods from the Object class. The implementations that the Object class provides for these methods are based on the assumption that two objects are equal if and only if they are the same object. The Object class's equals method returns true only if its argument is == to the object it is associated with. The Object class's hashCode method returns a number that the JVM uses to uniquely identify the object.
Instances of some classes are considered equal if they contain the same values. The java.lang.Integer and java.lang.String classes are examples of this. Classes of this sort should override the equals method with an implementation that returns true if two instances contain the same value. For example, the Integer classs equals method will return true when comparing two different Integer objects that contain the value 27.
If instances of such classes are to be kept in a hash table, the class also must override the hashCode method so that if two instances of a class are considered equal by its equals method, the class's hashCode method returns the same result for both. Returning the same result for instances that are equal is the only requirement for what int the hashCode method should return. Deciding the best way to implement hashCode for a class may require some thought.
In a few cases, there is an obvious value for a class's hashCode method to return. For example, the Integer class's hashCode method returns the same int that is the value of the Integer object. Implementing the hashCode method this way means that it requires no computations and that every possible value for an Integer object is its own unique hash code value.
For classes designed to contain complex values composed of multiple values, the value returned by the hashCode method must be the result of a computation that is based on all of the values. In the case of a Point class whose instances contain an x and a y value, the value returned by the hashCode method is the exclusive-or of the two values. In the case of String, the hashCode method returns a value computed from all of the characters in the string. If part of an object's value is in another object, the hash code computed for the object's should be partially based on the other object's hash code.
A hash table data structure uses an array to organize objects. When adding an object to a hash table, you divide the object's hash code by the length of the array and use the remainder as an array index to put the object in the array. Figure 1 shows this organization.
Figure 1: Hash Table