September 30, 2014
Hot Topics:
RSS RSS feed Download our iPhone app

Understanding the Huffman Data Compression Algorithm in Java

  • May 2, 2006
  • By Richard G. Baldwin
  • Send Email »
  • More Articles »

Decode the Message

Still in the main method, I will continue the demonstration by decoding the encoded message.  I will accomplish this by instantiating an object of the HuffmanDecoder class and invoking the decode method on that object as shown in Listing 35.

    HuffmanDecoder decoder = new HuffmanDecoder();
    

    String decodedData = decoder.decode(binaryEncodedData,
                                        huffEncodeTable,
                                        rawDataLen);

Listing 35

(As you will soon see, it is somewhat easier to decode a Huffman-encoded message than it is to encode it.)

Information required to decode the message

Listing 35 passes the encoded message to the decode method of the HuffmanDecoder object along with a reference to the  data structure containing the encoding particulars as well as the length of the original message.

(The length of the original message is used to eliminate extraneous characters on the end of the decoded message.)

The HuffmanDecoder class

Putting the main method aside for awhile, the beginning of the class definition of the HuffmanDecoder class begins in Listing 36.

class HuffmanDecoder{
  Hashtable <String,Character>huffDecodeTable = 
                         new Hashtable<String,Character>();
  String stringDecodedData;
  String decodedData = "";
  Hashtable <Byte,String>decodingBitMap = 
                              new Hashtable<Byte,String>();
  ArrayList <Byte>binaryEncodedData;
  
  //The following structure contains particulars as to how
  // the original message was encoded, and must be received
  // as an incoming parameter to the decode method along
  // with the encoded message and the length of the
  // original message.
  Hashtable <Character,String>huffEncodeTable;
  //Used to eliminate the extraneous characters on the end.
  int rawDataLen;
Listing 36

An object of the HuffmanDecoder class can be used to decode a Huffman-encoded message given the encoded message, a data structure containing particulars as to how the original message was encoded, and the length of the original message.

The code in Listing 36 gets things started by declaring several instance variables and initializing some of them.  I will discuss the instance variables in conjunction with the code that uses them.

The decode method

Listing 37 shows the beginning of the decode method of the HuffmanDecoder class.

  String decode(ArrayList <Byte>binaryEncodedData,
               Hashtable <Character,String>huffEncodeTable,
               int rawDataLen){
    //Save the incoming parameters.
    this.binaryEncodedData = binaryEncodedData;
    this.huffEncodeTable = huffEncodeTable;
    this.rawDataLen = rawDataLen;

Listing 37

The decode method receives a Huffman-encoded message along with a data structure containing particulars as to how the original message was encoded and the length of the original message.  It decodes the original message and returns the decoded version as a String object.

Listing 37 saves the incoming parameters in instance variables that were declared in Listing 36.

Create a decoding bit map

Listing 38 invokes the buildDecodingBitMap method, which is essentially the reverse of the encoding bit map that was used to encode the original message.

    buildDecodingBitMap();

Listing 38

The buildDecodingBitMap method

The buildDecodingBitMap method can be viewed in its entirety in Listing 45.  This method populates a lookup table that relates eight bits represented as a String to eight actual bits for all possible combinations of eight bits.

This is essentially a reverse lookup table relative to the encodingBitMap table that is used to encode the message.  The only difference between the two is a reversal of the key and the value in the Hashtable that contains the table.

Decode from binary to String

Listing 39 invokes the decodeToBitsAsString method to decode the encoded message from a binary representation to a String of 1 and 0 characters that represent the actual bits in the encoded message.

    decodeToBitsAsString();

Listing 39

The decodeToBitAsString method

This method, which can be viewed in its entirety in Listing 45 is very straightforward.  Therefore no further discussion is warranted.

Build a decoding table

Listing 40 invokes the buildHuffDecodingTable method to create a Huffman decoding lookup table by swapping the keys and values from the Huffman encoding table received as an incoming parameter by the decode method.

    buildHuffDecodingTable();

Listing 40

The buildHuffDecodingTable method

You guessed it!  Once again, this method, which can be viewed in Listing 45, is too simple to deserve further discussion.

Decode the message

Listing 41 invokes the decodeStringBitsToCharacters method to decode the String containing only 1 and 0 characters that represent the bits in the encoded message.  This produces a replica of the original message that was subjected to Huffman encoding.

    decodeStringBitsToCharacters();

Listing 41

The decodeStringBitsToCharacters method

The decodeStringBitsToCharacters method is shown in its entirety in Listing 42.

  void decodeStringBitsToCharacters(){
    StringBuffer output = new StringBuffer();
    StringBuffer workBuf = new StringBuffer();

    for(int cnt = 0;cnt < stringDecodedData.length();
                                                    cnt++){
      workBuf.append(stringDecodedData.charAt(cnt));
      if(huffDecodeTable.containsKey(workBuf.toString())){
        output.append(
                  huffDecodeTable.get(workBuf.toString()));
        workBuf = new StringBuffer();
      }//end if
    }//end for loop
    
    decodedData = output.toString();
  }//End decodeStringBitsToCharacters();

Listing 42

Two empty StringBuffer objects

The decodeStringBitsToCharacters method begins with one empty StringBuffer object referred to by the variable named workBuf and another empty StringBuffer object referred to by the variable named output.

The StringBuffer object referred to by output is used to construct the decoded message.  The StringBuffer object referred to by workBuf is used as a temporary holding area for substring data.

Build a working string one character at a time

The method reads the String containing only 1 and 0 characters that represent the bits in the encoded message (stringDecodedData).  The characters are read from this string one character at a time.  As each new character is read, it is appended to the StringBuffer object referred to by workBuf.

Compare the working string to the decoding keys

As each new character is appended to the StringBuffer object, a test is performed to determine if the current contents of the StringBuffer object match one of the keys in a Hashtable table that relates strings representing Huffman bit codes to characters in the original message.

When a match is found...

When a match is found, the value associated with that key is retrieved from the Hashtable and appended onto the StringBuffer object referred to by output.  Thus, the output text is built up one character at a time.

Clear the working string

Having processed the matching key, a new empty StringBuffer object is instantiated and is referred to by workBuf.  The process of reading, appending, and testing for a match is repeated until all of the characters in the string that represents the bits in the encoded message have been processed.

The original message has been reconstructed, with extra characters

At that point, the StringBuffer object referred to by output contains the entire decoded message.  (It may contain extraneous characters at the end.)  It is converted to type String and referred to by decodedData.  Then the decodeStringBitsToCharacters method returns control to the decode method with the task of decoding the encoded message having been completed.

Back in the decode method...

As shown in Listing 43, the decode method returns control to the main method returning a reference to a String containing a replica of the original message in the process.

    return decodedData.substring(0,rawDataLen);    
  }//end decode method

Listing 43

Note that the code in Listing 43 uses the known length of the original message to trim extraneous characters from the end of the decoded message.

Listing 43 signals the end of the decode method and the end of the HuffmanDecoder class.

Back in the main method...

Listing 44 invokes the display48 method to display the decoded results, 48 characters to the line as shown near the bottom of Figure 1.

    System.out.println("nDecoded Data");
    display48(decodedData);

  }//end main

Listing 44

And that's a wrap!  I hope that you have enjoyed this lesson on Huffman encoding.

Run the Program

I encourage you to copy the code from Listing 45 into your text editor, compile it, and execute it.  Experiment with it, making changes, and observing the results of your changes.

Create a variety of test messages of your own and determine the compression factor for different types of messages.  For example apply the Huffman compression algorithm to raw HTML messages to see if they compress more than straight text messages.  Create a simulation of an encrypted message using byte values from a random number generator as the message content and see how well that message compresses.

Summary

In this lesson, I have taught you about the inner workings of the Huffman lossless compression algorithm.  I also showed you the results obtained by applying the algorithm to several different test messages.

What's Next?

Future lessons in this series will explain the inner workings behind several other data and image compression schemes, including the following:

  • Run-length data encoding
  • GIF image compression
  • JPEG image compression

References

2440 Understanding the Lempel-Ziv Data Compression Algorithm in Java
076 Vectors, Hashtables, and Enumerations
508 JavaBeans, Properties of Beans, Simple and Indexed
510 JavaBeans, Properties of Beans, Bound Properties 
512 JavaBeans, Properties of Beans, Constrained Properties
1350 Data Structures in Java: Part 1, Getting Started
1352 Data Structures in Java: Part 2, What Is a Collection?
1354 Data Structures in Java: Part 3, Purpose of Framework Interfaces
1356 Data Structures in Java: Part 4, Purpose of Implementations and Algorithms
1358 Data Structures in Java: Part 5, The Core Collection Interfaces
1360 Data Structures in Java: Part 6, Data Structures in Java: Part 6, Duplicate Elements, Ordered Collections, Sorted Collections, and Interface Specialization
1362 Data Structures in Java: Part 7, The Comparable Interface, Part 1
1364 Data Structures in Java: Part 8, The Comparable Interface, Part 2
1366 Data Structures in Java: Part 9, The Comparator Interface, Part 1
1368 Data Structures in Java: Part 10, The Comparator Interface, Part 2
1370 Data Structures in Java: Part 11, The Comparator Interface, Part 3
1372 Data Structures in Java: Part 12, The Comparator Interface, Part 4
1374 Data Structures in Java: Part 13, The Comparator Interface, Part 5
1376 Data Structures in Java: Part 14, The Comparator Interface, Part 6
1378 Data Structures in Java: Part 15, The toArray Method, Part 1
1380 Data Structures in Java: Part 16, The toArray Method, Part 2
2300 Generics in J2SE, Getting Started





Page 2 of 3



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Sitemap | Contact Us

Rocket Fuel