http://www.developer.com/
Understanding the 2D Discrete Cosine Transform in JavaSeptember 26, 2006 Java Programming Notes # 2446
PrefaceThis lesson is one in a series designed to teach you about the inner workings of data and image compression. The first lesson in the series was Understanding the Lempel-Ziv Data Compression Algorithm in Java. The previous lesson was Understanding the Discrete Cosine Transform in Java. The previous lesson dealt with one-dimensional Discrete Cosine Transforms. This lesson is the first part of a two-part lesson on two-dimensional Discrete Cosine Transforms (2D-DCT). JPEG image compression One of the objectives of this series is to teach you about the inner workings of JPEG image compression. According to Wikipedia,
Central components One of the central components of JPEG compression is entropy encoding. Huffman encoding, which was the primary topic of the earlier lesson entitled Understanding the Huffman Data Compression Algorithm in Java is a common form of entropy encoding. Another central component of JPEG compression is the two-dimensional Discrete Cosine Transform, which is the primary topic of this lesson. In this lesson, I will teach you how to use the forward 2D-DCT to compute and display the wave-number spectrum of an image. I will also teach you how to apply the inverse 2D-DCT to the spectral data to reconstruct a replica of the original image. A third central component of JPEG is selective spectral re-quantization. This will be the primary topic of a future lesson. In order to understand JPEG ... In order to understand JPEG image compression, you must understand Huffman encoding, the Discrete Cosine Transform, selective spectral re-quantization, and perhaps some other topics as well. I plan to teach you about the different components of JPEG in separate lessons, and then to provide a lesson that teaches you how they work together to produce "the most common format used for storing and transmitting photographs on the World Wide Web." Viewing tip You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them. Supplementary material I recommend that you also study the other lessons in my extensive collection of online Java tutorials. You will find those lessons published at Gamelan.com. However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com. In preparation for understanding the material in this lesson, I also recommend that you also study the lessons referred to in the References section. General Background InformationOne of the main reasons that we are studying 2D Discrete Cosine Transforms (2D-DCT) is to further our understanding of JPEG image compression. The 2D-DCT is a central component in JPEG. I will begin my discussion with an analogy to a portion of the JPEG image compression algorithm. Assume that you have been charged with the task of transporting all the water in a twenty-gallon tank from one building to a twenty-gallon tank in another building approximately three miles away. Unfortunately, you don't have any watertight containers in which to transport the water. All that is available for use in transporting the water is a sack made out of cloth. If you pour water into the sack, it simply runs through and out onto the floor. How can you accomplish your assigned task? Fortunately, the tank is mounted on small wheels that allow you to roll it around inside the building (but not outside the building). You search the building, and are happy to find a large walk-in freezer with a doorway large enough to accommodate the tank of water. Transform the water into ice So, you roll the tank into the freezer. When all of the water has frozen into ice, you use an ice pick that you found nearby and you chop the ice up into pieces. You fill your sack with ice and run as fast as you can to the other building where you empty the sack into the other tank. You keep making trips from one building to the next until you have transported all of the ice from the original tank to the tank in the other building. You let the ice melt in the tank in the other building, and you have accomplished your task. Is this a lossless process? Unfortunately, a small quantity of the ice melts during the trips to transport it from one building to the other, so you end up with a little less than twenty-gallons of water in the second tank. (This is not a lossless process.) Transformation is the key to success What you have done is to transform the state of the water from one form to another to make it possible to transport it in a leaky cloth sack. How does this apply to JPEG? JPEG image compression does something similar, but not for exactly the same reasons. When an image is compressed using JPEG, the form of the image that is stored, and possibly transported from one machine to another, is not the form that you are probably accustomed to seeing. Rather, the information that constitutes the image is transformed from the image or space domain into the frequency or wave-number domain. The information is compressed, stored, and transported in the frequency domain. Later on, the information is transformed back into the image domain for presentation to a human consumer.
The transformation to and from the frequency domain is accomplished using a 2D-DCT. Image information looks totally different As you will see later, the information that constitutes the image looks totally different when in the frequency domain than when it is in the image domain. (Once again, see Figure 9 for an example.) An almost lossless process If the image were simply transformed from the image domain into the frequency domain and back into the image domain, the process would be almost lossless.
Lossless compression is not the primary objective Generally speaking, however, the primary objective in JPEG image compression is usually not to implement a lossless process. Rather, the primary objective is to compress the image to make it smaller to store and to transport. A certain amount of distortion in the reconstructed image is usually considered tolerable.
The Lempel-Ziv algorithm LZ77 is the name commonly given to a lossless data compression algorithm published in papers by Abraham Lempel and Jacob Ziv in 1977. LZ77 is a lossless compression data algorithm. What is a lossless data compression algorithm? If you compress a document using the LZ77 algorithm, and then decompress the compressed version, the result will be an exact copy of the original document.
Dictionary and/or entropy encoding algorithms LZ77 is known as a dictionary encoding algorithm, as opposed for example to the Huffman encoding algorithm, which is a statistical or entropy encoding algorithm. Compression in the LZ77 algorithm is based on the notion that strings of characters (words, phrases, etc.) occur repeatedly in the message being compressed. Compression with the Huffman encoding algorithm is based on the probability of occurrence of individual characters in the message. Not very effective for image data Apparently neither of these compression algorithms is particularly effective when applied directly to the pixel color data that makes up an image. However, it has been determined that if an image is first transformed into the frequency domain, a form of frequency filtering can be applied to the frequency-domain data without having a serious adverse impact on the quality of the image when it is reconstructed. What kind of frequency filtering? The frequency filtering involves re-quantizing the frequency-domain data values at the higher wave-number frequencies. Re-quantization makes the frequency-domain data much more susceptible to the benefits of entropy encoding.
This is where the major compression benefits of JPEG image compression occur. This is also what prevents the reconstructed image from being a nearly exact match for the original image, thus causing JPEG to be a lossy image compression algorithm. A brief look at JPEG image compression That has been a brief look into the overall process of JPEG image compression so that you will know where the 2D-DCT fits into the process. I will get into the details in a future lesson. For now, the objective is to isolate and to understand the 2D-DCT portion of the process. An (almost lossless) example Before getting into a detailed discussion of the 2D-DCT, I want to show you an (almost) lossless example along with some backup information. The bottom image in Figure 1 shows the results of performing a forward 2D-DCT on the top image and then turning around and performing an inverse 2D-DCT on the spectrum that was produced by the forward transform.
Re-quantization was applied However, to partially simulate the behavior of the JPEG image compression algorithm, the spectral data was uniformly re-quantized so that it could be stored in an eleven-bit twos complement integer format (-1024 to +1023).
Otherwise, nothing was done to compress, or otherwise corrupt the spectral data prior to transforming that data back into the image domain. Therefore, the output image should be almost as good as the input image, with the only deterioration being the result of the re-quantization noise and arithmetic errors in the forward and inverse transforms. Quantization noise is visible It looks to me like the background areas, such as the large white areas in the image are grainier on the bottom image than the top image. This graininess did not exist in my original version which treated the spectral data strictly in double format. The graininess became evident at the point in time that I inserted the re-quantization of the spectral data to make it fit into eleven bits. More difficult than Discrete Fourier Transform Others may disagree, but in my opinion, it is more difficult to comprehend the Discrete Cosine Transform process than it is to comprehend the Discrete Fourier Transform process.
To paraphrase a popular comedian named Flip Wilson from my younger days, when you work with the DCT, "What you see is not what you get." DFT is linear with superposition With the DFT, which is a linear transform for which superposition applies, many of us with Digital Signal Processing (DSP) experience can look at a simple 2D space-domain function and have a pretty good idea what the wave-number spectrum for that function will look like. We do that by breaking the space-domain function down into familiar components, transforming the familiar components in our heads, and then reconstructing the individual transforms in our heads to produce an idea of the wave-number spectrum. More difficult with the DCT This is much more difficult to do with the 2D-DCT. The reason is that the DCT is not applied directly to the given space-domain function. Rather, the given space-domain function is implicitly expanded through mirror-imaging to produce a new space-domain function. Then, to a very close approximation, a Discrete Fourier Transform is applied to the new expanded space-domain function. Thus, there is an added degree of complexity involving the implicit modification of the original space-domain function. I will illustrate this by providing some comparisons between the Discrete Fourier Transform and the Discrete Cosine Transform. A simple diagonal line Consider first the case of a simple diagonal line in the space domain as shown in the left panel of Figure 2.
I explained this example in the earlier lesson, so I won't repeat that explanation here. Suffice it at this point to say that the left panel in Figure 2 shows a diagonal line in the space domain. The right panel shows the wave-number spectrum for that line produced by performing a 2D Discrete Fourier Transform (2D-DFT) on the image in the left panel. The important attribute Without getting into the details, the important thing to note is that the lines in the Fourier wave-number spectrum are perpendicular to the line in the space domain. A similar 2D-DCT example On the other hand, the top panel in Figure 3 shows an image with a similar diagonal line. The bottom panel in Figure 3 shows the result of performing a 2D-DCT on the image. (We might refer to this as the DCT spectrum.)
Something looks very wrong here! Something looks wrong when we compare the DCT spectrum in Figure 3 with the Fourier spectrum in Figure 2. The line in the DCT spectrum is not perpendicular to the line in the space domain. Rather, it is parallel to the line in the space domain.
And the problem is ... The problem is that there is no such thing as a single diagonal line in a DCT. True, we started with the single diagonal line shown in the top panel of Figure 3. However, the DCT process implicitly quadrupled the size of our sample. (The size was doubled in both dimensions.) For example, the DCT process created a mirror image of the original area across its upper boundary producing an area twice as tall as what we started with. Then it produced a mirror image of the entire double-high area across the left boundary. This resulted in a new area four times as large as the original area with the original line being reflected into the new areas in a mirror-image fashion. Then it effectively performed a DFT on the new surface containing the original line plus the mirror images of the original line. What does the expanded area look like? If you draw that out on paper, you will see that the image that was actually transformed using the DFT contained two lines, each twice as long as the original and arranged in a cross or X with the center of the cross at the original origin. Each of the legs on the cross was on a diagonal. A similar 2D-DFT Figure 4 shows the results of applying a 2D-DFT to two lines similarly arranged to form a cross.
As before, the image in the space domain is shown in the left panel of Figure 4, and the Fourier spectrum of that image is shown in the right panel. Taking into account that the bottom panel in Figure 3 corresponds to only the upper-left quadrant in the Fourier spectrum in Figure 4, the two look remarkably similar.
A diagonal line in both cases Both spectra show a diagonal line extending from the origin at a wave-number of zero down to the Nyquist folding frequency at the opposite corner. I believe that the reason for the spectral line in Figure 3 being parallel to the time-domain line instead of being perpendicular to that line is that the 2D-DCT shown in Figure 3 actually performed a 2D-DFT on two intersecting time-domain lines that form a cross. A vertical line The bottom image in Figure 5 shows the results of performing a 2D-DCT on the top image.
At first glance, it appears that there is no output in the bottom image. However, if you look very carefully, you will see a broken line at the top of the bottom image next to the area that separates the two images. Does this make sense? Once again, the top image has been implicitly expanded to a size that is four times as large as that shown. The expansion is accomplished by creating mirror images as described earlier. As a result, the input image no longer contains a single vertical line. Rather, it contains two parallel vertical lines and a DFT is performed on those parallel vertical lines. A similar DFT example The right panel in Figure 6 shows the results of performing an ordinary 2D-DFT on the pair of parallel vertical lines in the left panel.
Considering the Fourier spectrum in the upper-left quadrant of the right panel, we see that the output consists mainly of a broken line along the top of the spectrum, beginning at a wave-number of zero and extending to the right all the way to the sampling frequency (twice the distance shown in Figure 5). So, it does make sense to have the DCT spectrum shown in Figure 5 consist mostly of a broken line along the top edge of the DCT spectrum from a wave number of zero to the Nyquist folding frequency at the right edge. A horizontal line The bottom image in Figure 7 shows the results of performing a 2D-DCT on the top image containing the single horizontal line.
As before, because of the implicit expansion of the input image through the creation of mirror images, the mage that actually got transformed consisted of two parallel horizontal lines. Also as before, the output is difficult to see. If you look very closely, you will see a vertical broken line along the left edge of the bottom image. I could go through the same steps as before, performing a 2D-DCT on a pair of parallel horizontal lines to show that we should expect the output to be a broken vertical line along the left edge, beginning at a wave number of zero and extending downward to the Nyquist folding frequency at the bottom edge. So what? By now you are probably wondering why I am going to all of this trouble to convince you of what you should expect to see. For that, I am going to give you a preview of the next lesson. A preview In the next lesson, you will learn that rather than to perform the 2D-DCT on the entire image, the JPEG image compression algorithm sub-divides the image into 8x8 blocks and performs the 2D-DCT on each block independently. At this point, I won't go into what the algorithm does with the spectra produced through that process, but I will show you a preview of what the spectra can look like. Spectrum of a sub-divided image The bottom panel in Figure 8 shows the results of performing a separate 2D-DCT on several hundred contiguous 8x8-pixel blocks that make up the image shown in the top panel.
Each spectrum represents an 8x8-pixel image When performed in this fashion, each spectrum represents only one 8x8-pixel portion of the original image. In most cases, the upper-left corner of each spectrum, (representing a wave-number of zero), is identified by a bright dot. The spectral values for higher-frequency wave numbers appear to the right and down from that dot. The constant-color areas In those areas of the image where there are little or no color details, (such as in the white portion of the image or the portion behind President Lincoln's head), each spectrum consists of a single bright dot only. This is because the spectrum consists of a zero-frequency component only with a significant lack of energy at other frequencies. Areas with lots of color details In areas where there is a lot of detail, such as in President Lincoln's hair for example, the 8x8 spectral blocks tends to be gray, indicating lots of energy at the higher frequencies. Some particularly interesting areas There are several areas that I find particularly interesting. For example, on the left and right edges of the penny where the circular edge is tangent to a vertical line, you can see a tendency for the spectra to exhibit strong horizontal lines. This agrees with the results shown earlier in Figure 5. Vertical spectral lines At the top and the bottom of the penny where the circular edge is tangent to a horizontal line, there is a tendency for the spectra to exhibit strong vertical lines. This agrees with the results shown earlier in Figure 7. Diagonal spectral lines At the four edges of the penny where the slope is tangent to a line that is at an angle of forty-five degrees to the horizontal (northeast, southeast, southwest, and northwest) there is a tendency for the spectra to exhibit strong diagonal lines sloping down and to the right.
We also see this behavior at the lower-left edge of President Lincoln's beard near his ear, and at the edge of his collar where the lines in the image tend to be diagonal. In those areas also, there is a tendency for the spectra to exhibit strong diagonal lines sloping down and to the right. This agrees with the results shown earlier in Figure 3. I will discuss this in much more detail in the next lesson where I address the procedure of sub-dividing the image into 8x8-pixel blocks. Now back to the main topic of this lesson For the remainder of this lesson I will concentrate on applying the 2D-DCT to the entire image rather than applying the 2D-DCT to a sub-divided image. I will get back to the sub-divided image in Part 2 of this lesson, which I will publish in a few weeks. The bottom panel in Figure 9 shows the 2D-DCT spectrum computed for the entire image in the top panel of Figure 9.
Not much to look at Unlike the case in Figure 8, there is nothing in the spectrum in Figure 9 that would lead an ordinary human to suspect that this spectrum represents an image of a United States penny. Rest assured, however, that the spectrum shown in Figure 9 does contain all of the information necessary to reconstruct the original image. The bottom image in Figure 1 was produced by performing an inverse 2D-DCT on this spectral data. Two representations of the same information Thus, the top and bottom panels in Figure 9 contain the same information. They simply contain that information in different forms. The form in the top panel of Figure 9 is the one that we humans are most accustomed to seeing. We might think of that as the water in my earlier analogy. The form in the bottom panel of Figure 9 might be thought of as the ice in my earlier analogy. As it turns out, it is very easy to transform that information back and forth between the two forms. I will explain two programs in the remainder of this lesson that illustrate how to transform the data back and forth between the two forms. PreviewI am going to present and explain two different, but similar programs in this part of this lesson. (I will present a couple more programs in Part 2 later when I publish it.) The first program, named ImgMod34a performs a forward 2D-DCT on an image, and then displays the original image along with the DCT spectrum of that image (as shown in Figure 9). The second program named ImgMod34 also performs a forward 2D-DCT on an image. However, rather than stopping at that point and displaying the DCT spectrum, this program re-quantizes the spectral data to eleven bits, and then performs an inverse 2D-DCT on the re-quantized spectral data to produce and display a replica of the original image (as shown in Figure 1). Discussion and Sample CodeI will discuss this program in fragments. A complete listing of the program is shown in Listing 23. The purpose of this program is to compute and display the wave-number spectrum of an image using a 2D-DCT. A forward transform This program performs a forward 2D-DCT on each color plane belonging to an image producing a wave-number spectrum that describes each color plane in the image. Then it converts the wave-number spectrum to log base 10 to preserve the dynamic range of the display and normalizes the results to cover the range from 0 to 255. This makes the results suitable for being displayed as an image. Display the wave-number spectrum Then the program returns the wave-number spectrum for each color plane in an image format.
When displayed as an image, the visual result is the composite of the normalized wave number spectra of all three color planes being displayed. The capability to isolate specific colors The program provides the capability to enable statements that will effectively eliminate one, two, or all three of the color planes from the computation and the resulting display. (This requires modification of the source code and recompilation of the program.) Runs under control of ImgMod02a The program is designed to run under the control of the class named ImgMod02a. Enter the following at the command line to run this program: java ImgMod02a ImgMod34a ImageFileName where ImageFileName is the name of a .gif or .jpg file, including the extension. Class files required This program requires access to the following class files plus some inner classes that are defined inside the following classes:
The source code for ImgMod34a is presented in Listing 23. The source code for ForwardDCT01 was developed and explained in the previous lesson entitled Understanding the Discrete Cosine Transform in Java. The source code for ImgMod02a and ImgIntfc02 can be found in the earlier lessons entitled Processing Image Pixels Using Java: Controlling Contrast and Brightness and Processing Image Pixels using Java, Getting Started. Program testing The program was tested using J2SE 5.0 and WinXP. J2SE 5.0 or later is required due to the use of static imports. Source code for the ImgMod34a class The source code for the ImgMod34a class and the method named processImg begins in Listing 1.
Need to understand ImgMod02a In order to understand this program, you will need to understand how it interacts with the class named ImgMod02a, as described in the earlier lessons entitled Processing Image Pixels using Java, Getting Started and Processing Image Pixels Using Java: Controlling Contrast and Brightness. Once you understand that interaction, the code in Listing 1 should be straightforward. One color plane at a time The code in Listing 2 is provided to make it easy for you to experiment with only one color plane at time.
To experiment with only one color plane, enable two of the statements in Listing 2, causing the color values for those two planes to be set to zero. Following that, the program output will contain a contribution only from the color plane corresponding to the statement that you did not enable. Process the red plane Listing 3 extracts the red color plane from the 3D pixel image array, processes it, and inserts it back into the 3D pixel image array in the form of a wave-number spectrum.
The code in the methods named extractPlane and insertPlane is straightforward and shouldn't require an explanation. You can view the source code for those methods in Listing 23. The processPlane method The processPlane method, on the other hand, is the workhorse of this entire program and does deserve a thorough explanation. Therefore, I will set the processImg method aside for awhile and explain the processPlane method. Perform a forward 2D-DCT This method processes a color plane received as an incoming parameter. First it performs a forward 2D-DCT on the color plane producing the DCT spectrum for the plane. Convert to log base 10 and normalize Then it normalizes the spectral values in the plane to make them compatible with being displayed as an image. First it converts the spectral data to log base 10 in order to preserve the dynamic range of the display system. Then it causes the logarithmic spectral data to fall in the range 0 to 255 which is a requirement for being displayed as an image. A separable transform One of the significant attributes of the two-dimensional Discrete Cosine Transform (2D-DCT) is that it is separable. What this means in practice is that to compute the DCT for a single color plane of a 2D image, you can begin by performing a one-dimensional DCT on each row of the color plane, creating a new 2D structure where each row of the new structure contains the DCT of the corresponding row of the color plane. Then you can perform a one-dimensional DCT on each column of the new 2D structure creating a third 2D structure containing the 2D-DCT of the original image. An in-place transform Also important, at least from a memory utilization viewpoint, is the fact that you can perform the transforms "in-place" using the original color plane for intermediate and final data storage without a requirement to allocate memory for the new structures. Don't need a new DCT program What this means for me is that I don't need to develop a new DCT program to handle the 2D case. Rather, I can perform all the necessary DCT transforms that I need using the static one-dimensional forward DCT method named transform belonging to the class named ForwardDCT01. I developed and explained that class in the earlier lesson entitled Understanding the Discrete Cosine Transform in Java. Code for the processPlane method The processPlane method begins in Listing 4.
The code in Listing 4 determines the number of rows and the number of columns in the 2D plane to be processed. Listing 5 shows the beginning of a for loop that:
The code in the extractRow method is straightforward and shouldn't require a further explanation. You can view the extractRow method in its entirety in Listing 23. Perform the one-dimensional Discrete Cosine Transform on the row Listing 6 invokes the static transform method of the ForwardDCT01 class to perform the one-dimensional DCT on the row of image data.
Assuming that you have studied the earlier lesson entitled Understanding the Discrete Cosine Transform in Java, there is nothing in Listing 6 that should require a further explanation. The results of the transform are temporarily stored in the one-dimensional array referred to by theXform before being inserted back into the corresponding row of the color plane. Insert the spectral data into the color plane Listing 7 invokes the method named insertRow to insert the spectral data back into the corresponding row in the color plane. From this point forward, that row contains spectral data and not image color data.
You can view the method named insertRow in its entirety in Listing 23. Listing 7 also signals the end of the for loop, operating on rows that began in Listing 5. Transform the column data Listing 8 shows the complete for loop that:
The color plane now contains spectral data When the for loop in Listing 8 terminates, the 2D array referred to as colorPlane no longer contains image color data. Instead, it now contains the results of the 2D-DCT of the original image data for that color plane. In other words the data in the 2D array has now been transformed from image or space-domain data to frequency or wave-number spectral data. Problems with displaying the results The objective of this program is to display the 2D spectral data in the form of a standard image. This presents some problems that are not new to this program. In particular, the results of the 2D-DCT contain both positive and negative values. Furthermore, the magnitude of some of those values (particularly at or near a wave-number of zero) can be quite large. Standard image data, on the other hand must be unsigned data in the range from 0 through 255 inclusive. This almost always presents problems involving the methodology for converting the bipolar data to that value range. The methodology that I developed for this program isn't perfect, but it seems to work pretty well. Normalize the spectral data Listing 9 invokes the normalize method to prepare the 2D spectral data for being displayed as an image.
Listing 9 also signals the end of the processPlane method. The normalize method Before returning to the discussion of the processImg method (last seen in Listing 3) I will explain the normalize method. This method is fairly complex and implements the results of several decisions that I had to make regarding the best way to display the spectral data. This method normalizes the data in a 2D array containing data of type double to make it compatible with being displayed as an image plane. Eliminate negative values Normally, when viewing standard Fourier spectral data, unless we are specifically interested in phase angles, we usually compute and view the amplitude spectrum. The amplitude spectrum is computed by computing the square root of the sum of the squares of the real and imaginary parts of each complex spectral value (the length of the hypotenuse of a right triangle formed by the real and imaginary vectors). For the case of the DCT, there are no imaginary values. Therefore, the same results can be achieved simply by changing the sign on all negative spectral values making them positive instead of negative. The normalize method begins by converting all negative spectral values to positive values. Coping with a limited dynamic range for the display The dynamic range of the 2D-DCT spectral data is very large. The spectral values at and near the zero wave-number origin are much larger than values elsewhere in the spectrum. Seven shades of gray I learned at one point in my digital signal processing (DSP) career that a typical human can only distinguish between seven levels of gray with white and black representing two of those levels. Thus, a gray scale 2D display of the type shown in the bottom panel of Figure 9 has a very limited dynamic range. A log base 10 transform One common way to take better advantage of the available dynamic range of a display system when viewing spectral data is to convert the spectral data to decibels. This involves transforming the spectral values through a log base 10 transform and applying specific a specific scale factor to the resulting values. Although this program doesn't apply the specific scale factor required to qualify as a decibel scale, this program does apply a log base 10 transform to all of the spectral values. This results in a set of spectral values requiring less dynamic range in the display. Can produce negative values However, this transformation can result in negative values for very low spectral values; so once again, I am forced to deal with negative values. In this program the negative logarithmic values are simply set to zero. Making the data fit between 0 and 255 This still leaves us with the problem of how to squeeze the logarithmic data into the value range from 0 to 255 inclusive to satisfy the fundamental image pixel value requirement. One approach would be to scale all of the values such that the largest value becomes 255 and the smallest value becomes 0. I tried this, but even with the log transform, the values at and near the origin still overwhelmed the remaining portions of the spectrum, so this wasn't very satisfactory. Simply discard the lower-level values What I ended up doing was to simply discard all values below X-percent of the maximum by artificially setting all of those values to X-percent of the maximum. This created a floor at X-percent of the maximum. Then I shifted the floor down to a value of zero (black), and scaled the resulting data so that the maximum value ended up at 255 (bright red, bright green, or bright blue). What should the level of the floor be? Harking back to my "seven levels of gray" rule, I decided to put the floor at one-seventh of the maximum value. That is the value for the floor that produced the spectral display in the bottom panel of Figure 9. Raising the floor so as to make it closer to the maximum would cause the display in Figure 9 to become more sparse with more black area (a smaller percentage of the actual spectral data is actually being displayed). For example, if the floor is adjusted upward to fifty-percent of the maximum value, the spectrum for the image in Figure 9 shows only the bright spot at the origin plus a few light gray specs near the origin. If the floor is adjusted downward to one-percent of the maximum value, the spectrum for the image in Figure 9 shows quite a bit more light gray and quite a bit less black. Change the sign of the negative spectral values The code for the normalize method begins in Listing 10.
Listing 10 gets the size of the incoming 2D array, and then changes the sign of all negative values stored in the array. Convert to log base 10 Listing 11 converts all of the values to log base 10 to preserve the dynamic range of the plotting system. Negative log values are then set to zero.
Establish the floor Listing 12 sets all values below X-percent of the maximum value to X-percent of the maximum value where X is determined by the value of scale. Listing 12 also slides all values down to cause the floor to be 0.0.
Scale to accommodate the 0 to 255 rule Listing 13 scales the data so that the maximum value becomes 255.
Listing 13 also signals the end of the normalize method. Process green and blue color planes That brings us back to the processImg method, which we last saw in Listing 3. When the code in Listing 3 finishes execution, the red color plane has been transformed to spectral format and is ready to be displayed. Listing 14 applies exactly the same process to the green and blue color planes.
When the code in Listing 14 finishes execution, all three color planes have been transformed to spectral format, and are ready to be displayed. Return to the plotting program Listing 15 converts the results to a 3D array containing data of type int and returns that array to the calling method in the class named ImgMod02a, where it will be plotted in the format shown in Figure 9.
Listing 15 also signals the end of the processImg method and the end of the class named ImgMod34a. Now let's turn our attention from the program named ImgMod34a to the program named ImgMod34. The program named ImgMod34 This program performs a forward 2D-DCT on an image followed by an inverse 2D-DCT on the spectral data produced by the forward DCT. The result is to use the spectral data to reconstruct a replica of the original image as shown in Figure 1. Re-quantize to eleven bits To partially simulate the behavior of the JPEG image compression algorithm, all of the spectral results produced by the forward transform are re-quantized so that the data could be stored in an eleven-bit twos complement format (-1024 to +1023). Although the re-quantized data is never actually stored in an eleven-bit integer format, this process should create the same re-quantization noise that would be experienced if the data were actually stored in eleven bits. Otherwise, nothing is done to the spectral data Other than the re-quantization to eleven bits mentioned above, nothing is done to the spectral data following the forward DCT and before the inverse DCT. However, additional processing, such as selective high-frequency re-quantization and entropy compression, followed by decompression could be inserted at that point in the program for demonstration purposes. Also runs under control of ImgMod02a As with the earlier program, this program is designed to run under control of the class named ImgMod02a. Enter the following at the command line to run this program: java ImgMod02a ImgMod34 ImageFileName where ImageFileName is the name of a .gif or .jpg file, including the extension. Other class files required This program requires access to the following class files plus some inner classes that are defined inside the following classes:
The source code for the first class in the list is presented in Listing 24. The source code for the last two classes in the list was developed and explained in the previous lesson entitled Understanding the Discrete Cosine Transform in Java. The source code for the other two classes can be found in the earlier lessons entitled Processing Image Pixels using Java, Getting Started and Processing Image Pixels Using Java: Controlling Contrast and Brightness. Program testing This program was tested using J2SE 5.0 and WinXP. J2SE 5.0 or later is required due to the use of static imports. Code is very similar to earlier program Much of the code in this program is the same as, or very similar to the code in the program named ImgMod34a, which I explained earlier in this lesson. I won't repeat that explanation, but rather will simply refer you to Listing 24 where you can find a complete listing of ImgMod34. I will concentrate on the code that is different between the two programs. The beginning of the ImgMod34 class Listing 16 shows the beginning of the class definition for ImgMod34.
Note that because of the similarity of the code in this portion of the program to the code that I explained earlier, I deleted most of the code from Listing 16, leaving just enough code to keep us in synch with the execution of the program. At this point, I will set the discussion of the method named processImg aside and explain the method named processPlane, which is invoked near the bottom of Listing 16. The processPlane method The code for the processPlane method begins in Listing 17.
This method processes a color plane received as an incoming parameter.
Once again, for the reasons given earlier, I deleted most of the code from Listing 17 for brevity. See Listing 24 for the missing code. Re-quantization to eleven bits Although I'm not absolutely certain at this point as to the exact format that JPEG uses to represent numeric spectral values, I'm reasonably confident that it is not a Java double format, but instead is an integer format. One source on the web, entitled The JPEG Tutorial, implies that the spectral values can range from -1024 to +1023. This strongly suggests an eleven-bit twos complement integer format. To approximate this apparent characteristic of JPEG, Listing 18 invokes the method named requanToElevenBits to re-quantize the data such that it would fit into eleven bits (-1024 to +1023) as a twos complement integer type.
This is probably not exactly how it is done in JPEG, but hopefully it is a good approximation.
Not stored in an integer format In this program, even though the spectral data is re-quantized so that it will fit into an eleven-bit integer format, the data is never actually stored in an eleven-bit integer format. Rather, immediately after being re-quantized, each value is converted back to type double for storage in the array of type double. The method named requanToElevenBits You can view the method named requanToElevenBits in Listing 24. I am not going to explain that method in this lesson, but will explain re-quantization in a general sense in a future lesson.
Image is currently in eleven-bit spectral format At this point, the image has been transformed from the image or space domain into the frequency or wave-number domain. In addition, the spectral data has been re-quantized so that it could be converted to an eleven-bit integer format and stored in that format if there were a need to do so.
Restore the spectral magnitude Now I will convert the spectral data back into image data. First I will restore the magnitude of the spectral data that has been re-quantized to the range -1024 to +1023. This is necessary so that the relative magnitudes among the spectra for the three color planes will be correct.
Listing 19 invokes the method named restoreSpectralMagnitude in order to restore the magnitude of the spectral data.
The method named restoreSpectralMagnitude is straightforward and shouldn't require an explanation. The method can be viewed in its entirety in Listing 24. Perform the inverse 2D Discrete Cosine Transform Listing 20 uses the static method named transform belonging to the class named InverseDCT01 to perform the reverse of the operation explained earlier that began in Listing 5.
The transform method The static method named transform belonging to the class named InverseDCT01 was explained in the previous lesson in this series. Assuming that you have studied the earlier lesson entitled Understanding the Discrete Cosine Transform in Java, there is nothing in Listing 20 that should require a further explanation. Clip at 0 and 255 At this point, the spectral data has been converted back into image color data and we are faced with the familiar problem of guaranteeing that the values are compatible with representation as unsigned eight-bit values (0 to 255). In this case, I elected not to do anything fancy. Listing 21 invokes two methods that clip the values at 0 and 255 respectively, simply discarding any values that fall outside those bounds.
Both of the methods invoked in Listing 21 are straightforward. You can view them in their entirety in Listing 24. Return control to the processImg method Listing 21 also signals the end of the processPlane method, returning control to the code in the processImg method shown in Listing 22. Listing 22 shows the remaining code in the processImg method.
Listing 22 processes the green and blue color planes in a manner identical to the previous processing of the red color plane. Then Listing 22 converts the resulting image data to the required int format and returns the array reference to the calling method in the class named ImgMod029a. That's a wrap! So there you have it,
The results for one image are shown in Figure 1. As mentioned earlier, there appears to be some noise in the large empty areas of the image in Figure 1, (such as the empty white areas), which I believe is the result of the introduction of quantization noise into the spectral data when re-quantizing the spectral data to force it to fit in eleven bits. Run the ProgramI encourage you to copy and compile the code from Listing 23 and Listing 24. Experiment with the code, making changes, and observing the results of your changes. For example, try eliminating the code in ImgMod034 that re-quantizes the spectral data to see if it makes any difference in the quality of the resulting image. Try running ImgMod34a on a variety of different images to see if you can reach any conclusions regarding the appearance of the wave-number spectrum and the appearance of the image, as shown in Figure 3, Figure 5, Figure 7, and Figure 9 for example. SummaryIn this lesson, I taught you how to use the forward 2D-DCT to compute and to display the wave-number spectrum of an image. I also taught you how to apply the inverse 2D-DCT to the spectral data to reconstruct and display a replica of the original image. What's Next?The next publication in this series will be the second part of this two-part lesson on two-dimensional Discrete Cosine Transforms (2D-DCT). Future lessons in this series will explain the inner workings behind several data and image compression schemes, including the following:
ReferencesGeneral 2440 Understanding the Lempel-Ziv Data Compression Algorithm in Java Discrete Cosine Transform equations
Discrete cosine transform - Wikipedia, the free encyclopedia Complete Program Listings
Copyright 2006, Richard G. Baldwin. Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited. About the authorRichard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.Richard has participated in numerous consulting projects and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas. He is the author of Baldwin's Programming Tutorials, which have gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine. In addition to his programming expertise, Richard has many years of practical experience in Digital Signal Processing (DSP). His first job after he earned his Bachelor's degree was doing DSP in the Seismic Research Department of Texas Instruments. (TI is still a world leader in DSP.) In the following years, he applied his programming and DSP expertise to other interesting areas including sonar and underwater acoustics. Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems. |