
Understanding Base64 Data

  • July 27, 2004
  • By Richard G. Baldwin

Java Programming Notes # 2188


Preface

This is the third in a series of lessons designed to teach you how to write Java programs to protect your email inbox from spam and email-borne viruses.  The first lesson in the series was entitled Overview of the BigDog Email Protection Program.  The previous lesson was entitled Getting Started with the BigDog Email Protection Program.

In addition, the material in this lesson has broad applicability in other areas, such as those covered in the lesson entitled Security, Introduction to Message Digests and the lesson entitled Servlets, Session Tracking Using Basic Authentication.

I have published several earlier lessons that deal exclusively with spam and email-borne viruses, such as the series that began with the lesson entitled Enlisting Java in the War Against SPAM: The Communications Module and the series that began with the lesson entitled Enlisting Java in the War Against Email Viruses.  Information in those lessons serves as background material for this series.

Viewing tip

You may find it useful to open another copy of this lesson in a separate browser window.  That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.

Supplementary material

I recommend that you also study the other lessons in my extensive collection of online Java tutorials.  You will find those lessons published at Gamelan.com.  However, as of the date of this writing, Gamelan doesn't maintain a consolidated index of my Java tutorial lessons, and sometimes they are difficult to locate there.  You will find a consolidated index at www.DickBaldwin.com.

Preview

This lesson explains the use of base64 encoding and decoding in general, and illustrates base64 encoding and decoding using sample programs.

A future lesson will explain how base64 decoding is used in the BigDog program.

Understanding Base64

What is base64 encoding?

As I understand it, the base64 encoding scheme was originally devised to make it possible to reliably transmit eight-bit data through transmission systems constrained to handle seven-bit data.  The encoding scheme has been in use for many years.

Among other things, the use of base64 encoding makes it possible to:
  • Transmit image data reliably across the Internet.
  • Transmit non-English characters reliably across the Internet.

Can be used to hide spam

Unfortunately, the use of base64 encoding also makes it possible for spammers to hide offensive text from spam blocking programs that are not equipped to deal effectively with the hiding technique.  The spam screening module used by the BigDog set of programs deals with the following:
  • Encoded subject lines.
  • Encoded body text in single-part messages.
  • Encoded body text in multipart messages.

This lesson explains base64 in general.  Future lessons will explain the Java code that I have written to deal with these issues in the BigDog program.

RFC 1521

One of the best resources that I have found for understanding base64 encoding is the document entitled Mechanisms for Specifying and Describing the Format of Internet Message Bodies, otherwise known as Request for Comments (RFC) 1521.

(This is a rather large document that covers numerous topics in addition to base64 encoding.)

The author of RFC 1521 states:

"STD 11, RFC 822 defines a message representation protocol which specifies considerable detail about message headers, but which leaves the message content, or message body, as flat ASCII text. This document redefines the format of message bodies to allow multi-part textual and non-textual message bodies to be represented and exchanged without loss of information."

Support for richer text, audio, video, and non-English languages

In justifying RFC 1521, the author also states:

"Even in the case of text, however, RFC 822 is inadequate for the needs of mail users whose languages require the use of character sets richer than US ASCII [US-ASCII]. Since RFC 822 does not specify mechanisms for mail containing audio, video, Asian language text, or even text in most European languages, additional specifications are needed."

After discussing several problems that existed prior to RFC 1521, the author states:

"This document describes several mechanisms that combine to solve most of these problems without introducing any serious incompatibilities with the existing world of RFC 822 mail."

The author of RFC 1521 goes on to describe several features proposed by RFC 1521, including the use of base64 encoding in email messages.  The author tells us that base64 as described in RFC 1521 is "...adapted from RFC 1421, ..."  RFC 1421 describes Message Encryption and Authentication Procedures.

(I have written several previous lessons involving encryption and authentication that briefly describe the use of base64 encoding.)

The base64 encoding process

Eight-bit data values are mapped into a 65-character subset of the US-ASCII code, enabling subgroups of 6 bits each to be represented by 64 different printable characters.

(The extra 65th character, '=', is used to signify a special processing function, which I will describe later.)

The encoding process causes 24-bit groups, each representing three eight-bit data values, to be represented as output groups of four encoded characters that are derived from a base64 alphabet.

Concatenate, subdivide, and translate

Proceeding from left to right, a 24-bit input group is formed by concatenating three 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base64 alphabet.

(Table 1, which I will present later, shows the base64 alphabet.)

The order of the bits is important

The input bit stream must be ordered with the most significant bit first.  The first bit in the stream must be the high order bit in the first byte, and the eighth bit must be the low order bit in the first byte, etc.

The base64 alphabet

The base64 alphabet is made up of 64 printable characters plus the equal '=' character.  The '=' character is used as a pad when the number of input bytes is not evenly divisible by three, and therefore doesn't produce a number of output characters that is evenly divisible by four.

(For example, the four input bytes represented by the eight-bit characters klmn produce the following six output characters plus two pad characters: a2xtbg==.  In addition, the Sun encoder that I will discuss later terminates the output stream of characters with a carriage return character and a line feed character.)

The base64 alphabet

The base64 alphabet is shown in Table 1.  Whenever the value of a six-bit group matches one of the values in the Value columns in Table 1, that value is replaced by the seven-bit ASCII value of the corresponding character shown in the Char column to the right of the Value column.

Value  Char   Value  Char   Value  Char   Value  Char
0      A      17     R      34     i      51     z
1      B      18     S      35     j      52     0
2      C      19     T      36     k      53     1
3      D      20     U      37     l      54     2
4      E      21     V      38     m      55     3
5      F      22     W      39     n      56     4
6      G      23     X      40     o      57     5
7      H      24     Y      41     p      58     6
8      I      25     Z      42     q      59     7
9      J      26     a      43     r      60     8
10     K      27     b      44     s      61     9
11     L      28     c      45     t      62     +
12     M      29     d      46     u      63     /
13     N      30     e      47     v      pad    =
14     O      31     f      48     w
15     P      32     g      49     x
16     Q      33     h      50     y

Table 1: The base64 alphabet

Encoding and decoding example from Table 1

For example, when encoding, the six-bit value of zero is replaced in the base64 output by a value of 65, which is the seven-bit ASCII value that represents the character A.

When decoding, the base64 character A is replaced by a six-bit group of bits with a value of zero.
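
The lookup in both directions can be sketched with a 64-character string whose index positions match the Value columns of Table 1.  This is a minimal illustration only; the names Base64Alphabet, toChar, and toValue are hypothetical and do not appear in the programs discussed in this lesson.

```java
// Minimal sketch of the Table 1 lookup in both directions.
// Base64Alphabet, toChar, and toValue are hypothetical names.
class Base64Alphabet {

  // The 64 characters of Table 1, ordered by value 0 through 63.
  static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      + "abcdefghijklmnopqrstuvwxyz"
      + "0123456789+/";

  // Encoding direction: six-bit value to base64 character.
  static char toChar(int value) {
    return ALPHABET.charAt(value);
  }

  // Decoding direction: base64 character to six-bit value.
  // Returns -1 for a character that is not in the alphabet.
  static int toValue(char c) {
    return ALPHABET.indexOf(c);
  }

  public static void main(String[] args) {
    System.out.println((int) toChar(0)); // ASCII value of the character A
    System.out.println(toValue('A'));    // six-bit value of the character A
  }
}
```

Note that the pad character '=' is deliberately left out of the string because it does not represent a six-bit value.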

Line length limitations

According to RFC 1521, the output stream of encoded characters must be represented in lines of no more than 76 characters each.

As you will see later, the Sun encoding software accepts input data as an array of eight-bit bytes.  The output stream is always terminated by a carriage return and a line feed.  If the number of bytes in the input array produces more than 76 characters in the output stream, each group of 76 output characters is terminated by a carriage return and a line feed, and the final partial line, if any, is also terminated by a carriage return and a line feed.

The final line also includes pad characters, if necessary, to guarantee that the total number of base64 characters in the output is evenly divisible by four.
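
To see the 76-character limit in action without the Sun classes, here is a minimal sketch that assumes Java 8 or later, where the documented class java.util.Base64 is available.  That class is not the Sun encoder discussed in this lesson, and it behaves slightly differently: its MIME encoder separates lines with a carriage return and a line feed but does not append them after the final line.  The class name MimeLineDemo and the method name encodeMime are hypothetical.

```java
import java.util.Base64;

// Minimal sketch of 76-character line wrapping (assumes Java 8 or later).
// MimeLineDemo and encodeMime are hypothetical names.
class MimeLineDemo {

  static String encodeMime(byte[] data) {
    // The MIME encoder wraps its output at 76 characters per line and
    // separates the lines with CR LF.
    return Base64.getMimeEncoder().encodeToString(data);
  }

  public static void main(String[] args) {
    // 58 bytes is one byte more than fits on a single 76-character line.
    byte[] data = new byte[58];
    String[] lines = encodeMime(data).split("\r\n");
    for (String line : lines) {
      System.out.println(line.length() + ": " + line);
    }
  }
}
```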

The pad character

Here is part of what the author of RFC 1521 has to say about the use of the pad character:

"Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the '=' character."

Three possible cases regarding padding

The author of RFC 1521 goes on to tell us:
"Since all base64 input is an integral number of octets, only the following cases can arise:
  1. the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no '=' padding,
  2. the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two '=' padding characters, or
  3. the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one '=' padding character."
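
The three cases can be reproduced with a short sketch.  It assumes Java 8 or later and uses the documented java.util.Base64 class rather than any code from this lesson; the names PaddingCases and encode are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of the three padding cases (assumes Java 8 or later).
// PaddingCases and encode are hypothetical names.
class PaddingCases {

  static String encode(String text) {
    byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
    return Base64.getEncoder().encodeToString(raw);
  }

  public static void main(String[] args) {
    System.out.println(encode("klm")); // final quantum of 24 bits: no padding
    System.out.println(encode("k"));   // final quantum of 8 bits: two pad characters
    System.out.println(encode("kl"));  // final quantum of 16 bits: one pad character
  }
}
```

For the inputs klm, k, and kl, this sketch prints a2xt, aw==, and a2w= respectively.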

Program Code

Two different programs

I am going to present and discuss two different programs in this lesson.  I will begin with a program named Base64_02.java.  The sole purpose of this program is to illustrate the base64 encoding and decoding algorithms in a very simple setting.

Next, I will present a program that explains the use of encoding and decoding classes and methods in an undocumented Sun package named sun.misc.  Along with that discussion, I will also point you to alternative documented resources for encoding and decoding base64.

In a future lesson, I will explain several methods that are incorporated into the BigDog set of programs that is designed to protect your email inbox from viruses and spam.

The program named Base64_02

This program is not intended for production use.  Rather, it is intended solely to illustrate the encoding and decoding algorithms for base64.  I will point you to programs that are intended for production use later.

Not fully tested

Note that this program has not been fully tested.  Don't use it for any significant purpose without first testing the conversion to base64 for all possible values in a group of three eight-bit bytes.
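
One way to carry out such a test is to loop over all 16,777,216 possible 24-bit values, encode each one, decode the result, and compare.  The sketch below shows the shape of that loop.  Note that encode3 and decode4 are hypothetical stand-ins written compactly for this illustration; to test Base64_02 itself, the same loop would need to call that program's own encoder and decoder methods instead.

```java
// Sketch of an exhaustive round-trip test over every group of three bytes.
// RoundTripCheck, encode3, and decode4 are hypothetical stand-ins,
// not the methods of Base64_02.
class RoundTripCheck {

  static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      + "abcdefghijklmnopqrstuvwxyz"
      + "0123456789+/";

  // Reverse table: ASCII code of a base64 character to its six-bit value.
  static final int[] VALUE_OF = new int[128];
  static {
    for (int i = 0; i < ALPHABET.length(); i++) {
      VALUE_OF[ALPHABET.charAt(i)] = i;
    }
  }

  // Encode one 24-bit group as four base64 characters.
  static char[] encode3(int group24) {
    return new char[] {
      ALPHABET.charAt((group24 >> 18) & 0x3f),
      ALPHABET.charAt((group24 >> 12) & 0x3f),
      ALPHABET.charAt((group24 >> 6) & 0x3f),
      ALPHABET.charAt(group24 & 0x3f)
    };
  }

  // Decode four base64 characters back into one 24-bit group.
  static int decode4(char[] c) {
    return (VALUE_OF[c[0]] << 18) | (VALUE_OF[c[1]] << 12)
        | (VALUE_OF[c[2]] << 6) | VALUE_OF[c[3]];
  }

  // Count the 24-bit values that do not survive an encode/decode round trip.
  static int countFailures() {
    int failures = 0;
    for (int group24 = 0; group24 < (1 << 24); group24++) {
      if (decode4(encode3(group24)) != group24) {
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println("Failures: " + countFailures());
  }
}
```

A round trip only shows that the encoder and decoder agree with each other.  Comparing the encoded characters against a trusted encoder would be a stronger test.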

Documented encoding and decoding classes

For documented software that you can use to encode and decode base64, see the following encoder and decoder classes.  I haven't tested these programs, but I am assuming that they are correct.  They are published on the excellent web site of Professor Douglas Lyon, who provides the source code for dozens of different algorithms including those used to encode and decode base64.

Undocumented Sun classes

If you are willing to use undocumented Sun classes to encode and decode base64, you can use the encodeBuffer method of the sun.misc.BASE64Encoder class and the decodeBuffer method of the sun.misc.BASE64Decoder class.  I will show you how to use these methods in the next program in this lesson.  For now, however, let's get back to the discussion of the program named Base64_02.

This program was tested using SDK 1.4.2 under WinXP.

Will discuss in fragments

As usual, I will discuss the program in fragments.  A complete listing is provided in Listing 19 near the end of the lesson.
The first program fragment begins in Listing 1. 

class Base64_02 {

  public static void main(String[] args) {

    // Three eight-bit characters serve as the raw input data.
    byte[] rawData = "klm".getBytes();
    showData(rawData);

Listing 1

Listing 1 shows the beginning of the main method, which creates a byte array object containing three eight-bit characters, and passes the array to a method named showData for display.

(Each of the eight-bit characters in the array consists of the least significant eight bits of the sixteen-bit Unicode character contained in the String "klm".)

The showData method

The showData method displays the data in an incoming byte array as character data and also as binary data.  The showData method is shown in its entirety in Listing 2.

(Note that if there are more than four bytes in the incoming array, the binary data will not be correct.  Bits will have been lost on the most significant end.  Note also that leading zeros are not displayed in the binary data.)

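  static void showData(byte[] data){
    int save = 0;
    for(int cnt = 0; cnt < data.length; cnt++){
      // Display the byte as a character.
      System.out.print((char)data[cnt]);
      // Shift the earlier bytes left and append this byte.  The mask
      // prevents sign extension for byte values above 127.
      save = (save << 8) | (data[cnt] & 0xff);
    }//end for loop
    System.out.println();
    System.out.println(
                    Integer.toBinaryString(save));
  }//end showData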

Listing 2

Process using a for loop

The showData method processes the incoming array using a for loop based on the length of the array.  One of the bytes is displayed during each iteration of the for loop in Listing 2.  The byte is cast to type char to cause it to be displayed as a character.

Also, a binary shift operation is used to construct an int value containing shifted versions of each of the bytes in the incoming array during successive iterations of the for loop.

Shift eight bits during each iteration

During each iteration of the loop, the current contents of the int variable named save are shifted eight bits to the left, and the next data byte from the incoming array is placed in the least significant eight bits of the variable.

(As mentioned above, if there are more than four bytes in the array, byte data will be shifted off the most significant end of the variable, and the data will be corrupted.)

Display the binary value

After all of the bytes in the array have been processed, the method named toBinaryString, which is a static (class) method of the Integer class, is used to display the contents of the variable named save as a binary value.

(As mentioned above, this method does not display leading zeros on the most significant end of the binary value.)

The output

Figure 1 shows the output produced by this method when called from the code in Listing 1.

klm
1101011 01101100 01101101

Figure 1

As you can see, the three letters in the first line of output correspond to the characters represented by each of the bytes in the incoming array.

The binary bits represented by the 1's and 0's in the second line correspond to the binary bits in each of the bytes in the incoming array after the bytes have been concatenated.

(Note that I manually inserted spaces in the second line in Figure 1 to separate the bits into eight-bit groups.  This makes it easier to analyze the bits visually.)

What do the bits represent?

The eight bits on the right correspond to the least significant eight bits in the character 'm'.

The seven bits on the left correspond to the bits in the character 'k', with the left-most zero bit not being displayed.

The remaining bits in the middle correspond to the bits in the character 'l'.

(Note that this is a lower-case L, not a numeric 1.)

We will be working with the binary output in Figure 1 later.
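
If you would rather have the program display the missing leading zeros than add them by hand, the binary string can be padded on the left.  The following is a minimal sketch; BinaryDisplay and toBinary are hypothetical names that are not part of Base64_02.

```java
// Minimal sketch: display a value as binary text with leading zeros.
// BinaryDisplay and toBinary are hypothetical names.
class BinaryDisplay {

  // Left-pads the binary text with zeros up to the requested bit width.
  // If the text is already at least that long, it is returned unchanged.
  static String toBinary(int value, int bits) {
    String raw = Integer.toBinaryString(value);
    StringBuilder padded = new StringBuilder();
    for (int i = raw.length(); i < bits; i++) {
      padded.append('0');
    }
    return padded.append(raw).toString();
  }

  public static void main(String[] args) {
    // 0x6b6c6d is the concatenation of the bytes for 'k', 'l', and 'm'.
    System.out.println(toBinary(0x6b6c6d, 24));
  }
}
```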

Encode and display the data

Now let's return to our discussion of the main method.  The first statement in Listing 3 passes the array object containing the raw data to the method named encoder.  The purpose of the encoder method is to encode the three eight-bit bytes as four six-bit characters.  This method returns a four-element array containing the four six-bit characters in the least significant six bits of four eight-bit bytes.

The second statement in Listing 3 passes the array containing the four base64 characters to the showData method for display in both character and binary format.

    byte[] encodedData = encoder(rawData);
    showData(encodedData);

Listing 3

Do it by hand

Before getting into the details of the encoder method, let's walk through our example and perform the encoding from eight bit to base64 manually.

The first two lines in Figure 2 show the data from Figure 1.  This time, however, I manually inserted space characters in the second line of Figure 2 to separate the bits into six-bit groups (instead of eight-bit groups as before), and manually added the missing zero bit on the left.

klm
011010 110110 110001 101101
26 54 49 45
a 2 x t

Figure 2

Mapping into the base64 alphabet

The third line in Figure 2 shows the decimal equivalent value of each of the six-bit groups in the second line.

The fourth line in Figure 2 shows the base64 alphabet character corresponding to each of the decimal equivalent values, taken from Table 1.

Thus, the four-character base64 encoding of the string "klm" is "a2xt".  This is what we should expect the encoder method to return when we pass it an array object containing the eight-bit characters 'k', 'l', and 'm'.
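
The manual procedure in Figure 2 can also be sketched in code.  The following illustration is not part of Base64_02; the names Figure2Rows and rows are hypothetical.  It builds the last three lines of Figure 2 for any group of three bytes.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch that rebuilds the last three lines of Figure 2.
// Figure2Rows and rows are hypothetical names.
class Figure2Rows {

  static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      + "abcdefghijklmnopqrstuvwxyz"
      + "0123456789+/";

  // Returns three strings: the six-bit groups in binary, their decimal
  // values, and the corresponding base64 characters.
  static String[] rows(byte[] three) {
    int group24 = ((three[0] & 0xff) << 16)
        | ((three[1] & 0xff) << 8)
        | (three[2] & 0xff);
    StringBuilder bits = new StringBuilder();
    StringBuilder values = new StringBuilder();
    StringBuilder chars = new StringBuilder();
    for (int shift = 18; shift >= 0; shift -= 6) {
      int six = (group24 >> shift) & 0x3f;
      // Setting a seventh bit keeps the leading zeros in the binary text;
      // substring(1) then drops that extra bit again.
      String sixBits = Integer.toBinaryString(six | 0x40).substring(1);
      String sep = (shift == 0) ? "" : " ";
      bits.append(sixBits).append(sep);
      values.append(six).append(sep);
      chars.append(ALPHABET.charAt(six)).append(sep);
    }
    return new String[] {bits.toString(), values.toString(), chars.toString()};
  }

  public static void main(String[] args) {
    for (String row : rows("klm".getBytes(StandardCharsets.US_ASCII))) {
      System.out.println(row);
    }
  }
}
```

For the bytes of "klm", the three returned strings match the last three lines of Figure 2.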

The method named encoder

The beginning of the encoder method is shown in Listing 4.  This method is designed to encode a group of three eight-bit bytes into four six-bit characters from the base64 alphabet.  Because this method is called directly from the static main method without an object reference, it must be declared static.

The code in Listing 4 simply confirms that the size of the incoming array is correct, and aborts the program if it is not correct.

  static byte[] encoder(byte[] data){
    // This method handles exactly one three-byte group.
    if(data.length != 3){
      System.out.println("Incorrect length");
      System.exit(1);  // non-zero status signals an error
    }//end if

Listing 4

Concatenate the bytes

There are probably many ways to accomplish the encoding.  I elected to begin by concatenating the three bytes contained in the incoming array object into the least significant 24 bits of a variable of type int.

    // Mask with 0xff to prevent sign extension of bytes above 127.
    int concat = ((data[0] & 0xff)<<16)
            | ((data[1] & 0xff)<<8) | (data[2] & 0xff);

Listing 5

I concatenated the bits using the binary shift capability of Java in conjunction with the bitwise or operator.  This is essentially the same thing that was done in the showData method, except that in this case, the number of bytes is always three and therefore, there is no need to use a loop.

Concatenating the eight-bit bytes into a sequence of twenty-four bits makes it relatively easy to separate the twenty-four bits into four groups of six bits each.
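
One detail is worth keeping in mind when concatenating bytes this way.  The Java byte type is signed, so a byte with a value above 127 is sign-extended when it is promoted to int, and an unmasked shift can then overwrite the bits to its left.  This does not arise for the characters 'k', 'l', and 'm', but it would for arbitrary binary data.  The following minimal sketch illustrates the difference; SignExtensionDemo, concatUnmasked, and concatMasked are hypothetical names.

```java
// Minimal sketch of the sign-extension pitfall when concatenating bytes.
// SignExtensionDemo, concatUnmasked, and concatMasked are hypothetical names.
class SignExtensionDemo {

  // Shifts the byte values directly, without masking.
  static int concatUnmasked(byte b0, byte b1, byte b2) {
    return (b0 << 16) | (b1 << 8) | b2;
  }

  // Masks each byte with 0xff first, so only its eight bits are used.
  static int concatMasked(byte b0, byte b1, byte b2) {
    return ((b0 & 0xff) << 16) | ((b1 & 0xff) << 8) | (b2 & 0xff);
  }

  public static void main(String[] args) {
    // The middle byte, 0xe9, is above 127 and is therefore negative as a byte.
    byte b0 = (byte) 0x6b;
    byte b1 = (byte) 0xe9;
    byte b2 = (byte) 0x6d;
    System.out.println(Integer.toHexString(concatUnmasked(b0, b1, b2)));
    System.out.println(Integer.toHexString(concatMasked(b0, b1, b2)));
  }
}
```

With the middle byte set to 0xe9, the unmasked version produces ffffe96d, in which the first byte has been overwritten, while the masked version produces 6be96d.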

Instantiate an output array

Listing 6 instantiates a four-element byte array that will be populated and returned containing the four base64 characters in the least significant seven bits of each eight-bit array element.

    byte[] output = new byte[4];

Listing 6

Separate and map the bits to base64 characters

The method that is actually used to map the values of each group of six bits is named mapTo.  I will discuss the behavior of that method shortly.

Each of the four statements in Listing 7 extracts one group of six bits from the sequence of twenty-four bits and passes that group to the method named mapTo.  The return values from mapTo are used to populate the output array.

    // Extract each six-bit group, rightmost group first, and map it
    // to its base64 character.
    output[3] = (byte)(mapTo(concat & 0x3f));
    output[2] = (byte)(mapTo((concat >> 6) & 0x3f));
    output[1] = (byte)(mapTo((concat >> 12) & 0x3f));
    output[0] = (byte)(mapTo((concat >> 18) & 0x3f));
    return output;
  }//end encoder

Listing 7

Note that the output array is populated in reverse order.  In other words, the right-most six bits in the twenty-four bit sequence are used to populate the last element in the array, while the left-most six bits are used to populate the first element in the array.

Shift right and mask

In case you are unfamiliar with the code in Listing 7, each statement (except the first) shifts a group of six bits into the rightmost six bits.  The first statement doesn't need to perform a shift because the six bits of interest are already in the rightmost six-bit position.  Then each statement performs a bitwise and operation with the following bit mask to convert all bits except the rightmost six to zeros:

00000000000000000000000000111111


The method named mapTo

The method named mapTo is shown in Listing 8.  This method maps the value of the least significant six bits of an incoming int value to the corresponding seven-bit character from the base64 alphabet shown in Table 1.

  static int mapTo(int val){
    int returnVal = 0;
    if(val == 63){
      returnVal = '/';
    }else if(val == 62){
      returnVal = '+';
    }else if((val >= 52) && (val <= 61)){
      returnVal = '0' + val - 52;  // digits 0 through 9
    }else if((val >= 0) && (val <= 25)){
      returnVal = 'A' + val;       // upper case A through Z
    }else if((val >= 26) && (val <= 51)){
      returnVal = 'a' + val - 26;  // lower case a through z
    }else{
      System.out.println(
                  "Not a possible six-bit value");
      System.exit(1);  // non-zero status signals an error
    }//end else
    return returnVal;
  }//end mapTo

Listing 8

Alternative approaches

One obvious way to accomplish this would have been to create a Vector object containing the values corresponding to the 64 characters in Table 1.  Then the value of the six-bit group could be used as an index into the Vector object to retrieve the base64 character corresponding to that value.

However, that would have required me to populate the Vector object, which in the worst case would have required me to write 64 statements.  I could have reduced the amount of code by breaking the problem down into the ranges of values shown in Table 2 and using a for loop to populate the Vector object for each range, but this still would have required more code than I wanted to write.

Value Range
Character Range
0-25 A through Z
26-51 a through z
52-61 0 through 9
62 +
63 /

Table 2: Value range versus character range

Conversion on the fly

Therefore, I elected to use a somewhat different approach that computes the required character on the fly rather than using a table lookup.  My approach is shown in Listing 8, and should not require a detailed explanation.
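
For comparison, the table-lookup alternative need not be long if a plain char array is used in place of a Vector object and is populated from the ranges in Table 2.  The following is a minimal sketch under that assumption; LookupTable and build are hypothetical names that are not part of Base64_02.

```java
// Minimal sketch of the table-lookup alternative, populated by range.
// LookupTable and build are hypothetical names.
class LookupTable {

  // Builds a 64-element table in which the index is the six-bit value
  // and the element is the corresponding base64 character.
  static char[] build() {
    char[] table = new char[64];
    int index = 0;
    for (char c = 'A'; c <= 'Z'; c++) {
      table[index++] = c;  // values 0 through 25
    }
    for (char c = 'a'; c <= 'z'; c++) {
      table[index++] = c;  // values 26 through 51
    }
    for (char c = '0'; c <= '9'; c++) {
      table[index++] = c;  // values 52 through 61
    }
    table[index++] = '+';  // value 62
    table[index] = '/';    // value 63
    return table;
  }

  public static void main(String[] args) {
    char[] table = build();
    // Look up the four values from Figure 2.
    System.out.println("" + table[26] + table[54] + table[49] + table[45]);
  }
}
```

Either approach produces the same mapping.  The lookup table trades a small amount of setup code for a simpler conversion step.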

The output

The screen output produced by the two statements in the main method of Listing 3 is shown in Figure 3. However, I manually inserted space characters in the binary representation in Figure 3 to visually separate the bits into eight-bit groups.

a2xt
1100001 00110010 01111000 01110100

Figure 3

The first line of text in Figure 3 shows the base64 characters returned to represent the eight-bit input characters given by 'k', 'l', and 'm'.  As you can see, these four base64 characters match the characters that we identified via manual table lookup in Figure 2.

Figure 3 also shows the binary representation of this sequence of four characters.  Each character is represented by eight consecutive bits, with a leading zero missing on the left end.

Most significant bit is always zero

If you start on the right and count bits, you will find that the most significant bit in each group of eight bits has a value of zero.  Therefore, the most significant bit can be discarded in order to transmit these characters through a transmission system that is limited to seven bits.  No loss of information would result from discarding the most significant bit.
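
The same observation holds for every character that base64 can produce, not just the four shown in Figure 3.  A minimal sketch of that check follows; SevenBitCheck and allSevenBit are hypothetical names.

```java
// Minimal sketch: confirm that every base64 character fits in seven bits.
// SevenBitCheck and allSevenBit are hypothetical names.
class SevenBitCheck {

  // The 64 characters of Table 1 plus the pad character.
  static final String BASE64_CHARS =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      + "abcdefghijklmnopqrstuvwxyz"
      + "0123456789+/=";

  // Returns true if no character has a value above 127, which means
  // that the eighth bit of each character is zero.
  static boolean allSevenBit(String chars) {
    for (int i = 0; i < chars.length(); i++) {
      if (chars.charAt(i) > 0x7f) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(allSevenBit(BASE64_CHARS));
  }
}
```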

Decode and display

Listing 9 shows the end of the main method.

    byte[] decodedData = decoder(encodedData);
    showData(decodedData);

  }//end main

Listing 9

The code in Listing 9 passes the array of encoded data to the method named decoder, which returns an array containing decoded data.  The decoded data is stored in an array object of type byte referred to by the reference variable named decodedData.

Then the code in Listing 9 passes the array containing decoded data to the method named showData where it is displayed in both character and binary form.

The method named decoder

The method named decoder is used to decode a group of four base64 characters into three eight-bit bytes.  The method begins in Listing 10.

  static byte[] decoder(byte[] data){
    // This method handles exactly one four-character group.
    if(data.length != 4){
      System.out.println("Incorrect length");
      System.exit(1);  // non-zero status signals an error
    }//end if

Listing 10

The decoder method begins by confirming that the incoming array is of the correct length, and terminating the program if it is of the wrong length.

Steps in the process

This method accomplishes its purpose by performing the following steps:
  • Convert the base64 characters back to the original six-bit values according to the relationship between characters and values given in Table 1.
  • Concatenate the four six-bit values into a 24-bit int value in a variable named concat.
  • Separate the 24-bit int value into three eight-bit values that represent the decoded data values.

Convert and concatenate

The first two steps in this process are accomplished by the code in Listing 11.

    // Convert each character to its six-bit value and concatenate.
    int concat = ((mapFrom(data[0]))<<18)
                 | ((mapFrom(data[1]))<<12)
                 | ((mapFrom(data[2]))<<6)
                 | mapFrom(data[3]);

Listing 11

The code in Listing 11 invokes the mapFrom method to convert each base64 character to the corresponding value from Table 1.  Although the values are returned from the mapFrom method as type int, no returned value can be greater than 63.  Therefore, every bit above the six least significant bits in each returned value is guaranteed to be zero.

Why is this important?

This is very important because each value is shifted left by a multiple of six bits and then combined with the others using a bitwise inclusive or.  If a returned value had any bits set above its six least significant bits, those bits would overlap the least significant bits of the value placed immediately to its left in Listing 11 and corrupt them.  Because those upper bits are zero, they cannot interfere with the bits that they overlap.

I will have more to say about the method named mapFrom shortly.

Extract the eight-bit data

The code in Listing 12 accomplishes the third step in the above list of three steps.

    byte[] output = new byte[3];
    // Extract the three eight-bit values, rightmost byte first.
    output[2] = (byte)(concat & 0xff);
    output[1] = (byte)((concat >> 8) & 0xff);
    output[0] = (byte)((concat >> 16) & 0xff);
    return output;
  }//end decoder

Listing 12

This code extracts the three eight-bit values from the twenty-four bits stored in the int variable named concat, and uses those bits to populate the individual bytes in the output byte array.  This code should be self-explanatory.

The method named mapFrom

The method named mapFrom is used to convert from a base64 character to a six-bit value using the relationships between characters and values given in Table 1.  The method is shown in Listing 13.

  static int mapFrom(int val){
    int returnVal = 0;
    if(val == '/'){
      returnVal = 63;
    }else if(val == '+'){
      returnVal = 62;
    }else if((val >= '0') && (val <= '9')){
      returnVal = 52 + val - '0';  // values 52 through 61
    }else if((val >= 'A') && (val <= 'Z')){
      returnVal = 0 + val - 'A';   // values 0 through 25
    }else if((val >= 'a') && (val <= 'z')){
      returnVal = 26 + val - 'a';  // values 26 through 51
    }else{
      System.out.println(
                  "Not a base64 character");
      System.exit(1);  // non-zero status signals an error
    }//end else
    return returnVal;
  }//end mapFrom

Listing 13

Reverses the earlier process

The method named mapFrom shown in Listing 13 essentially reverses the process provided by the method named mapTo that was shown in Listing 8.

The code in Listing 13 should be self-explanatory.

The Output

The total output from this program is shown in Figure 4.

(Note that I manually added the missing bits with a zero value on the left end.  I also inserted spaces to separate the data into eight-bit groups.)

The first four lines of text in Figure 4 repeat what you have already seen in Figure 1 and Figure 3.

klm
01101011 01101100 01101101
a2xt
01100001 00110010 01111000 01110100
klm
01101011 01101100 01101101

Figure 4

The decoded output

This program performs the following steps:
  • Create and display three eight-bit characters in both character and binary format.
  • Encode the three eight-bit characters into four characters from the base64 alphabet.  Display those characters in both character and binary format.
  • Decode the four base64 characters back into three eight-bit characters.  Display those characters in both character and binary format.

The last two lines of text in Figure 4 show the result of decoding and displaying the base64 data according to the code in Listing 9.  As you can see, the final result matches the starting data represented by the first two lines of text in Figure 4.

The first and fifth lines of text in Figure 4 each represent the same three eight-bit bytes of data.  The third line shows the four seven-bit base64 characters that represent those three eight-bit bytes of data.

(Each seven-bit data value is actually stored in an eight-bit byte.  However, the most significant bit is always zero.  Therefore, it could be discarded without loss of information.)

Documented production software

The program named Base64_02 is provided solely to illustrate the conversion algorithms to and from base64.  It is not suitable for production use because it doesn't deal with several of the issues defined in the document entitled Mechanisms for Specifying and Describing the Format of Internet Message Bodies.

For example, the program is incapable of dealing with the situation where the number of eight-bit bytes is not evenly divisible by three.  In that case, the algorithm must append pad characters consisting of '=' characters to guarantee that the number of characters in the base64 data is evenly divisible by four.

Also, the code in the program named Base64_02 does not deal with the issue having to do with a maximum line length of 76 characters for the base64 data.

As of this writing, documented classes suitable for production use are available from which you can create encoder and decoder objects.
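
Since this lesson was written, a documented alternative has also become part of the standard library: Java 8 added the class java.util.Base64.  It is not one of the classes referred to above, so treat the following as a minimal sketch that assumes Java 8 or later; the names StandardBase64Demo, encode, and decode are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of a base64 round trip (assumes Java 8 or later).
// StandardBase64Demo, encode, and decode are hypothetical names.
class StandardBase64Demo {

  // Encodes raw bytes as a single line of base64 text, padded with '='.
  static String encode(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  // Decodes base64 text back into the original bytes.
  static byte[] decode(String base64Text) {
    return Base64.getDecoder().decode(base64Text);
  }

  public static void main(String[] args) {
    byte[] raw = "klmn".getBytes(StandardCharsets.US_ASCII);
    String encoded = encode(raw);
    System.out.println(encoded);
    System.out.println(new String(decode(encoded), StandardCharsets.US_ASCII));
  }
}
```

Unlike Base64_02, this encoder accepts input of any length and adds pad characters as needed.  The basic encoder used here does not insert line breaks.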

Undocumented production software

If you are willing to use undocumented software, J2SE SDK version 1.4.2 contains undocumented classes from Sun that you can use to create encoder and decoder objects.

Listing 20 near the end of the lesson presents a program named Base64_03 that illustrates the use of these classes.

The program named Base64_03

This program illustrates the use of the undocumented sun.misc package for encoding and decoding base64.

This program uses the encodeBuffer method of the sun.misc.BASE64Encoder class and the decodeBuffer method of the sun.misc.BASE64Decoder class.

Information based on introspection

Introspection shows that the BASE64Encoder class inherits the following two methods from the sun.misc.CharacterEncoder class, possibly overriding one or both:
  • encode
  • encodeBuffer

This program illustrates the use of the encodeBuffer method.  I have been unable to find any information on the encode method.

The sun.misc.CharacterEncoder class is a direct subclass of Object.

Introspection also shows that the BASE64Decoder class inherits the decodeBuffer method from the sun.misc.CharacterDecoder class, possibly overriding the method.

The sun.misc.CharacterDecoder class is a direct subclass of Object.

No documentation available

I have never been able to find any documentation on the use of either of these classes.  I can't remember how I learned how to use them.  However, I will show you what I know.

This program was tested using SDK 1.4.2 under WinXP.

The main method

The main method begins in Listing 14.

  public static void main(String[] args) {
    byte[] dataBuffer = "klmn".getBytes();
    System.out.println(dataBuffer.length);
    System.out.println(new String(dataBuffer));

Listing 14

The code in Listing 14:
  • Creates a byte array containing the eight-bit representations of four characters.
  • Displays the length of the array.
  • Displays the contents of the array.

This array will be used as the input to the base64 encoding process.  I purposely caused this array to contain a number of bytes that is not evenly divisible by three to illustrate the use of the pad character '=' in the base64 encoding process.

The output

As you might expect, the code in Listing 14 produces the screen output shown in Figure 6.

4
klmn

Figure 6

Encode the data as base64

Continuing with the main method, the code in Listing 15:
  • Invokes the method named encodeBase64 to encode the four eight-bit bytes into base64 characters, returning the encoded data as a String object.
  • Displays the length of the string of base64 characters.
  • Displays the characters in the string.

    String encoded = encodeBase64(dataBuffer);
    System.out.println(encoded.length());
    System.out.println(encoded);

Listing 15

The output

The code in Listing 15, together with the remaining code in the main method (which I will discuss shortly), produces the output shown in Figure 7.

10
a2xtbg==

4
klmn

Figure 7

Two important points

I included all of the output in Figure 7 to illustrate two important points.

Recall that the previous program converted the eight-bit representations of the characters in the string "klm" to the four seven-bit base64 characters represented by "a2xt" (see Listing 1 and Figure 3).

This program converts the eight-bit representations of the four characters in the string "klmn" to the eight seven-bit base64 characters represented by "a2xtbg==" as shown in Figure 7.

Note the pad characters

Note in particular the use of the pad character "=" at the end of the output string to guarantee that the number of base64 characters is evenly divisible by four.
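
The padding rule can be expressed as simple arithmetic.  The following is a minimal sketch rather than part of the lesson's programs; the class name PadMath and its method names are hypothetical.  It computes how many pad characters and how many base64 characters (not counting any line terminators) to expect for a given number of input bytes.

```java
// Sketch only: PadMath and its method names are hypothetical helpers,
// not part of Base64_03 or of any Sun class.
class PadMath {

  // Number of '=' pad characters needed for n input bytes.
  static int padCount(int n) {
    return (3 - n % 3) % 3;
  }

  // Number of base64 characters, including pads but excluding
  // any line terminators, produced for n input bytes.
  static int encodedLength(int n) {
    return 4 * ((n + 2) / 3);
  }

  public static void main(String[] args) {
    int[] sizes = {3, 4, 5, 6, 58};
    for (int i = 0; i < sizes.length; i++) {
      int n = sizes[i];
      System.out.println(n + " bytes -> " + encodedLength(n)
          + " characters, " + padCount(n) + " pad(s)");
    }
  }
}
```

For the four-byte input "klmn", this gives eight characters with two pads, which agrees with the string "a2xtbg==" in Figure 7.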

Note the length of the output string

Also note that the output in Figure 7 reports the number of characters in the string of base64 characters to be 10 instead of 8.  This is because, at least on the WinXP platform used for testing, the Sun encoder used to perform the conversion to base64 appends a carriage return and a line feed onto the end of the string of base64 characters.  You can see the evidence of this by the blank line between the second and third lines of text in the output shown in Figure 7.
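
If the trailing line terminator is unwanted, it can be removed before the string is used.  The following is a minimal sketch under the assumption that the encoded string looks like the one reported above for "klmn": eight base64 characters followed by a carriage return and a line feed.  The class and method names are hypothetical.

```java
// Sketch only: the class and method names are hypothetical. The
// input string below mimics what the lesson reports for "klmn":
// eight base64 characters followed by a carriage return and a
// line feed.
class TrimTerminator {

  // Remove any trailing carriage return or line feed characters.
  static String stripLineEnd(String s) {
    int end = s.length();
    while (end > 0
        && (s.charAt(end - 1) == '\r'
            || s.charAt(end - 1) == '\n')) {
      end--;
    }
    return s.substring(0, end);
  }

  public static void main(String[] args) {
    String encoded = "a2xtbg==\r\n";
    System.out.println(encoded.length());               // 10
    System.out.println(stripLineEnd(encoded).length()); // 8
  }
}
```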

What if the input exceeds 57 bytes?

If the number of eight-bit bytes passed to the encoder exceeds 57, the encoder will return multiple lines of base64 characters.  Each returned line is 76 characters in length plus a carriage return and a line feed appended onto the end of each line.

Thus the actual number of characters returned for each line other than the last line will be 78 characters.  The last line will contain the base64 characters that represent the leftover eight-bit characters plus a carriage return and a line feed.
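
The 76-character line length can also be observed with the documented java.util.Base64 class, assuming Java 8 or later is available.  The following is a minimal sketch, not the program used in this lesson; the class name MimeWrapDemo is hypothetical.  Note one difference from the behavior described above: the MIME encoder in java.util.Base64 separates lines with a carriage return and a line feed but does not append one after the last line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: uses the documented java.util.Base64 class
// (Java 8 and later) rather than the sun.misc classes used in
// this lesson. MimeWrapDemo is a hypothetical class name.
class MimeWrapDemo {

  // Encode with 76-character lines separated by CR LF. No
  // terminator follows the last line.
  static String encodeMime(byte[] data) {
    return Base64.getMimeEncoder().encodeToString(data);
  }

  public static void main(String[] args) {
    // The same 58-byte input that is used for Figure 8.
    byte[] data =
        "1234567890123456789012345678901234567890123456789012345678"
            .getBytes(StandardCharsets.US_ASCII);
    String[] lines = encodeMime(data).split("\r\n");
    for (int i = 0; i < lines.length; i++) {
      System.out.println(lines[i].length() + ": " + lines[i]);
    }
  }
}
```

With 58 input bytes, the first 57 bytes fill one 76-character line and the one leftover byte produces the short second line.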

Output for multiple lines of base64 data

This is illustrated by the program output in Figure 8, which shows an input consisting of 58 eight-bit bytes and a base64 output containing a total of 84 characters.  The 84 characters consist of a 76-character line and a four-character line, each followed by a carriage return and a line feed (76 + 2 + 4 + 2 = 84).

58
1234567890123456789012345678901234567890123456789012345678
84
MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3
OA==

58
1234567890123456789012345678901234567890123456789012345678


Figure 8

(The output in Figure 8 was produced by modifying the first statement in Listing 14 to contain the 58-character string shown in the second line of Figure 8.)

The encodeBase64 method

I am going to set the main method aside while I discuss the method named encodeBase64.  I will return to the main method later.

The encodeBase64 method, which is used to encode an array of eight-bit bytes into a string of base64 characters, is shown in its entirety in Listing 16.

  static String encodeBase64(byte[] data){
    sun.misc.BASE64Encoder encoder =
                    new sun.misc.BASE64Encoder();
    return encoder.encodeBuffer(data);
  }//end encodeBase64

Listing 16

The Sun encodeBuffer method

The code in Listing 16 instantiates an object of the undocumented class named sun.misc.BASE64Encoder, and invokes the encodeBuffer method on that object, passing the array of eight-bit bytes as a parameter.

The encodeBuffer method converts the bytes in the incoming array to seven-bit base64 characters.  Each of the base64 characters is encapsulated in the least significant seven bits of a character in a Java String object, which is returned by the method.  Thus, the returned base64 characters are encapsulated in the Unicode characters that comprise a Java String object.

As described earlier, the returned string includes a carriage return and a line feed at the end.  If the number of base64 characters exceeds 76 characters, the string contains multiple lines with each line terminated by a carriage return and a line feed.

If the number of input eight-bit characters is not evenly divisible by three, the encoder appends the base64 pad character '=' at the end to guarantee that the number of base64 characters is evenly divisible by four.
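
To make the padding rule concrete, the following is a minimal sketch of a hand-written encoder that accepts input of any length.  It is not the Sun implementation, and the class name PaddedEncoder and its members are hypothetical.  Unlike encodeBuffer, it inserts no line breaks.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: a hand-written encoder, not the Sun implementation.
// PaddedEncoder and its members are hypothetical names. It emits
// no line breaks, so it differs from encodeBuffer in that respect.
class PaddedEncoder {

  static final String ALPHABET =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      + "abcdefghijklmnopqrstuvwxyz"
      + "0123456789+/";

  static String encode(byte[] data) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < data.length; i += 3) {
      int remaining = Math.min(3, data.length - i);

      // Pack up to three bytes into a 24-bit group, masking
      // each byte to avoid sign extension. Missing bytes are
      // treated as zero.
      int group = 0;
      for (int j = 0; j < 3; j++) {
        group <<= 8;
        if (j < remaining) {
          group |= data[i + j] & 0xff;
        }
      }

      // Emit one character for each six-bit slice that carries
      // input data, then pad the group to four characters.
      for (int k = 0; k < 4; k++) {
        if (k <= remaining) {
          int sixBits = (group >> (18 - 6 * k)) & 0x3f;
          out.append(ALPHABET.charAt(sixBits));
        } else {
          out.append('=');
        }
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(encode(
        "klm".getBytes(StandardCharsets.US_ASCII)));  // a2xt
    System.out.println(encode(
        "klmn".getBytes(StandardCharsets.US_ASCII))); // a2xtbg==
  }
}
```

One leftover byte yields two base64 characters and two pads, and two leftover bytes yield three base64 characters and one pad, so every group comes out four characters long.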

Decode the data

Returning now to the discussion of the main method, the code in Listing 17:
  • Invokes the decodeBase64 method to convert the encoded base64 data back to eight-bit bytes.
  • Displays the number of eight-bit bytes.
  • Displays the values of the eight-bit bytes.

    String decoded = decodeBase64(encoded);
    System.out.println(decoded.length());
    System.out.println(decoded);

  }//end main

Listing 17

Examples of the output produced by the code in Listing 17 are shown in Figure 7 and Figure 8.  As expected, the output produced by decoding the base64 data matches the input that was encoded into base64 data earlier.

The decodeBase64 method

The decodeBase64 method is shown in Listing 18.

  static String decodeBase64(String encoded){
    String decoded = "";
    try{
      sun.misc.BASE64Decoder decoder =
                    new sun.misc.BASE64Decoder();
      decoded = new String(decoder.decodeBuffer(
                                       encoded));
    }catch(Exception e){e.printStackTrace();}
    return decoded;
  }//end decodeBase64

Listing 18

The method instantiates an object of the undocumented class named sun.misc.BASE64Decoder.  Then it invokes the decodeBuffer method on that object passing the encoded data as a parameter.

The decodeBuffer method

The decodeBuffer method converts the base64 characters into the corresponding set of eight-bit values.  Although it isn't obvious in Listing 18, the decodeBuffer method returns an array object of type byte with each element in the array containing one of the resulting eight-bit values.

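The code in Listing 18 passes that byte array to a String constructor, which converts the bytes to Unicode characters using the platform's default character encoding, and returns the resulting string.  For ASCII data such as the text used here, each eight-bit value ends up in the least significant eight bits of one character in the String object.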
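
The two steps, decoding to a byte array and then converting the bytes to a string, can be kept separate.  The following is a minimal sketch that assumes Java 8 or later and uses the documented java.util.Base64 class in place of sun.misc.BASE64Decoder; the class name DecodeDemo is hypothetical.  It names the character encoding explicitly rather than relying on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: uses java.util.Base64 (Java 8 and later) instead of
// sun.misc.BASE64Decoder. DecodeDemo is a hypothetical class name.
class DecodeDemo {

  // Decode base64 text to raw bytes. The MIME decoder ignores
  // line terminators, so text that ends in CR LF is accepted.
  static byte[] decodeToBytes(String encoded) {
    return Base64.getMimeDecoder().decode(encoded);
  }

  // Convert the bytes to a String using an explicitly named
  // charset. ISO-8859-1 maps each byte value to the character
  // with the same numeric value.
  static String decodeToString(String encoded) {
    return new String(decodeToBytes(encoded),
        StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    String decoded = decodeToString("a2xtbg==\r\n");
    System.out.println(decoded.length()); // 4
    System.out.println(decoded);          // klmn
  }
}
```

Returning the byte array, as decodeToBytes does, is the safer choice when the decoded data is not text, because no character conversion is involved.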

Run the Programs

I encourage you to copy, compile and run the code in Listing 19 and Listing 20.  Modify it and experiment with it until you fully understand it.

Summary

This lesson explains the use of base64 encoding and decoding in general, and illustrates base64 encoding and decoding using sample programs.

What's Next?

A future lesson will explain how base64 decoding is used in the BigDog program.

Program Listings

Complete listings of the two programs explained in this lesson are provided in Listing 19 and Listing 20.

/*File Base64_02.java Copyright 2004, R.G.Baldwin
Rev 03/29/04

This program illustrates the algorithm for
encoding and decoding base64. It is not intended
for production use. Rather it is intended solely
to illustrate the algorithm.

CAUTION: This program has not been fully tested.
Don't use it for any significant purpose without
first testing the conversion to base64 for all
possible values in a group of three eight-bit
bytes.

For information on the base64 encoding algorithm,
see the following URL:
http://www.cse.ohio-state.edu/cgi-bin/rfc/
rfc1521.html#sec-5.2

For production software to encode and decode
base64, see the following URLs:

http://show.docjava.com:8086/book/cgij/doc/net/
proxy/BASE64Encoder.java.html

http://show.docjava.com:8086/book/cgij/doc/net/
proxy/BASE64Decoder.java.html

As an alternative to the above see the
encodeBuffer method of the
sun.misc.BASE64Encoder class and the
decodeBuffer method of the
sun.misc.BASE64Decoder class

Tested using SDK 1.4.2 under WinXP.
************************************************/

class Base64_02 {

  public static void main(String[] args) {

    //Create and display a byte array containing
    // three bytes of 8-bit character data
    byte[] rawData = "klm".getBytes();
    showData(rawData);

    //Encode as base64 and display
    byte[] encodedData = encoder(rawData);
    showData(encodedData);

    //Decode and display
    byte[] decodedData = decoder(encodedData);
    showData(decodedData);

  }//end main
  //-------------------------------------------//

  //Method to encode a group of three 8-bit
  // bytes into four base64-format characters.
  static byte[] encoder(byte[] data){
    if(data.length != 3){
      System.out.println("Incorrect length");
      System.exit(0);
    }//end if

    //Concatenate the bytes into a single
    // positive int value. Mask each byte with
    // 0xff so that byte values above 127 are
    // not sign-extended.
    int concat = ((data[0] & 0xff)<<16)
               | ((data[1] & 0xff)<<8)
               | (data[2] & 0xff);

    //Extract the data from the int value six
    // bits at a time and map each group of six
    // bits into the corresponding base64
    // character.
    byte[] output = new byte[4];
    output[3] = (byte)(mapTo(concat & '\u003f'));
    output[2] = (byte)(mapTo((concat >> 6)
                                    & '\u003f'));
    output[1] = (byte)(mapTo((concat >> 12)
                                    & '\u003f'));
    output[0] = (byte)(mapTo((concat >> 18)
                                    & '\u003f'));
    return output;
  }//end encoder
  //-------------------------------------------//

  /*Method to map a six-bit value into a base64
  character. See a definition of the mapping
  requirement at the following URL:
  http://www.cse.ohio-state.edu/cgi-bin/rfc/
  rfc1521.html#sec-5.2
  */
  static int mapTo(int val){
    int returnVal = 0;
    if(val == 63){
      returnVal = '/';
    }else if(val == 62){
      returnVal = '+';
    }else if((val >= 52) && (val <= 61)){
      returnVal = '0' + val - 52;
    }else if((val >= 0) && (val <= 25)){
      returnVal = 'A' + val;
    }else if((val >= 26) && (val <= 51)){
      returnVal = 'a' + val - 26;
    }else{
      System.out.println(
                 "Not a possible six-bit value");
      System.exit(0);
    }//end else
    return returnVal;
  }//end mapTo
  //-------------------------------------------//

  //Method to decode a group of four characters
  // in base64 format into three 8-bit bytes.
  static byte[] decoder(byte[] data){
    if(data.length != 4){
      System.out.println("Incorrect length");
      System.exit(0);
    }//end if

    //Concatenate the four characters into a
    // single positive int value. Map from the
    // base64 characters into the original
    // six-bit values before concatenating.
    int concat = ((mapFrom(data[0]))<<18)
               | ((mapFrom(data[1]))<<12)
               | ((mapFrom(data[2]))<<6)
               | mapFrom(data[3]);

    //Extract the data from the int value eight
    // bits at a time.
    byte[] output = new byte[3];
    output[2] = (byte)((concat & '\u00ff'));
    output[1] = (byte)(((concat >> 8)
                                  & '\u00ff'));
    output[0] = (byte)(((concat >> 16)
                                  & '\u00ff'));
    return output;
  }//end decoder
  //-------------------------------------------//

  //Method to map from a base64 character into a
  // six-bit value. This method reverses the
  // process provided by the mapTo method.
  static int mapFrom(int val){
    int returnVal = 0;
    if(val == '/'){
      returnVal = 63;
    }else if(val == '+'){
      returnVal = 62;
    }else if((val >= '0') && (val <= '9')){
      returnVal = 52 + val - '0';
    }else if((val >= 'A') && (val <= 'Z')){
      returnVal = 0 + val - 'A';
    }else if((val >= 'a') && (val <= 'z')){
      returnVal = 26 + val - 'a';
    }else{
      System.out.println(
                 "Not a valid base64 character");
      System.exit(0);
    }//end else
    return returnVal;
  }//end mapFrom
  //-------------------------------------------//

  //Method to display the data in a byte array
  // as character data and as binary data.
  //Caution, if there are more than four bytes
  // in the array, the binary data will not be
  // correct. Also note that leading zeros are
  // not displayed.
  static void showData(byte[] data){
    int save = 0;
    for(int cnt = 0; cnt < data.length; cnt++){
      System.out.print((char)data[cnt]);
      //Mask with 0xff to avoid sign extension
      // for byte values above 127.
      save = (save << 8) | (data[cnt] & 0xff);
    }//end for loop
    System.out.println();
    System.out.println(
                  Integer.toBinaryString(save));
  }//end showData

}//end class Base64_02

Listing 19


/*File Base64_03.java Copyright 2004, R.G.Baldwin
Revised 03/29/04

This program illustrates the use of the
sun.misc package for encoding and decoding
base64.

For information on the base64 encoding algorithm,
see the following URL:
http://www.cse.ohio-state.edu/cgi-bin/rfc/
rfc1521.html#sec-5.2

For production software to encode and decode
base64, see the following URLs:

http://show.docjava.com:8086/book/cgij/doc/net/
proxy/BASE64Encoder.java.html

http://show.docjava.com:8086/book/cgij/doc/net/
proxy/BASE64Decoder.java.html

As an alternative to the above this program uses
the encodeBuffer method of the
sun.misc.BASE64Encoder class and the
decodeBuffer method of the
sun.misc.BASE64Decoder class

Introspection shows that the BASE64Encoder class
inherits the following two methods from the
sun.misc.CharacterEncoder class, possibly
overriding one or both:
encode
encodeBuffer
The sun.misc.CharacterEncoder class is a subclass
of Object.

Introspection shows that the BASE64Decoder class
inherits the following method from the
sun.misc.CharacterDecoder class, possibly
overriding the method:
decodeBuffer
The sun.misc.CharacterDecoder class is a subclass
of Object.

Tested using SDK 1.4.2 under WinXP.
************************************************/

class Base64_03 {

  public static void main(String[] args) {
    //Create a byte array containing data and
    // display its length.
    byte[] dataBuffer = "klmn".getBytes();
    System.out.println(dataBuffer.length);

    //Display the contents of the byte array
    System.out.println(new String(dataBuffer));

    //Encode as base64 and display
    String encoded = encodeBase64(dataBuffer);
    System.out.println(encoded.length());
    System.out.println(encoded);

    //Decode and display
    String decoded = decodeBase64(encoded);
    System.out.println(decoded.length());
    System.out.println(decoded);

  }//end main
  //-------------------------------------------//

  //Method to encode an array of bytes into
  // base64 format
  static String encodeBase64(byte[] data){
    sun.misc.BASE64Encoder encoder =
                    new sun.misc.BASE64Encoder();
    return encoder.encodeBuffer(data);
  }//end encodeBase64
  //-------------------------------------------//

  //Method to decode a base64 string
  static String decodeBase64(String encoded){
    String decoded = "";
    try{
      sun.misc.BASE64Decoder decoder =
                    new sun.misc.BASE64Decoder();
      decoded = new String(decoder.decodeBuffer(
                                       encoded));
    }catch(Exception e){e.printStackTrace();}
    return decoded;
  }//end decodeBase64

}//end class Base64_03

Listing 20



Copyright 2004, Richard G. Baldwin.  Reproduction in whole or in part in any form or medium without express written permission from Richard Baldwin is prohibited.

About the author

Richard Baldwin is a college professor (at Austin Community College in Austin, TX) and private consultant whose primary focus is a combination of Java, C#, and XML. In addition to the many platform and/or language independent benefits of Java and C# applications, he believes that a combination of Java, C#, and XML will become the primary driving force in the delivery of structured information on the Web.

Richard has participated in numerous consulting projects, and he frequently provides onsite training at the high-tech companies located in and around Austin, Texas.  He is the author of Baldwin's Programming Tutorials, which has gained a worldwide following among experienced and aspiring programmers. He has also published articles in JavaPro magazine.

Richard holds an MSEE degree from Southern Methodist University and has many years of experience in the application of computer technology to real-world problems.

Baldwin@DickBaldwin.com

-end-
 





