
Java Language Integrity & Security: Uncovering Bytecodes


This series, The Object-Oriented Thought Process, is intended for someone just learning an object-oriented language and who wants to understand the basic concepts before jumping into the code, or someone who wants to understand the infrastructure behind an object-oriented language he or she is already using. These concepts are part of the foundation that any programmer will need to make the paradigm shift from procedural programming to object-oriented programming.

Figure 1: Statically Linked Applications

Note that the link process can accept multiple inputs, not just a single object module. This is what the term link means: an executable module can contain code that is ‘linked’ together from various places. For example, besides an object module produced from a single file, a developer can ‘link’ other modules, including those produced by other developers as well as libraries, possibly from third-party vendors.

The other term that is pertinent here is ‘statically.’ All of the linked modules produce an executable that is static, not dynamic, as are the examples you will explore later. A search on Google finds a definition for statically linked as follows:

Definitions of statically linked on the Web:

Linked as a physical part of an executable file. The linkage between calls and subprograms is completely fixed at link time. See dynamically linked.

Figure 2: Statically Linked Java Applications

The issue here is that these are Microsoft Windows applications only. You could not copy this version of javac.exe and run it directly on a UNIX machine.

Obviously, there is a Java language specification. Although the Microsoft Windows version of javac.exe will run only on a Windows platform, the Java compiler on UNIX and other platforms must abide by the same Java language specification. Despite the fact that each individual platform has its own non-portable Java compiler, they all, theoretically at least, behave in consistent ways. Thus, a Java program on a Windows platform should run unchanged on a UNIX platform, even though the tools themselves were written on different platforms by different developers.

As already stated, the Microsoft Windows javac.exe file contains Windows-specific machine code. It is interesting to try to open an executable file with a text editor. The operation is meaningless in itself, but it provides a baseline for this discussion that you can compare against a file of bytecodes. When you open the javac.exe file in Notepad, you get the results seen in Figure 3.

Figure 3: javac.exe opened in Notepad.

Obviously, this exercise provides no useful information. However, I always find it interesting to take a look at this type of output. It also gives students a way to differentiate between character and binary files. Because Notepad is a text editor, the characters displayed are the file's bytes interpreted as ASCII text. Perhaps the most interesting thing about the display of this file is that there are no recognizable words, at least none that I can determine. That is not the case when you look at a file of bytecodes in the same way.

Dynamically Linked Executables

The bytecode model uses a different approach. In this case, if you don’t want the kitchen sink, you won’t get it. The corresponding definition of dynamically linked is:

Definitions of dynamically linked on the Web:

Linked in name only, so that the executable file contains only the information needed to locate the code of a procedure—the name of the module that contains it and the name of the entry point. When the executable program is loaded, the module is also loaded, and the linkage between them is fixed in memory only.
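Java exposes this kind of by-name linkage directly. As a minimal sketch (the class name DynamicLinkSketch and the helper instantiate are hypothetical and are not part of this article's application), the following program holds only the name of a class as a string and lets the runtime locate and load it while the program is running. java.util.ArrayList is used only because it is present in every Java installation.

```java
import java.lang.reflect.Constructor;

class DynamicLinkSketch {

    // Look a class up by name at run time and create an instance of it.
    // The caller holds only a string, not a compiled-in reference to the class.
    static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> constructor = type.getDeclaredConstructor();
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is an arbitrary choice; any class with a
        // no-argument constructor on the classpath would do.
        Object list = instantiate("java.util.ArrayList");
        System.out.println("Loaded at run time: " + list.getClass().getName());
    }
}
```

If the named class cannot be found, the failure appears only when the program runs, which is the practical consequence of fixing the linkage in memory rather than at link time.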

Figure 4: Bytecode model

Under The Hood

Perhaps the best way to explain the bytecode model is to look at it directly. You’ll design a small Java application for this illustration. In this case, you will create a simple application called Performance presented in Listing 1 and use a class called Employee presented in Listing 2.

Listing 1: The Performance Class

public class Performance {

   public static void main(String[] args) {

      System.out.println("Performance Example");

      Employee joe = new Employee();

   }
}

Listing 2: The Employee Class

public class Employee {

   private int employeeNumber;

}

As I normally do with these examples, I compile them from a batch file as seen in Listing 3.

Listing 3: Compiling the Application

cls

"C:\Program Files\Java\jdk1.5.0_07\bin\javac" -Xlint -classpath . Performance.java

Although I do most of my development with an Integrated Development Environment (IDE), I normally use batch files like these so that I know my CLASSPATH information is correct. This helps in the instruction phase of programming, and it also assists in testing the various versions of the development kits. For example, as was mentioned earlier, a web developer must allow for various platforms while developing and testing. In the same manner, a developer must allow for various versions of a development kit. If Java is the development platform, what version of the SDK should be used? The answer is that all reasonable versions must be tested. This means that multiple versions of the development kit may be installed on a machine at the same time. Therefore, keeping track of the CLASSPATH can be problematic.

To deal with this, I like to use batch files to ensure that I am using the version of the development kit that I intend to use. Granted, there are much more sophisticated methods of doing this, and there are many development tools available to the professional developer; however, in an academic environment, a simpler and less expensive solution is often desirable.
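A batch file pins down which compiler is invoked. A complementary check, sketched here under the assumption that you also want to confirm which Java runtime and class path a program actually sees, is to print the standard system properties. The class name EnvironmentReport is hypothetical; the three property names are standard.

```java
class EnvironmentReport {

    // Collects the standard system properties that identify the running
    // Java installation and the class path it was started with.
    static String describe() {
        String newline = System.lineSeparator();
        return "java.version    = " + System.getProperty("java.version") + newline
             + "java.home       = " + System.getProperty("java.home") + newline
             + "java.class.path = " + System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Note that this reports the runtime that launched the program, not the compiler that produced the class files, so it supplements the batch file rather than replacing it.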

When this application is compiled, there are two separate class files produced, Performance.class and Employee.class, as seen in Figure 5.

Figure 5: Application Class Files.

Take another look at Figure 3, which shows the statically linked javac.exe file opened in Notepad. Now open the Employee.class file in the same way and see what you get. The results, using Notepad, can be seen in Figure 6.

Figure 6: Employee.class.

Again, this exercise provides no real benefit from a technical perspective; however, it does provide a window into the structure of the bytecodes. Primarily, you can see that some textual components of the file are recognizable. The word Employee is clearly identifiable in at least a couple of locations. This is important because it hints at the possibility of decoding this file and extracting much more information from it. Could you potentially even re-create the original source code?
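The recognizable text described above can also be pulled out programmatically instead of by eye. The following is a minimal sketch, not a tool from the Java SDK: the class name PrintableRuns, the method find, and the minimum run length of 4 are all assumptions chosen for illustration. It scans a file's bytes and reports every run of printable ASCII characters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class PrintableRuns {

    // Returns every run of printable ASCII bytes that is at least minLength long.
    static List<String> find(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7E) {          // printable ASCII range
                current.append((char) b);
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);              // a non-printable byte ends the run
            }
        }
        if (current.length() >= minLength) {       // run that reaches the end of the data
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PrintableRuns <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : find(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Pointing it at Employee.class and then at a native executable gives a rough, repeatable version of the Notepad comparison.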

Re-creating the original source code of a statically linked application is beyond the reach of almost any technology. Yet, is it possible to accomplish this task with bytecodes? The answer is yes, for the most part. Many tools re-create source code from bytecodes, and you will explore them in later articles. For now, you can use something much more accessible, a tool provided by the Java SDK itself: javap.exe.

Figure 7: javap.exe.

The Java documentation identifies javap as The Java Class File Disassembler. Its definition is as follows:

The javap command disassembles a class file. Its output depends on the options used. If no options are used, javap prints out the package, protected, and public fields and methods of the classes passed to it. javap prints its output to stdout.
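To get a feel for what a disassembler starts from, it helps to read the very beginning of a class file yourself. Every class file begins with the four-byte magic number 0xCAFEBABE, followed by a two-byte minor version and a two-byte major version. The sketch below is an illustration only and is unrelated to how javap is implemented; the class name ClassFileHeader and its method are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

class ClassFileHeader {

    static final int MAGIC = 0xCAFEBABE;   // first four bytes of every class file

    // Returns the major version stored in bytes 6-7, after checking the magic number.
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buffer = ByteBuffer.wrap(classBytes);
        if (buffer.getInt() != MAGIC) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        buffer.getShort();                   // minor version, skipped in this sketch
        return buffer.getShort() & 0xFFFF;   // major version as an unsigned 16-bit value
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileHeader <file.class>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("major version: " + majorVersion(bytes));
    }
}
```

Major version 49 corresponds to Java 5, so a class file produced by a JDK 1.5 compiler with default settings would be expected to report that value.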

Figure 8: The options for javap.exe (using javap -help).

For your examples, you will start with the -private flag, which shows all classes and members. The best way to see what javap does is to run the Employee class through it as follows:

Figure 9: Running Employee.class through javap.exe.

It is interesting to look at the original source code and the disassembled source code right next to each other. Take a look at Listing 4 and study the differences.

Listing 4: The Employee Class (original source code and the disassembled source code)

public class Employee {

   private int employeeNumber;

}

public class Employee extends java.lang.Object{
   private int employeeNumber;
   public Employee();
}

There are two obvious differences between the two versions of the code. First, in the disassembled source code, it is apparent that Employee extends the java.lang.Object class. This is expected, because all classes in Java ultimately extend the Object class. However, here is concrete evidence: the extends clause was not in the original source code, yet the compiler has recorded it in the bytecode version. The second obvious difference is that there is a constructor in the midst of the code.

public Employee();

Again, this is as expected. If no constructor is specified in the original code, a default constructor is supposed to be provided for you—and this is exactly what happened here. You can have some fun and see what happens when you do provide a constructor as seen in Listing 5.
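The two compiler-supplied details discussed above, the implicit superclass and the default constructor, can also be observed at run time through reflection. This is a minimal sketch: the wrapper class ImplicitMembersSketch and its helper methods are hypothetical, and the nested Employee class simply mirrors the shape of Listing 2.

```java
import java.lang.reflect.Constructor;

class ImplicitMembersSketch {

    // Same shape as Listing 2: one private field, no explicit superclass or constructor.
    public static class Employee {
        private int employeeNumber;
    }

    // Name of the superclass the compiler recorded for Employee.
    static String superclassName() {
        return Employee.class.getSuperclass().getName();
    }

    // Number of constructors present in the compiled class.
    static int declaredConstructorCount() {
        return Employee.class.getDeclaredConstructors().length;
    }

    public static void main(String[] args) {
        System.out.println("superclass: " + superclassName());
        for (Constructor<?> constructor : Employee.class.getDeclaredConstructors()) {
            System.out.println("constructor: " + constructor
                    + " with " + constructor.getParameterCount() + " parameter(s)");
        }
    }
}
```

Although the source declares neither, the compiled class reports java.lang.Object as its superclass and exactly one constructor that takes no parameters.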

Listing 5: The Performance and Employee Classes (with a one-argument constructor)

public class Performance {

   public static void main(String[] args) {

      System.out.println("Performance Example");

      Employee joe = new Employee(1);

   }
}
public class Employee {

   private int employeeNumber;

   public Employee (int a) {

   }

}

Running this code through javap produces the output in Figure 10.

Figure 10: A Non-Default Constructor.

Notice that the default constructor is gone and is replaced by the constructor that you defined. This is exactly what you would have expected. Once again, the value of this exercise is primarily instructional; however, at times it can be a valuable debugging tool. Finally, put in a second constructor.

Listing 6: The Performance and Employee Classes (with two constructors)

public class Performance {

   public static void main(String[] args) {

      System.out.println("Performance Example");

      Employee joe = new Employee(1);

   }
}
public class Employee {

   private int employeeNumber;

   public Employee (int a) {

   }

   public Employee (float a) {

   }
}

Now, javap produces the output in Figure 11. Notice that the parameter names are not included in the constructor parameter lists; only the types, int and float, are shown. Also note that in the original source code, both of the parameter names are the same. This leads you to an interesting topic that you will cover extensively in a future article as well.

Figure 11: Two Constructors.
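Reflection gives the same picture as the javap output: the two constructors are told apart by their parameter types, and the no-argument constructor is no longer present. The following sketch assumes a nested Employee class with the same shape as the one in Listing 6; the wrapper class ConstructorSignatures and its helper methods are hypothetical names.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ConstructorSignatures {

    // Same shape as Employee in Listing 6: the parameter is named "a" in both constructors.
    public static class Employee {
        private int employeeNumber;
        public Employee(int a) { }
        public Employee(float a) { }
    }

    // Lists each constructor's parameter types, sorted because reflection
    // does not promise any particular order.
    static List<String> parameterTypes(Class<?> type) {
        List<String> signatures = new ArrayList<>();
        for (Constructor<?> constructor : type.getDeclaredConstructors()) {
            StringBuilder text = new StringBuilder("(");
            Class<?>[] parameters = constructor.getParameterTypes();
            for (int i = 0; i < parameters.length; i++) {
                if (i > 0) {
                    text.append(", ");
                }
                text.append(parameters[i].getName());
            }
            signatures.add(text.append(")").toString());
        }
        Collections.sort(signatures);
        return signatures;
    }

    // True only if the class still has a constructor that takes no arguments.
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("constructors: " + parameterTypes(Employee.class));
        System.out.println("no-arg constructor present: " + hasNoArgConstructor(Employee.class));
    }
}
```

Because overloads are resolved by type, two parameters that share the name a cause no conflict.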

Compiling the Disassembled

One of the interesting questions is whether or not this disassembled code can actually be compiled and used. The easiest way to test this is to use it and compile it. The code, incorporating the resultant code from javap, is shown in Listing 7.

Listing 7: The Employee Class (the disassembled source code)

public class Performance {

   public static void main(String[] args) {

      System.out.println("Performance Example");

      Employee joe = new Employee();

   }
}
public class Employee extends java.lang.Object{
   private int employeeNumber;
   public Employee();
}

The answer to that question is no, at least not directly. The javap application (at least with the -private option) seems to have provided only the signature of the method, not the body.

Figure 12: Compiling the disassembled Employee source.

If you do add a method body as seen in Listing 8, the disassembled code will compile; but this is cheating. What is the point of disassembling the code if you can’t compile it directly? This is a question you will explore in the next article.
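The same experiment can be repeated without editing files by hand. The sketch below uses the standard javax.tools compiler API, which requires running on a JDK rather than a JRE, to compile the Employee source both as javap printed it and with an empty constructor body added. The class CompileCheckSketch and its helpers are hypothetical, and the compiled output goes to a temporary directory that this sketch does not clean up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class CompileCheckSketch {

    // Wraps source text so the compiler can read it without a file on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns true if the given source for the named class compiles without errors.
    static boolean compiles(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("A JDK (not just a JRE) is required");
        }
        try {
            Path output = Files.createTempDirectory("compile-check");
            // Collecting diagnostics keeps compiler errors off the console.
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            List<String> options = Arrays.asList("-d", output.toString());
            return compiler.getTask(null, null, diagnostics, options, null,
                    Collections.singletonList(new StringSource(className, code))).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String asDisassembled = "public class Employee extends java.lang.Object{\n"
                + "   private int employeeNumber;\n"
                + "   public Employee();\n"
                + "}\n";
        String withBody = asDisassembled.replace("public Employee();", "public Employee() { }");
        System.out.println("as disassembled compiles: " + compiles("Employee", asDisassembled));
        System.out.println("with a body compiles:     " + compiles("Employee", withBody));
    }
}
```

The only difference between the two inputs is the constructor body, which isolates the missing body as the reason the javap output cannot be compiled as-is.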

Listing 8: The Employee Class (the disassembled source code)

public class Performance {

   public static void main(String[] args) {

      System.out.println("Performance Example");

      Employee joe = new Employee();

   }
}
public class Employee extends java.lang.Object{
   private int employeeNumber;
   public Employee() { };
}

Conclusion

In this article, you began to explore how a class file is structured and how you can disassemble it. Although only a few applications of the process can assist the professional developer, it is often a very good mechanism for instructional purposes. Understanding what goes on under the hood of an application is beneficial. At a more detailed level, this exercise lays the groundwork for understanding the Java Virtual Machine.

In next month’s article, you will delve more deeply into understanding how the structure of bytecodes can help in the construction and testing phases of the software development process and how it affects the performance and security of an application.

About the Author

Matt Weisfeld is a faculty member at Cuyahoga Community College (Tri-C) in Cleveland, Ohio. Matt is a member of the Information Technology department, teaching programming languages such as C++, Java, C#, and .NET as well as various Web technologies. Prior to joining Tri-C, Matt spent 20 years in the information technology industry gaining experience in software development, project management, business development, corporate training, and part-time teaching. Matt holds an MS in computer science and an MBA in project management. Besides The Object-Oriented Thought Process, which is now in its second edition, Matt has published two other computer books, and more than a dozen articles in magazines and journals such as Dr. Dobb’s Journal, The C/C++ Users Journal, Software Development Magazine, Java Report, and the international journal Project Management. Matt has presented at conferences throughout the United States and Canada.
