Java Language Integrity & Security: Uncovering Bytecodes
This series, The Object-Oriented Thought Process, is intended for someone just learning an object-oriented language and who wants to understand the basic concepts before jumping into the code, or someone who wants to understand the infrastructure behind an object-oriented language he or she is already using. These concepts are part of the foundation that any programmer will need to make the paradigm shift from procedural programming to object-oriented programming.
In keeping with the code examples used in the previous articles, Java will be the language used to implement the concepts in code. One of the reasons that I like to use Java is that you can download the Java compiler for personal use at the Sun Microsystems Web site http://java.sun.com/. You can download the standard edition, J2SE 5.0, at http://java.sun.com/j2se/1.5.0/download.jsp to compile and execute these applications. I often reference the Java J2SE 5.0 API documentation and I recommend that you explore the Java API further. Code listings are provided for all examples in this article, as well as figures and output (when appropriate). See the first article in this series for detailed descriptions of how to compile and run all the code examples.
In the previous column, you explored some of the behaviors of serialization and how it relates to the topics of performance and security. In this article, you will begin an exploration of the bytecodes that are produced when source files are compiled and how this affects performance and security. This path will lead you into some interesting discussions on how the bytecode is interpreted in relation to the Java Virtual Machine (JVM).
The code examples in this series are meant to be a hands-on experience. There are many code listings and figures of the output produced from these code examples. Please boot up your computer and run these exercises as you read through the text.
The bytecode model provides many advantages; however, as always seems to be the case, there are some drawbacks as well. When a compiled language is used, and a statically linked executable is produced, the resulting machine code is quite difficult to reengineer.
Reengineering can mean many things, from re-creating the original design to reproducing the original source code. Although the previous sentence uses the words reengineering, re-create and reproduce, the Java documentation uses another word, disassemble. The Java toolkit actually provides a tool, called javap, for simple disassembly. The term disassemble can raise some eyebrows because at certain levels it is inappropriate; however, you will use the practice here in an instructional sense.
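As a hedged illustration of the kind of input javap works on, here is a minimal class that can be compiled and then disassembled. The class name Greeter and its method are hypothetical, chosen only for this sketch.

```java
// Hypothetical example class; the name and contents are for illustration only.
class Greeter {
    private final String name;

    Greeter(String name) {
        this.name = name;
    }

    // A small method whose bytecode is short enough to read by eye.
    String greet() {
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        System.out.println(new Greeter("bytecode").greet());
    }
}
```

After compiling with javac Greeter.java, running javap -c Greeter prints the bytecode instructions for each method. The exact listing depends on the compiler version, so no sample output is shown here.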
In languages that produce bytecodes, the practice of disassembling code has one goal in mind: Take the bytecodes and reverse-engineer them to produce source code that is effectively identical to the original source code.
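Part of what makes bytecode approachable for tools is that a class file is a documented, structured format rather than raw machine code. As a small sketch of this, the following reads the first eight bytes of a class file: a magic number (0xCAFEBABE), followed by the minor and major version numbers. The class and method names are hypothetical, and the sample bytes in main are built by hand for illustration rather than taken from a real compiled class. Major version 49 is the value associated with J2SE 5.0.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; reads only the fixed header at the start of a class file.
class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    // Returns {minor, major}; rejects input that does not start with the magic number.
    static int[] readVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        int magic = buf.getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException(
                "Not a class file: magic = 0x" + Integer.toHexString(magic));
        }
        int minor = buf.getShort() & 0xFFFF; // unsigned 16-bit value
        int major = buf.getShort() & 0xFFFF; // unsigned 16-bit value
        return new int[] { minor, major };
    }

    public static void main(String[] args) {
        // Hand-built header bytes (an assumption for the demo, not a real class).
        byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49 };
        int[] version = readVersion(header);
        System.out.println("major=" + version[1] + " minor=" + version[0]);
        // prints "major=49 minor=0"
    }
}
```

To try this on a real class, the bytes of any compiled .class file could be passed to readVersion in place of the hand-built array.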
Statically Linked Executables
However, decompiling code also has an educational benefit. Return for a moment to the issue of statically linked languages. Languages such as C, C++, and FORTRAN go through a compile/link process that produces what is called a statically linked module. In an MS Windows environment, these modules are sometimes referred to as executables and have an .EXE extension. Figure 1 shows the process by which statically linked executables are produced.
Figure 1: Statically Linked Applications
Note that the link process can accept multiple inputs, not just a single object module. This is what the term link means: an executable module can contain code that is 'linked' together from various places. For example, besides an object module produced from a single file, a developer can 'link' other modules, including those produced by other developers as well as libraries, possibly from third-party vendors.
The other term that is pertinent here is 'statically.' All of the linked modules produce an executable that is static, not dynamic, as are the examples you will explore later. A search on Google finds a definition for statically linked as follows:
Definitions of statically linked on the Web:
Linked as a physical part of an executable file. The linkage between calls and subprograms is completely fixed at link time. See dynamically linked.
The operative part of this definition is the part that says: The linkage between calls and subprograms is completely fixed at link time. It is also interesting to see that the term dynamically linked is part of the definition, as an opposite. The issue is that in a statically linked executable, everything is pre-determined. This has its advantages; it also has its disadvantages. The primary advantage is that everything you need is always there; the disadvantage is that everything you need is always there.
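For contrast with the definition above, here is a minimal sketch of the dynamic case in Java, where the class to use is named by a string and resolved only at run time. The class DynamicLoadDemo and its newList method are hypothetical names for this illustration; Class.forName is the standard library call doing the work.

```java
import java.util.List;

// Hypothetical demo class; nothing about the concrete List type is fixed at compile time.
class DynamicLoadDemo {

    // Looks up a class by name at run time and creates an instance of it.
    @SuppressWarnings("unchecked")
    static List<String> newList(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (List<String>) cls.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name can come from the command line; ArrayList is only a default.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        List<String> list = newList(name);
        list.add("resolved at run time");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

Running the program with java.util.LinkedList as its argument switches the implementation without recompiling, which is the kind of late decision that a link-time-fixed executable cannot make.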
This issue is at the heart of a major hindrance when it comes to size. As you may imagine, sending a large, statically linked executable over a network poses a significant problem. As an example, if you are loading a module over a network, it may be a good idea to download only the functionality that you need. This is one of the problems with a statically linked executable. If everything is part of the package, including the kitchen sink, what happens when you don't need the kitchen sink? The more basic question is, Why send the kitchen sink over the network if you don't even want it?
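The idea of paying only for what you use can be sketched in Java, with some caveats. The example below shows that a class's static initializer runs only when the class is first used, so a feature that is never needed is never set up. It does not show network loading itself, and the names OnDemandDemo and KitchenSink are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of on-demand initialization; not a model of network transfer.
class OnDemandDemo {
    // Records what happened, so the order of events can be inspected.
    static final List<String> EVENTS = new ArrayList<String>();

    // A rarely needed feature; its static initializer runs on first use only.
    static class KitchenSink {
        static {
            EVENTS.add("KitchenSink initialized");
        }

        static String wash() {
            return "washing";
        }
    }

    static String run(boolean needSink) {
        EVENTS.add("program started");
        return needSink ? KitchenSink.wash() : "no sink needed";
    }

    public static void main(String[] args) {
        // Pass any argument to request the sink; with none, it is never touched.
        System.out.println(run(args.length > 0));
        System.out.println(EVENTS);
    }
}
```

Run without arguments, the event list never mentions KitchenSink; run with an argument, the initialization entry appears only after the program has already started.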
As you know, a statically linked executable can be run only on the platform it was built for; a Microsoft Windows .EXE file, for example, runs only on Windows. This is both an advantage and a significant limitation. The lack of portability causes significant problems when it comes to platform-independent applications such as web pages. In fact, a web developer has no clue as to which platform a user is surfing the web with. Thus, a web application must be able to support several different platforms. However, the browser itself is a statically linked application and must be created on each individual platform.
As you have seen, executables contain the machine language of the host machine. Thus, they are not portable across platforms. As an example, consider the Java compiler itself.
Although Java is not a statically linked language (it is actually a dynamically loaded language), the Java tools provided for specific platforms are statically linked executables. If you take a look at the Java installation directory, you can see that the bin directory contains a lot of Windows executables; one of them is the Java compiler, javac.exe. Figure 2 shows a screen shot of the Java executables contained in this directory. You will recognize many of the tools that are used in application development, such as the compiler (javac.exe), the virtual machine (java.exe), and so forth.
Figure 2: Statically Linked Java Applications
The issue here is that these are Microsoft Windows applications only. You could not copy this version of javac.exe and run it directly on a UNIX machine.
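Although javac.exe itself is tied to Windows, the class files it produces are not. As a minimal sketch, the following hypothetical class can be compiled once, and the resulting class file can then be run by the virtual machine on different platforms, each time reporting its host through the standard os.name and os.arch system properties.

```java
// Hypothetical demo class; the same compiled class file reports whichever host runs it.
class PlatformReport {

    // Builds a short description of the host from standard system properties.
    static String describe() {
        return System.getProperty("os.name") + " / " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // The output differs by machine, so no fixed expected output is given.
        System.out.println("Running on: " + describe());
    }
}
```

The point of the sketch is that only the virtual machine (java.exe on Windows, for instance) needs to be platform specific; the compiled PlatformReport class does not.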