JVM (Java Virtual Machine) is an abstract computing machine. Just like a real machine, it has its own instruction set and execution engine, and it manipulates memory areas at run time. Its objective is to provide an execution environment for applications built to run on it. JVM interprets the instruction code and interacts with the underlying layers, namely the OS platform and the hardware architecture, to execute instructions and manage resources. This article gives an overview of the JVM and how a Java program executes within it.
Virtual Machine
Virtual machines are non-physical computers built to provide an environment that serves a specific, or sometimes a general, purpose. The idea sounds very similar to an emulator, which imitates a hardware component that either is not present in the machine or cannot perform as required. In that case, we create software that pretends the hardware component is actually present by providing its services in software. Virtual machines use CPU virtualization to an extent, providing an interface to the real hardware. So, in essence, both provide a virtual environment, an abstraction of something that is not physically there. The differences between them become obvious as we dive deeper, but let’s not focus on them for now. The point is that both pose as something they are not. In the words of Popek and Goldberg in the article “Formal Requirements for Virtualizable Third Generation Architectures,” a virtual machine is “an efficient, isolated duplicate of the real machine.”
Virtual machines come in different types, shaped by their needs and usages. One is called full virtualization, which behaves like a real machine. Others are subtler and more specific, like process virtualization. It is difficult to assign JVM to any one category. It virtualizes a CPU; it has its own runtime environment and memory manager, which work in collaboration with the underlying platform; it has a garbage collector and, of course, its class libraries, delivered as intermediate bytecode; and, last but not least, it emulates machine registers, stacks, and so forth. In short, it is a playground for the essence of Java: bytecode, produced by the Java compiler. Bytecode is practically machine code for the JVM, which translates it into native machine instructions.
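As a rough illustration of what "machine code for the JVM" looks like, the sketch below pairs a small method with the stack-oriented instructions a Java compiler typically emits for it. The class name BytecodeDemo is made up for this example, and the listing in the comment is what the javap -c tool usually prints for such a method; the exact output may differ between compiler versions.

```java
// Hypothetical example class; not part of any library.
class BytecodeDemo {
    // For this method, "javap -c BytecodeDemo" typically lists:
    //   iload_0   // push the first int argument onto the operand stack
    //   iload_1   // push the second int argument
    //   iadd      // pop both values and push their sum
    //   ireturn   // return the int on top of the stack
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

Note that the instructions name no hardware register; they work on an operand stack that the JVM provides.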
Class File Format
Interestingly, JVM does not care about the syntax or semantics of the Java language, or of any other programming language. When it comes to executing a program, its primary interest lies in a particular file format called the class file format. The *.class file format is a binary container and should not be confused with the class construct of Java source code; the compiler transforms a *.java file into one or more *.class files. JVM is ready to interpret class files; it does not matter which compiler is used to create them as long as the output follows the class file format. The Java compiler compiles a program into its equivalent class files. These class files contain half-compiled code called bytecode. It is called half compiled because bytecode is not directly executable by the hardware, unlike the binary files created by a C/C++ compiler. It is meant to be fed into the JVM, which, in turn, interacts with the underlying platform to finally execute the instructions. A class file thus contains JVM instructions, a symbol table, and other ancillary information. The output of any compiler that produces class files according to the syntactic and structural constraints of the JVM is a candidate to be executed on JVM, irrespective of the source language.
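To make the class file format a little more concrete, here is a minimal sketch that reads the first eight bytes of a class file: the magic number 0xCAFEBABE followed by the minor and major version numbers. The class and method names are invented for this example, and it assumes that the class file of java.lang.Object can be opened as a resource, which is normally the case on a standard JDK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; the names are illustrative only.
class ClassFileHeader {
    /** Returns {magic, minorVersion, majorVersion} read from Object.class. */
    static int[] readObjectClassHeader() {
        // Assumption: the runtime exposes Object.class as a readable resource.
        try (InputStream raw = Object.class.getResourceAsStream("Object.class")) {
            if (raw == null) {
                throw new IllegalStateException("Object.class is not reachable as a resource");
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // first 4 bytes: 0xCAFEBABE
            int minor = in.readUnsignedShort();  // next 2 bytes: minor version
            int major = in.readUnsignedShort();  // next 2 bytes: major version
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readObjectClassHeader();
        System.out.println(String.format("magic=0x%08X minor=%d major=%d",
                header[0], header[1], header[2]));
    }
}
```

The major version depends on the Java release that produced the file, so the sketch prints it rather than assuming a value.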
JVM Standpoint
JVM places itself between the bytecode and the underlying platform. The platform comprises the operating system (OS) and the hardware. The OS and hardware architecture may vary from machine to machine, but the same Java program that ran on one will run on any other without even the slightest change in the code. This is characteristic of languages that run in a virtual environment. For example, a C++ program needs to be recompiled by a platform-specific compiler to run on a different architecture, because the target code generated by a C++ compiler is tied to one platform. Java code, on the other hand, needs no changes because the bytecode produced by the Java compiler executes within the confines of the JVM. As a result, it is the responsibility of the JVM to align with the underlying platform by translating the bytecode generated by the Java compiler. This means that, although the product of the Java compiler is platform independent, the JVM itself is platform specific. A JVM that is installed and works on one architecture may not work on another machine, unless, of course, the two machines share the same architecture.
Figure 1: The JVM structure
What Are JRE and JDK, with Respect to JVM?
To run a Java program, we need JVM because it is the environment in which bytecode executes. Oracle provides two products: JDK (Java Development Kit) and JRE (Java Runtime Environment). JRE is the basic software that we install to run a Java program. It is an implementation of the JVM, along with the Java class libraries and other components that provide all the means to run a Java program. So, if we only want to run a class file, JRE alone is enough. JDK, on the other hand, is a superset of JRE. It contains everything JRE offers, plus tools for creating class files, such as a Java compiler, debuggers, and many other tools related to developing a Java program. So, if we want to create a class file (compile Java source code), what we need is JDK. Here is a screen capture from the Java API Documentation. Note the components that form JDK, JRE, and the core Java SE API library; this gives a fair idea of what JRE and JDK contain.
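One way to observe the difference from inside a running program is to ask whether a Java compiler is available. The sketch below uses the standard javax.tools.ToolProvider API, whose getSystemJavaCompiler method returns null when the platform provides no compiler, as is usually the case on a runtime-only installation. The class name CompilerCheck is made up for this example.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical example class; the name is illustrative only.
class CompilerCheck {
    /** True when the running platform ships a Java compiler (typically a JDK). */
    static boolean compilerAvailable() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null; // null means no compiler is provided
    }

    public static void main(String[] args) {
        System.out.println(compilerAvailable()
                ? "A compiler is available; this looks like a JDK."
                : "No compiler found; this looks like a runtime-only installation.");
    }
}
```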
Figure 2: From the Java API documentation
The Java Virtual Machine Specification gives a complete picture of the working principles behind JVM. One can take the idea from there and build one’s own JVM; it’s not an easy task, though. Many JVM implementations are available. Some are free; some come with a commercial license.
Executing a Java Program in JVM
Each Java program that executes on the Java Runtime Environment creates an instance of JVM, within which it runs. The compiled Java classes are loaded into the environment on demand, along with the classes they depend on. This is done with the help of a module called the class loader.
Figure 3: The Class loader module and its function
The Class loader does this job in three phases.
Firstly, it loads the program classes, along with the standard Java classes that are bundled with JDK in the form of bytecode. The standard classes form the core API library of Java. The bootstrap phase begins by locating the core API library classes, typically situated in jre/lib.
Secondly, the extension mechanism locates the external classes, such as new (optional) packages that are added to Java for development and execution purposes. The extension classes are usually located in jre/lib/ext. Sometimes, they are situated in other directories defined by the java.ext.dirs system property. The packages are delivered as JAR or ZIP archives.
Thirdly, if the class is not found among the standard Java classes or the extension classes, the class loader searches the CLASSPATH, which contains a list of locations in which classes are stored. The system property java.class.path maps to the CLASSPATH environment variable.
Archive files, such as JAR or ZIP files, are individual files that contain directories of other files, typically in a compressed format. For example, the standard classes used in a program are contained in the archive file rt.jar, which is installed with JDK (in JDK 8 and earlier).
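The lookup order described above can be observed, at least partly, from inside a program. The following sketch prints which loader supplied a core class and walks the parent chain of the application class loader. The class name LoaderChain and its method are invented for this example; the loader class names it prints vary between Java versions, so no particular output is assumed beyond the bootstrap loader being reported as null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the names are illustrative only.
class LoaderChain {
    /** Lists the class names of a loader and all of its parents, ending at the bootstrap loader. */
    static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader current = loader; current != null; current = current.getParent()) {
            names.add(current.getClass().getName());
        }
        // The bootstrap loader has no Java object; the API represents it as null.
        names.add("bootstrap (reported as null)");
        return names;
    }

    public static void main(String[] args) {
        // Core classes such as String come from the bootstrap loader.
        System.out.println("String loaded by: " + String.class.getClassLoader());
        // Application classes come from the system (class path) loader.
        System.out.println("Loader chain: " + chainOf(ClassLoader.getSystemClassLoader()));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```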
Once the files are located and loaded, the class loader performs further steps: verification against the JVM constraints, memory allocation, and initialization of class variables with default values before the class's initialization code assigns their defined values.
When the loading process finishes, the bytecode instructions are passed to the execution engine. JVM then interacts with the underlying OS with the help of native code that is bound to a particular JVM implementation of a specific platform. Note that the actual implementation varies slightly according to the platform.
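The two-step treatment of class variables, default values first and defined values afterwards, can be made visible with a small sketch. The class names below are made up for this example. The field early is assigned by calling a method that reads value before the initializer of value has run, so it observes the default.

```java
// Hypothetical example class; the names are illustrative only.
class InitOrderDemo {
    static class Holder {
        // Initializers run in textual order, so peek() executes while
        // 'value' still holds the default it received earlier.
        static int early = peek();
        static int value = 42; // assigned after 'early'

        static int peek() {
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(Holder.early + " " + Holder.value); // prints "0 42"
    }
}
```

The field value is deliberately not final; a final field with a constant initializer would be treated as a compile-time constant and would not show the default.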
The heap in the data memory area is used for dynamic memory allocation. Class instances and arrays are created in this area. The garbage collector reclaims the memory once the objects are no longer reachable.
The Java stack holds stack frames, which store local variables and partial results during the different stages of method invocation. Each method invocation creates a new stack frame.
The method area is a storage area shared among all JVM threads.
Registers are emulated counterparts of machine registers and are primarily used for executing bytecode instructions. The PC (program counter) register is the primary one; it holds the address of the currently executing instruction.
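Two of these areas can be probed from ordinary Java code. The sketch below counts stack frames to show that each nested invocation adds one, and reads rough heap figures through the standard Runtime API. The class and method names are invented for this example, and the heap numbers depend on the JVM and its settings, so they are printed without any expected value.

```java
// Hypothetical example class; the names are illustrative only.
class RuntimeAreasDemo {
    /** Number of frames on the current thread's stack, as reported by a stack trace. */
    static int currentDepth() {
        return new Throwable().getStackTrace().length;
    }

    /** Recurses 'calls' times, then reports the stack depth at the innermost call. */
    static int depthAfter(int calls) {
        return calls == 0 ? currentDepth() : depthAfter(calls - 1);
    }

    /** Approximate number of heap bytes currently in use. */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        int extraFrames = depthAfter(5) - depthAfter(0);
        System.out.println("extra frames for 5 nested calls: " + extraFrames); // prints 5

        long before = usedHeapBytes();
        int[] block = new int[1_000_000]; // arrays are allocated on the heap
        long after = usedHeapBytes();
        System.out.println("array length: " + block.length);
        System.out.println("used heap before/after (bytes): " + before + " / " + after);
    }
}
```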
Functions of JVM in a Nutshell
The function of JVM thus can be summed up as follows:
- Loading: Locating and loading classes with the help of the class loader.
- Linking: Verifying and preparing the loaded classes so that the JVM runtime can execute them.
- Initializing: Assigning class variables their defined values by invoking class initialization methods.
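That initialization is a separate step from loading can be sketched with the standard three-argument Class.forName method, whose second parameter controls whether the class is initialized. The class names below are made up for this example. The nested class Lazy sets a flag in its static initializer, so the flag reveals when initialization actually happens.

```java
// Hypothetical example class; the names are illustrative only.
class PhasesDemo {
    static boolean initialized = false;

    static class Lazy {
        static {
            PhasesDemo.initialized = true; // runs during initialization only
        }
    }

    /** Returns the flag after a load without initialization, then after initialization. */
    static boolean[] loadThenInitialize() {
        try {
            ClassLoader loader = PhasesDemo.class.getClassLoader();
            String name = PhasesDemo.class.getName() + "$Lazy"; // binary name of the nested class
            Class.forName(name, false, loader); // load, but do not initialize
            boolean afterLoad = initialized;
            Class.forName(name, true, loader);  // now request initialization
            boolean afterInit = initialized;
            return new boolean[] { afterLoad, afterInit };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] flags = loadThenInitialize();
        // On the first call in a fresh JVM this prints "false" and then "true".
        System.out.println("after load: " + flags[0]);
        System.out.println("after initialization: " + flags[1]);
    }
}
```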
Conclusion
The greatest advantage of programming languages that run on a virtual machine is that their programs are platform independent. The productivity gained from such languages outweighs their performance trade-off in comparison to highly efficient languages such as C/C++. This article gives just a glimpse of JVM, perhaps enough to begin understanding how it actually works.