The Java Virtual Machine (JVM) does a great deal of work behind the scenes to execute a binary class file. From the outside it looks simple: invoke the JVM with the java command and pass the class file as a command-line argument. In reality, the class file goes through many phases. Here we look only at the first three, called loading, linking, and initialization. Even these form a large and complex topic in their own right, so only the key aspects are described briefly to give an idea of what is involved.
Loading, linking, and initialization are the first processes the JVM carries out when bytecode, in the form of a class file, is submitted to it for execution. Other processes, such as instantiation, garbage collection, and finalization, occur in the middle stages of the class life cycle, and unloading occurs at the end. The JVM provides the environment in which these processes take their course. It does not matter which language compiler converted the source code into a class file, as long as the result adheres to the class file format that the JVM understands.
Apart from Java, there are many well-known JVM languages, such as Clojure, Groovy, Scala, JRuby, Jython, and so forth. A program may be written in any of these languages and compiled by that language's compiler, which produces target code designed to run on the JVM.
Although the JVM specification imposes certain rules for class file execution, it leaves implementers free to decide how the JVM interacts with the underlying platform and how it optimizes performance. This makes it an open architecture that welcomes vendor-specific tuning for particular circumstances.
So, apart from the Oracle and OpenJDK implementations of the JVM, other implementations exist, such as CACAO, Jikes RVM, Maxine, JamVM, and the like.
The loading, linking, and initialization processes take place in the early stages of bringing a class file into the JVM. There are many intricacies involved, and if we dive into them in isolation from the other JVM processes, the idea gets too dense for lucid comprehension. This is because the ideas are interlinked; for example, if an error crops up in the loading phase, reporting may wait until the linker comes into play. So loading, linking, and initialization may seem discrete, yet they overlap on many occasions.
The Journey of the Class File
The Java compiler creates a class file from a source file given to it. The class file, although it is binary data, cannot execute on a machine without the Java Virtual Machine (JVM); it is totally dependent on the JVM environment. The JVM provides the runtime environment and understands the binary instructions represented in the class file. It is the JVM that interacts with the underlying platform to execute the class file's instructions. As a middleman, the JVM not only provides the playground for the class file but also acts as an intermediary for the exchange of services and resources. The JVM undertakes many processes to execute a class file successfully. To begin with, though, there are three processes that it follows in the initial stages of bringing a class file into its domain: loading, linking, and initialization.
The Process of Loading
As per the Java 8 Virtual Machine Specification, loading is the process of finding the binary representation of a class or interface type with a particular name and creating a class or interface from that binary representation.
The JVM works with two kinds of class loaders: the bootstrap class loader and user-defined class loaders. The bootstrap class loader is built into the JVM implementation and loads class files according to the specification. A user-defined class loader is an instance of a subclass of java.lang.ClassLoader and can load classes in a custom way. In either case, the result of loading is an instance of java.lang.Class. Observe (in the Java API documentation) that this class has no public constructor: Class objects are created automatically by the JVM, and one can get information about the class's internal data structure through the methods of this class.
Once a class is loaded, the JVM parses it into its internal data structure. A class loader may load the binary representation of a type when it is first needed, well in advance in anticipation of use, or together with a group of related classes, and it typically caches what it has loaded. If a problem is encountered during early loading, say, due to a malformed class, it is not reported immediately; instead, the error (a LinkageError or one of its subclasses) is reported at the point where the program actively uses the class. If no such use occurs in the entire course of the program, the error is never reported.
Therefore, in a nutshell, the loading process basically performs these three functions:
- Create a binary stream of data from the class file
- Parse the binary data according to the internal data structure
- Create an instance of java.lang.Class
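The relationship between a user-defined class loader, the bootstrap class loader, and the resulting Class object can be sketched in a few lines. This is only an illustration under stated assumptions: LoadingDemo, RecordingLoader, and loadVia are hypothetical names invented for this example, and the loader does nothing except record each request before delegating to its parent.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadingDemo {

    /** Hypothetical user-defined loader: records each request, then delegates to its parent. */
    static class RecordingLoader extends ClassLoader {
        private final List<String> requested;

        RecordingLoader(ClassLoader parent, List<String> requested) {
            super(parent);
            this.requested = requested;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            requested.add(name);                   // note which class was asked for
            return super.loadClass(name, resolve); // standard parent-first delegation
        }
    }

    /** Loads (without initializing) the named class through a fresh RecordingLoader. */
    public static Class<?> loadVia(List<String> requested, String className) {
        ClassLoader parent = LoadingDemo.class.getClassLoader();
        try {
            return Class.forName(className, false, new RecordingLoader(parent, requested));
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> requested = new ArrayList<>();
        Class<?> type = loadVia(requested, "java.util.ArrayList");

        System.out.println("requested via user-defined loader: " + requested);
        System.out.println("same Class object as the literal:  " + (type == ArrayList.class));
        // A null class loader conventionally denotes the bootstrap class loader.
        System.out.println("defined by bootstrap loader:       " + (type.getClassLoader() == null));
        // The Class object exposes the parsed structure through its methods.
        System.out.println("superclass: " + type.getSuperclass().getName());
        System.out.println("public constructors of Class: " + Class.class.getConstructors().length);
    }
}
```

The user-defined loader here only initiates loading; because it delegates, the class is actually defined by the bootstrap class loader, and the program receives the single Class object that the JVM created for that type.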
The Process of Linking
As per the Java 8 Virtual Machine Specification, linking is the process of taking a class or interface and combining it into the run-time state of the JVM so that it can be executed.
Linking begins with verification of the class, which ensures that it is structurally correct, follows the semantic rules of the Java language, and does not compromise the integrity of the JVM. The specification describes what verification must establish, yet gives JVM implementers flexibility to decide when linking activities take place and how types are verified.
The specification lists the exceptions that the JVM throws under specific circumstances. In this regard, it is worth mentioning that some checks occur right from the beginning, while the binary data is being parsed into the internal data structure. These checks ensure that parsing itself does not crash the JVM and that the structure of the binary data matches the expected class file format. The loader also checks that every class other than java.lang.Object has a superclass, so that its hierarchy ultimately leads to Object. This often requires recursive loading of the superclass hierarchy. In this manner, numerous checks take place at multiple stages, but verification proper is usually considered to begin with linking.
Once verification is done, the JVM allocates memory for the class variables and initializes them to default values according to their types. This step is called preparation. The actual initialization (with the values specified in the program) does not occur until the subsequent initialization phase.
Finally, in the optional resolution phase, the JVM locates the classes, interfaces, fields, and methods referenced symbolically in the run-time constant pool (comparable to a symbol table) and replaces the symbolic references with direct references. The timing of resolution is, again, open to vendor-specific implementation: a JVM may resolve each symbolic reference lazily, when it is first used, or eagerly, as early as when the class is verified. Verification, in a nutshell, checks that the binary representation of a class is structurally correct. It may cause additional classes to be loaded, but those classes need not themselves be verified at that point; some implementations, for example, may treat classes from the Java API library as trusted.
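The difference between the default values and the programmer's values can be observed from ordinary Java code. The sketch below is illustrative only; PreparationDemo and its fields are hypothetical names. It relies on the fact that class variable initializers run in textual order, so an initializer that runs early can peek at a field whose own initializer has not yet run.

```java
public class PreparationDemo {
    // These initializers run first, in textual order. At that moment 'limit' and
    // 'label' still hold the defaults assigned before initialization: 0 and null.
    public static int seenEarly = peekLimit();
    public static String labelEarly = peekLabel();

    // Deliberately not final: a 'static final int limit = 42' would be a
    // compile-time constant, and the effect below would not be visible.
    public static int limit = 42;
    public static String label = "ready";

    private static int peekLimit() {
        return limit;
    }

    private static String peekLabel() {
        return label;
    }

    public static void main(String[] args) {
        System.out.println("seenEarly  = " + seenEarly);   // prints "seenEarly  = 0"
        System.out.println("labelEarly = " + labelEarly);  // prints "labelEarly = null"
        System.out.println("limit      = " + limit);       // prints "limit      = 42"
        System.out.println("label      = " + label);       // prints "label      = ready"
    }
}
```

The early reads go through helper methods because the compiler rejects a direct forward reference to a field by its simple name in an initializer.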
Therefore, in a nutshell, the linking process involves three functions:
- Verification
- Preparation
- Resolution (optional)
The Process of Initialization
As per the Java 8 Virtual Machine Specification, initialization of a class or interface consists of executing its class or interface initialization method.
After the class or interface has been linked through verification, preparation, and optionally resolution, the initialization phase makes it ready for its first active use. Initialization assigns the class variables the starting values the program specifies, replacing the defaults set during preparation. It is the programmer's responsibility to decide what those values should be, by writing class variable initializers and static initializer blocks; the compiler gathers these into the class initialization method. Before this method runs for a class, the JVM initializes the class's direct superclass if it has not already been initialized. Interfaces are an exception: initializing an interface does not require initializing its superinterfaces.
Therefore, to summarize, the initialization process involves the following two functions:
- Initialize class variables with the routine specified by the programmer.
- Initialize its direct superclass first, if it has not already been initialized.
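The two points above, together with the notion of first active use, can be observed with a short sketch. The names InitOrderDemo, Parent, and Child are hypothetical, chosen only for illustration. The static initializer blocks write to a shared log, so the order of events becomes visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InitOrderDemo {
    // Shared log; entries are appended in the order the events actually happen.
    private static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent init"); }
    }

    static class Child extends Parent {
        static int counter = 7;              // not final, so not a compile-time constant
        static { LOG.add("Child init"); }
    }

    public static List<String> run() {
        Class<?> type = Child.class;         // a class literal is not an active use
        LOG.add("class literal");
        int value = Child.counter;           // reading a static field is an active use
        LOG.add("field read");
        return Collections.unmodifiableList(LOG);
    }

    public static void main(String[] args) {
        // First call prints: [class literal, Parent init, Child init, field read]
        System.out.println(run());
    }
}
```

Merely naming Child through a class literal does not initialize it. The first read of Child.counter does, and the JVM initializes Parent before Child. Because a class is initialized only once, a second call to run() would add no further "init" entries.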
This has been a very brief overview of the loading, linking, and initialization processes in the JVM. Each phase involves many finer intricacies that have been left out to keep the discussion simple and concise. The following references were used for this write-up; readers who need more elaboration and finer detail should refer to them.
- Java Virtual Machine Specification (jvms8.pdf)
- Inside the Java Virtual Machine by Bill Venners