Numeric Computations in Java
Performance
A typical reaction from numerical computing software developers to Java is that "it is slow." Indeed, this was the case when the first JVMs appeared: they worked by completely interpreting the bytecode in the class files, and performance was very poor. Technology has changed in the last few years, and nearly all JVMs available today for traditional computing devices such as PCs, workstations, and servers use just-in-time (JIT) compiler technology. The JIT operates as part of the JVM, compiling Java bytecode into native machine code at run-time. Once the machine code is generated, it executes at raw machine speed. The JIT performs sophisticated optimisations, such as elimination of array-bounds checks and stack allocation of objects.
A benchmark test was conducted by members of the JGF from the National Institute of Standards and Technology (NIST) using the most common kernels found in scientific applications: the fast Fourier transform (FFT), successive over-relaxation (SOR) iterations, Monte Carlo quadrature (QUAD), sparse-matrix multiplication, and dense-matrix factorisation (LU) for the solution of linear systems. The results show that the performance of Java varies greatly across computing platforms. The difference is mainly due to different implementations of the JVM, rather than the underlying hardware architecture. The test showed that Java code performed competitively with optimised C and Fortran; Java clearly outperformed C on the Windows platform, where both the Microsoft and Borland C compilers were used. The group emphasized that the test is not a comparison of the languages, but rather of the different implementations of compilers and execution environments, which differ from vendor to vendor. The JGF group concluded that Java performance is certainly competitive with C, even though numeric codes in Java deliver about 50% of the performance of conventional compiled languages.
Obstacles to Numerical Computing with Java
Overly restrictive floating-point semantics
Java currently forbids common optimisations, such as making use of the associativity of mathematical operators; for example, (a+b)+c may produce a different rounded result than a+(b+c). Fortran compilers, by comparison, routinely exploit the associativity of real arithmetic for code optimisation. Java also forbids the use of fused multiply-add (FMA) operations. FMA computes a quantity such as a*x+y as a single floating-point operation. This type of operation is found in many compute-intensive applications, particularly matrix algebra operations.
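The non-associativity of floating-point addition is easy to demonstrate. The following standalone sketch (not from the original text) evaluates the same three doubles under the two groupings and gets different answers:

```java
public class Associativity {
    public static void main(String[] args) {
        double a = 1e16;
        double b = -1e16;
        double c = 1.0;

        // a + b cancels exactly to 0.0, so the 1.0 survives.
        double left = (a + b) + c;   // 1.0

        // b + c rounds back to -1e16 (1.0 is below half an ulp at
        // that magnitude), so the 1.0 is lost before a is added.
        double right = a + (b + c);  // 0.0

        System.out.println(left);    // 1.0
        System.out.println(right);   // 0.0
    }
}
```

Because the two groupings disagree, a compiler that is required to reproduce the source-order result, as Java is, cannot freely reassociate such sums.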
As an example, most government treasury departments around the world use a well-known economic model called the Leontief input-output model, named after its developer, an academic and winner of the Nobel Prize in Economics (1973). The Leontief model divides the nation's economy into many sectors such as manufacturing, communication, entertainment, and service industries. Its matrix representation is shown below; the original model uses a 500-by-500 matrix to represent the model variables: I - identity matrix, C - consumption matrix, D - demand column-matrix, Y - output column-matrix of amounts produced.
Y = C*Y + D
Y = (I - C)^{-1} * D    (solution)
As you can see, with a calculation that has hundreds of thousands (or even millions) of economic variables, it is going to take time even for a supercomputer to solve the equation. The computer takes two major steps to calculate the answer: first, it computes the inverse of the bracketed term; second, it calculates the matrix product of that inverse and D (the demand column-matrix). Computing the inverse in the first step is where the processor spends most of its time. Other algorithms, such as partitioning the large matrix into smaller blocks, are somewhat faster than taking the inverse explicitly. Matrix partitioning must be addressed in the JSR-83 multiarray package.
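To make the solution step concrete, the sketch below (hypothetical code with made-up 3-sector data, not from the original text) solves (I - C)*Y = D by Gaussian elimination rather than forming the inverse explicitly, which is both cheaper and more numerically stable for a single right-hand side:

```java
public class Leontief {
    // Solve A * y = d by Gaussian elimination with partial pivoting.
    // A is modified in place; a copy of d is returned as the solution.
    static double[] solve(double[][] a, double[] d) {
        int n = d.length;
        double[] y = d.clone();
        for (int k = 0; k < n; k++) {
            // Partial pivoting: pick the largest remaining pivot in column k.
            int p = k;
            for (int i = k + 1; i < n; i++)
                if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
            double[] row = a[k]; a[k] = a[p]; a[p] = row;
            double t = y[k]; y[k] = y[p]; y[p] = t;
            // Eliminate column k below the pivot.
            for (int i = k + 1; i < n; i++) {
                double m = a[i][k] / a[k][k];
                for (int j = k; j < n; j++) a[i][j] -= m * a[k][j];
                y[i] -= m * y[k];
            }
        }
        // Back substitution.
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) y[i] -= a[i][j] * y[j];
            y[i] /= a[i][i];
        }
        return y;
    }

    public static void main(String[] args) {
        // Hypothetical 3-sector consumption matrix C and demand vector D.
        double[][] c = {
            {0.2, 0.1, 0.0},
            {0.1, 0.3, 0.2},
            {0.0, 0.2, 0.1}
        };
        double[] d = {100, 50, 80};
        int n = d.length;
        // Form A = I - C.
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i][j] = (i == j ? 1.0 : 0.0) - c[i][j];
        double[] y = solve(a, d);
        for (double v : y) System.out.println(v);
    }
}
```

The inner triple loop is exactly the kind of multiply-accumulate workload where the FMA and reassociation restrictions discussed in this section cost Java performance.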
With the FMA instruction, only a single rounding occurs for the two arithmetic operations, yielding a more accurate result in less time than would be required for two separate operations. Java's strict language definition does not permit FMA and thus sacrifices performance on some platforms.
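Later releases of the platform did eventually expose this operation as `Math.fma` (added in Java 9). The sketch below uses values chosen so that the low-order bit of the product is lost under separate rounding but preserved by the fused operation:

```java
public class FmaDemo {
    public static void main(String[] args) {
        // Exactly, x*x = 1 + 2^-26 + 2^-54; the 2^-54 term is lost
        // when the product is rounded to a double on its own.
        double x = 1.0 + Math.pow(2, -27);
        double z = -(1.0 + Math.pow(2, -26));

        double separate = x * x + z;       // product rounded first: 0.0
        double fused = Math.fma(x, x, z);  // single rounding: 2^-54

        System.out.println(separate);      // 0.0
        System.out.println(fused);         // 2^-54, about 5.55e-17
    }
}
```

The fused form recovers the exact residual 2^-54, while the separately rounded form loses it entirely, which illustrates both the accuracy and the single-rounding semantics described above.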
For Java to achieve the fastest performance possible, it is necessary to relax the above restrictions. This could be done by introducing a fast mode for the execution of floating-point operations in Java.
Inefficient support for complex numbers and alternate systems
Any language that seriously supports scientific and engineering computing should consider in its design the ease and efficiency with which computation with complex numbers can be done. Some compute-intensive applications are best handled with complex numbers, such as those in fluid dynamics, opto-electronics, electromagnetics, DSP, acoustics, electrical power systems and transmission lines, and so forth. Because complex numbers can only be created in Java as objects, the semantics of assignment (=) and equality (==) differ from those of primitive types. Complex arithmetic is also slower than Java's arithmetic on primitive types because it takes longer to create and manipulate objects.
Objects also incur more storage overhead than primitive data-types, and temporary objects must be created for almost every method call. Because every arithmetic operation is a method call, this leads to a glut of temporary objects that must be frequently dealt with by the garbage collector. Primitive data-types, by comparison, are allocated directly on the stack, leading to very efficient manipulation.
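The object-based style described above can be sketched as follows (a hypothetical Complex class, not a standard Java API); note that every arithmetic step is a method call that allocates a fresh object:

```java
// Minimal immutable complex-number class in the object-based style:
// every operation is a method call that returns a new object.
public class Complex {
    final double re, im;

    Complex(double re, double im) { this.re = re; this.im = im; }

    Complex add(Complex o) { return new Complex(re + o.re, im + o.im); }

    Complex mul(Complex o) {
        return new Complex(re * o.re - im * o.im,
                           re * o.im + im * o.re);
    }

    public static void main(String[] args) {
        Complex a = new Complex(1, 2);
        Complex b = new Complex(3, -1);
        // a*b + a allocates two temporary objects for one expression,
        // where primitive arithmetic would stay in registers or on the stack.
        Complex c = a.mul(b).add(a);
        System.out.println(c.re + " + " + c.im + "i");  // 6.0 + 7.0i
        // Also note: c == a.mul(b).add(a) is false (reference comparison),
        // even though the values are equal -- the ==/equals pitfall above.
    }
}
```

Each evaluation of an expression like a.mul(b).add(a) creates short-lived garbage, which is exactly the overhead that lightweight objects or a complex primitive type would avoid.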
The solution to the above shortfall can be achieved by introducing some new features to the Java language. The first is operator overloading and the second is lightweight objects. Operator overloading allows one to define, for example, the meaning of A+B when A and B are arbitrary objects; it is available in other languages such as C++. Java would need overloading of the arithmetic operators, the comparison operators, and the assignment operator.
Another approach, offering both efficiency and a convenient notation, would be to extend the Java language with a complex primitive data-type. This could be done without extending the JVM, since the compiler could take care of the translation.