Introduction to JIT Compilation

2012-10-04

The Java HotSpot VM (which Oracle acquired when it purchased Sun Microsystems) is the virtual machine at the core of both Oracle's JDK and OpenJDK (an open-source project). Like all Java virtual machines, the Java HotSpot VM provides the environment needed to execute bytecode. In practice, it is responsible for three main functions:

bytecode interpretation
class loading and type checking
memory management

This article focuses on bytecode interpretation, specifically the optimizations performed by the virtual machine.

JIT Compilation

Besides directly interpreting bytecode, the Java HotSpot VM can also compile bytecode (individual methods in their entirety) into machine instructions to speed up execution.

If you pass the -XX:+PrintCompilation parameter to the virtual machine, you can see which methods are compiled and when. Compilation occurs at runtime, after a method has already been executed a number of times. Waiting until the method is actually in use lets the Java HotSpot VM make a more accurate decision about how to optimize the code.
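As a minimal sketch of how to see this for yourself, the following program calls one small method many times so that it becomes a compilation candidate. The class and method names (HotLoopDemo, square, sumOfSquares) are hypothetical and chosen only for illustration, and the call count of 20,000 is an assumption meant to be comfortably large; the exact point at which compilation happens depends on the VM and its settings.

```java
// Hypothetical demo class, not part of the original article.
// Run with: java -XX:+PrintCompilation HotLoopDemo
final class HotLoopDemo {
  // A small method that becomes "hot" when it is called many times.
  static int square(int x) {
    return x * x;
  }

  // Calls square() `calls` times and returns the accumulated sum,
  // so that the result of every call is actually used.
  static long sumOfSquares(int calls) {
    long sum = 0;
    for (int i = 0; i < calls; i++) {
      sum += square(i % 100);
    }
    return sum;
  }

  public static void main(String[] args) {
    // 20,000 calls is an assumed "large enough" number for this sketch.
    System.out.println(sumOfSquares(20_000));
  }
}
```

With the flag enabled, lines mentioning HotLoopDemo::square or HotLoopDemo::sumOfSquares should eventually appear in the output, although their order and timing will differ between machines.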

If you are curious about the performance gain from JIT, you can disable it using the -Djava.compiler=none parameter and then observe how your test results change.

The Java HotSpot VM can operate in two independent modes: server or client. The specific mode is chosen by specifying the corresponding parameter -server or -client when launching the JVM (it must be the first parameter on the command line). Depending on the situation, one mode may be preferable over the other. This article will use the server mode.

The main difference between these two modes is that the server mode performs more aggressive optimizations, based on assumptions that may not always hold true. Every such optimization is protected by a check that the underlying assumption is still valid. If for some reason the assumption turns out to be invalid, the Java HotSpot VM rolls back the optimization and returns the method to bytecode interpretation. This behavior means that the Java HotSpot VM will never execute an incorrectly optimized method.
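The following sketch shows one kind of situation in which such an assumption can be invalidated. While only one implementation of an interface has been seen at a call site, a JIT compiler may assume that this is the only possible target; when a second implementation shows up, that assumption no longer holds. All names here (DeoptSketch, Shape, Unit, Half) are hypothetical, and whether a rollback actually happens in this exact program is an assumption that depends on the VM version and settings.

```java
// Hypothetical illustration, not taken from the original article.
final class DeoptSketch {
  interface Shape { double area(); }

  static final class Unit implements Shape {
    public double area() { return 1.0; }
  }

  static final class Half implements Shape {
    public double area() { return 0.5; }
  }

  // The call to s.area() is the call site whose target the VM may speculate about.
  static double total(Shape s, int calls) {
    double sum = 0;
    for (int i = 0; i < calls; i++) {
      sum += s.area();
    }
    return sum;
  }

  public static void main(String[] args) {
    // Phase 1: only Unit is ever passed in, so the call site sees a single type.
    System.out.println(total(new Unit(), 20_000));
    // Phase 2: a second implementation appears and breaks the "single type" assumption.
    System.out.println(total(new Half(), 20_000));
  }
}
```

If you run this with -XX:+PrintCompilation, a line marked "made not entrant" for DeoptSketch::total would indicate that previously compiled code was discarded, but the program produces the same results either way.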

By default, in server mode, the Java HotSpot VM will execute a method 10,000 times in interpretation mode before compiling it. You can adjust this value by setting the CompileThreshold parameter. For example, using -XX:CompileThreshold=5000 will cause the Java HotSpot VM to execute the method 5,000 times before compiling it.
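To check from inside a program whether a threshold was set on the command line, one option is to inspect the arguments the JVM was started with. The sketch below uses the standard RuntimeMXBean.getInputArguments() call; the class name ThresholdFlagDemo and the helper findThreshold are hypothetical. Note that it only detects an explicitly passed flag and does not report the VM's built-in default.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of the original article.
final class ThresholdFlagDemo {
  static final String PREFIX = "-XX:CompileThreshold=";

  // Returns the value of -XX:CompileThreshold found in the given JVM
  // arguments, or -1 if the flag was not passed explicitly.
  static int findThreshold(List<String> jvmArgs) {
    for (String arg : jvmArgs) {
      if (arg.startsWith(PREFIX)) {
        return Integer.parseInt(arg.substring(PREFIX.length()));
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
    int threshold = findThreshold(jvmArgs);
    if (threshold < 0) {
      System.out.println("CompileThreshold not set explicitly; the VM default applies");
    } else {
      System.out.println("CompileThreshold=" + threshold);
    }
  }
}
```

Running it as java -XX:CompileThreshold=5000 ThresholdFlagDemo should print the value that was passed.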

It may be tempting to lower the compilation threshold to a very small value. However, this can lead to decreased performance, as time will be spent compiling methods that do not run often enough to cover the overhead of their compilation.

The Java HotSpot VM achieves maximum efficiency when it can gather enough statistics to make a reasonable decision on what to compile. If you reduce the compilation threshold, the Java HotSpot VM may spend an enormous amount of time compiling methods that are not executed frequently. Some optimizations are only performed when sufficient statistics have been gathered. Thus, the code may not be as optimal as it could be.

On the other hand, many developers want to achieve better performance for critical methods (by compiling them) as soon as possible. One standard solution to this problem is to warm up (e.g., by sending test traffic to the system) after starting the process, which allows sufficient execution to trigger compilation.
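A warm-up phase can be sketched roughly as follows. The checksum method is a hypothetical stand-in for a critical method, and the warm-up count of 20,000 is an assumption; in a real system the warm-up would exercise the actual code paths with representative input.

```java
// Hypothetical warm-up sketch, not part of the original article.
final class WarmUpDemo {
  // Stand-in for a performance-critical method.
  static int checksum(int value) {
    int h = value * 31 + 7;
    return h ^ (h >>> 16);
  }

  // Executes checksum() `iterations` times and returns a combined result,
  // so the calls cannot be discarded as unused work.
  static int warmUp(int iterations) {
    int acc = 0;
    for (int i = 0; i < iterations; i++) {
      acc += checksum(i);
    }
    return acc;
  }

  public static void main(String[] args) {
    // Warm-up: give the VM a chance to compile checksum() before it matters.
    int sink = warmUp(20_000);

    // Only now perform the call whose latency we care about.
    long start = System.nanoTime();
    sink += checksum(42);
    long elapsed = System.nanoTime() - start;
    System.out.println("call after warm-up took " + elapsed + " ns (sink=" + sink + ")");
  }
}
```

A single timed call like this is far too noisy to be a measurement; the point of the sketch is only the ordering of warm-up and real work.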

There are numerous parameters in the Java HotSpot VM that increase the amount of information output about JIT. The most commonly used is PrintCompilation (which we have already seen), but there are several others.

We will use PrintCompilation to observe the effects of method compilation in the Java HotSpot VM during execution. But first, a few words about the System.nanoTime() method for measuring time.

Timers

In Java, we have access to two timers: currentTimeMillis() and nanoTime(). The former is quite close to the time we observe in the physical world. Its resolution is sufficient for most purposes, but not for low-latency applications.

The nanosecond timer is an alternative with higher resolution. This timer measures time in incredibly short intervals. One nanosecond is the time it takes for light to travel 20 centimeters in a fiber optic cable. In contrast, it takes 27.5 ms for light to travel the distance from London to New York via a fiber optic cable.

Due to the very high resolution of the nanosecond timer, it should be handled with caution.

For example, currentTimeMillis() is usually synchronized well enough between machines and can be used to measure network delays. But nanoTime() does not have this property.
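The difference in intended use can be sketched as follows: currentTimeMillis() gives a point in wall-clock time, while nanoTime() is only meaningful as a difference between two readings taken in the same JVM. The class name TimerDemo and the helper elapsedNanos are hypothetical.

```java
// Hypothetical timer sketch, not part of the original article.
final class TimerDemo {
  // Measures how long a task takes. Only the difference between the two
  // nanoTime() readings is meaningful; the absolute values are not.
  static long elapsedNanos(Runnable task) {
    long start = System.nanoTime();
    task.run();
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // Wall-clock time: milliseconds since the Unix epoch.
    long wallClockMillis = System.currentTimeMillis();

    long nanos = elapsedNanos(() -> {
      long sum = 0;
      for (int i = 0; i < 1_000_000; i++) {
        sum += i;
      }
      // Use the result so the loop is not trivially removable.
      if (sum < 0) {
        System.out.println(sum);
      }
    });

    System.out.println("wall clock: " + wallClockMillis + " ms since epoch");
    System.out.println("elapsed:    " + nanos + " ns");
  }
}
```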

Method Inlining

One of the key optimizations performed by the JIT compiler (but not by javac) is method inlining: copying the body of a method into the method that called it and eliminating the call. This optimization is very important because the cost of calling a simple method can be greater than the cost of the work the method performs.

The JIT compiler can perform progressive inlining, that is, start by inlining simple methods and then move on to larger and larger blocks of code until other optimizations become possible.
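Conceptually, inlining turns the first method below into something like the second one. This is a hand-written illustration with hypothetical names (InliningSketch, sq, distSquared); the real transformation is performed by the JIT compiler on compiled code, not on Java source.

```java
// Hypothetical illustration of what inlining achieves, not part of the original article.
final class InliningSketch {
  // A tiny helper: the call overhead can outweigh the single multiplication.
  static double sq(double x) {
    return x * x;
  }

  // As written in source: two calls to sq() per invocation.
  static double distSquared(double dx, double dy) {
    return sq(dx) + sq(dy);
  }

  // Roughly what remains after inlining: the body of sq() is copied
  // into the caller and the calls disappear.
  static double distSquaredInlined(double dx, double dy) {
    return dx * dx + dy * dy;
  }

  public static void main(String[] args) {
    System.out.println(distSquared(3, 4));
    System.out.println(distSquaredInlined(3, 4));
  }
}
```

Both methods compute the same value; the difference is only in how much call overhead is paid to get it.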

Consider the following code, which compares the performance of different ways to access fields:

direct access to a class's public field (DFACaller),
through getter and setter (GetSetCaller).

import java.util.concurrent.Callable;
import java.lang.management.ManagementFactory;

public class Main {
  private static double timeTestRun(String desc, int runs,
      Callable<Double> callable) throws Exception {
    long start = System.nanoTime();
    callable.call();
    long time = System.nanoTime() - start;
    return (double) time / runs;
  }

  // time since startup
  private static long uptime() {
    return ManagementFactory.getRuntimeMXBean().getUptime()
      + 15; // fictitious factor
  }

  public static void main(String... args) throws Exception {
    int iterations = 0;

    for (int i : new int[]{ 100, 1000, 5000, 9000, 10000,
                11000, 13000, 20000, 100000} ) {
      final int runs = i - iterations;
      iterations += runs;

      // NOTE: The sum of values is returned as a double to
      // prevent aggressive JIT compilation (loop elimination)

      Callable<Double> directCall = new DFACaller(runs);
      Callable<Double> viaGetSet = new GetSetCaller(runs);

      double time1 = timeTestRun("public fields", runs, directCall);
      double time2 = timeTestRun("get/set fields", runs, viaGetSet);

      System.out.printf("%7d %,7d\t\tfield access=%.1f ns, get/set=%.1f ns%n",
        uptime(), iterations, time1, time2);

      // add delay for better program output
      Thread.sleep(100);
    }
  }
}

import java.util.concurrent.Callable;

public class DFACaller implements Callable<Double> {
  private final int runs;

  public DFACaller(int runs) {
    this.runs = runs;
  }

  @Override
  public Double call() {
    DirectFieldAccess direct = new DirectFieldAccess();
    double sum = 0;
    for (int i = 0; i < runs; i++) {
      direct.one++;
      sum += direct.one;
    }
    return sum;
  }
}

class DirectFieldAccess {
  int one;
}

import java.util.concurrent.Callable;

public class GetSetCaller implements Callable<Double> {
  private final int runs;

  public GetSetCaller(int runs) {
    this.runs = runs;
  }

  @Override
  public Double call() {
    ViaGetSet getSet = new ViaGetSet();
    double sum = 0;
    for (int i = 0; i < runs; i++) {
      getSet.setOne(getSet.getOne() + 1);
      sum += getSet.getOne();
    }
    return sum;
  }
}

class ViaGetSet {
  private int one;

  public int getOne() {
    return one;
  }

  public void setOne(int one) {
    this.one = one;
  }
}

JVM Merger

Oracle engineers are working on merging the Java HotSpot VM and Oracle JRockit into a single virtual machine that combines the best features of each. The resulting virtual machine is planned for inclusion in the open-source OpenJDK project. Here are the key points of this merger:

Oracle JRockit and HotSpot will be merged into one JVM, including the best features of both.
The resulting JVM will be based on the HotSpot code with imported features from Oracle JRockit.
The result will gradually be introduced into OpenJDK.
Some existing solutions (such as Mission Control in Oracle JRockit) will remain proprietary.
Oracle will continue to distribute free binary packages of JDK and JRE that include some elements of closed code.
The JVM merger process will be multi-year.

More detailed information about the JVM merger can be found in the article Oracle's JVM Strategy by Henrik Stahl (Senior Director of Product Management, Java Platform Group at Oracle). To learn more about HotSpot, visit the OpenJDK HotSpot page. You can see a full list of JDK improvements in the JEP catalog. To follow the development of the JVM, you can subscribe to the mailing list [email protected].

Getters and setters are prime candidates for inlining. These are simple methods that are much more "expensive" when they are not inlined, because a method call costs more than a direct field access.

Compile these classes and run the test:

$ java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

$ javac Main.java DFACaller.java GetSetCaller.java

$ java -cp . -XX:+PrintCompilation Main

On my machine (2.8 GHz Intel Core i7, Mac OS X 10.7), the output was:

     57    1             java.lang.String::hashCode (55 bytes)
     62     100		field access=3430.0 ns, get/set=3330.0 ns
    156   1,000		field access=140.0 ns, get/set=568.9 ns
    261   5,000		field access=67.3 ns, get/set=481.3 ns
    284    2             ViaGetSet::getOne (5 bytes)
    364   9,000		field access=47.3 ns, get/set=201.5 ns
    488    3             ViaGetSet::setOne (6 bytes)
    493  10,000		field access=109.0 ns, get/set=403.0 ns
    591    4             DFACaller::call (51 bytes)
    591    5             GetSetCaller::call (51 bytes)
    569  11,000		field access=180.0 ns, get/set=346.0 ns
    671  13,000		field access=30.0 ns, get/set=6.0 ns
    772  20,000		field access=9.7 ns, get/set=7.1 ns
    875 100,000		field access=1.7 ns, get/set=1.7 ns

What does all this mean? The numbers in the first column show the time in milliseconds since the program started. The second column displays the method ID (for compiled methods) or the number of iterations performed in the test.

Notice that the hashCode method of the String class was not used directly in the test but was still compiled, because the platform itself uses it.

In the second line, we can see that both ways of accessing the field are quite slow, because the corresponding classes had to be loaded during the first run. The next line shows that the test ran significantly faster even though no compilation of the test code had yet occurred.

Also, note the following:

In the tests at 1,000 and 5,000 iterations, direct field access is faster than access through get/set calls, because the getter and setter had not yet been inlined or otherwise optimized. Even so, both variants run quite fast.
At 9,000 iterations, the getter was optimized (it is called twice per iteration), which gives a slight performance improvement.
At 10,000 iterations, the setter was optimized. The extra time spent performing the optimization increased the measured time per iteration (403 ns instead of 201.5 ns).
Finally, the call() methods of the DFACaller and GetSetCaller classes were optimized:
- the getter and setter were not just optimized but also inlined into GetSetCaller.
- in the next test run, the execution time is still not optimal.
After 13,000 iterations, the two variants perform practically the same. We have reached steady-state performance.

It is important to note that in the steady state, direct field access and access through get/set methods perform equally well, because the getter and setter were inlined into the call() method of the GetSetCaller class. As a result, the code in the GetSetCaller class performs the same actions as the code in the DFACaller class.

JIT compilation occurs in the background, and exactly when a given optimization becomes available varies from machine to machine and, to a lesser extent, from run to run.

Conclusion

This article has only scratched the surface of JIT compilation in the Java HotSpot VM. In particular, it did not cover important aspects of writing good tests, or how to use statistics to make sure that the dynamic nature of the platform does not deceive us.

The tests used here are quite simple and are unlikely to be suitable for real measurements. The second part of the article will show how to set up more realistic tests and examine in detail the code produced by the JIT compiler.