Java and JMH benchmark library: how to set up a project in NetBeans


I’m very excited about this, my first post about one of my favorite activities: programming in Java! Here I’ll explain how to build and run a standalone project in NetBeans using a very powerful library called JMH, which lets you perform benchmarking analysis at nano/micro/milli/macro scale and provides a coherent and reliable framework for writing benchmark code while avoiding common pitfalls (e.g. unwanted JVM optimizations). Let’s start with a very simple example!

Note: some of the explanations in this article are adapted from the excellent examples found on JMH’s official site.

Set up the project in NetBeans

Open your NetBeans IDE and create a new project called Benchmark from the menu File > New Project > Java Application. Choose a Project Location at your convenience and do not create any packages or classes at this time. Download the two libraries named jmh-core-x.y.z and jmh-generator-annprocess-x.y.z from this site, then put them on the classpath of a library entry called jmh-x.y.z created via the panel Tools > Libraries > Ant Library Manager. Repeat this task for the Commons Math library as well, which can be obtained from this site. Finally, link these libraries to your Benchmark project, ending up with this configuration:

JMH libraries

Create a benchmark class

In this particular configuration we enable the annotation processor, which generates the synthetic benchmark code during compilation; hence we will use annotations in our code to configure the surrounding JMH infrastructure.

Here’s my benchmark class called BenchmarkLocalTime, the purpose of which will be clarified shortly:

package benchmark;

import data.Parser;
import java.nio.CharBuffer;
import java.time.LocalTime;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

/**
 * @author Luca Merello
 */
@State(Scope.Benchmark)
public class BenchmarkLocalTime
{
    /**************************************************************************/
    /* PARAM                                                                  */
    /**************************************************************************/

    @Param({"11:22:33"})
    public String arg;

    /**************************************************************************/
    /* STATE : PARSER                                                         */
    /**************************************************************************/

    @State(Scope.Benchmark)
    public static class BenchmarkParser
    {
        Parser instance;

        @Setup(Level.Trial)
        public void initialize()
        {
            instance = Parser.getInstance();
        }

        @TearDown(Level.Trial)
        public void shutdown()
        {
            // Nothing to do
        }
    }

    /**************************************************************************/
    /* STATE : CHAR BUFFER                                                    */
    /**************************************************************************/

    protected CharBuffer data;

    @Setup(Level.Trial)
    public void initializeTrial()
    {
        data = CharBuffer.wrap(arg.toCharArray());
    }

    @Setup(Level.Invocation)
    public void initializeInvocation()
    {
        data.position(0);
    }

    @TearDown(Level.Invocation)
    public void shutdownInvocation() { }

    @TearDown(Level.Trial)
    public void shutdownTrial() { }

    /**************************************************************************/
    /* BENCHMARK                                                              */
    /**************************************************************************/

    @Benchmark
    @BenchmarkMode({/*Mode.Throughput,*/ Mode.AverageTime/*, Mode.SampleTime*/})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public LocalTime parseLocalTimeSlow(BenchmarkParser parser)
    {
        return parser.instance.parseLocalTimeSlow(data);
    }

    @Benchmark
    @BenchmarkMode({/*Mode.Throughput,*/ Mode.AverageTime/*, Mode.SampleTime*/})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public LocalTime parseLocalTimeFast(BenchmarkParser parser)
    {
        return parser.instance.parseLocalTimeFast(data);
    }
}

Methods must be annotated with @Benchmark so that JMH generates the synthetic code that runs them, treating each method body as the piece of code we want to measure. Note that method names are unimportant, and there can be multiple benchmark methods within the same class. Furthermore, if your code can throw an exception, you only need to declare it in the method signature, without worrying about how it is handled afterwards.

Pay attention to the fact that benchmark methods must return the result of the computation: this stops the JIT compiler from eliminating the whole computation as dead code. Another dangerous situation for benchmarking is constant folding, where the JVM realizes that the result of the computation is always the same no matter what, and replaces it with a precomputed value. This can be prevented by always reading the inputs from the state, so that the result is based on a data source that is not trivially predictable. As you can see in the code above, both these rules have been followed.
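
To make these pitfalls concrete, here is a minimal sketch modeled on the official JMH samples (the class and method names are mine): the first two benchmarks are deliberately broken, while the last two show the safe patterns, including JMH’s Blackhole for sinking multiple results.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class PitfallsSketch
{
    private double x = Math.PI;               // input read from state: not trivially predictable
    private final double constant = Math.PI;  // final constant: foldable by the JIT

    // WRONG: the result is discarded, so the JIT may remove the computation entirely.
    @Benchmark
    public void deadCode()
    {
        Math.log(x);
    }

    // WRONG: the input is a compile-time constant, so the JVM may fold
    // the whole expression into a precomputed value.
    @Benchmark
    public double constantFold()
    {
        return Math.log(constant);
    }

    // RIGHT: read the input from state and return the result.
    @Benchmark
    public double measureRight()
    {
        return Math.log(x);
    }

    // RIGHT: when there are several results, sink each one into a Blackhole.
    @Benchmark
    public void measureBlackhole(Blackhole bh)
    {
        bh.consume(Math.log(x));
        bh.consume(Math.log(x + 1.0));
    }
}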

With JMH you can measure benchmark methods in several modes, and for each one you can select the unit of measurement for time or leave the default. For example, some of the benchmark modes are listed below (see the sketch after the list for combining them):

  • Throughput: measures raw throughput by continuously calling the benchmark method in a time-bound iteration and counting how many times it executes;
  • AverageTime: measures the average execution time (the reciprocal of throughput!);
  • SampleTime: samples the execution time, gathering individual invocation timings, which allows us to infer distributions, percentiles, etc.
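
If you want to collect several metrics in one run, you can list multiple modes in the same annotation, and JMH will execute a separate pass per mode. A minimal sketch (the class and method are hypothetical):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class ModesSketch
{
    // Measured three times, once per mode: ops/us for Throughput,
    // us/op for AverageTime, and a timing histogram for SampleTime.
    @Benchmark
    @BenchmarkMode({Mode.Throughput, Mode.AverageTime, Mode.SampleTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public double allModes()
    {
        return Math.log(System.nanoTime());
    }
}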

In the BenchmarkLocalTime class we measure the average execution time of two different methods, parseLocalTimeSlow and parseLocalTimeFast, which translate a String into a LocalTime object in two different ways.

As you can see, I need to store some state while the benchmark is running, represented by two objects: a Parser and a CharBuffer. These objects are instantiated on demand and reused throughout the trial: whenever a benchmark method references a state object, JMH injects the appropriate instance. States are defined using the @State annotation, and their scope can be the whole benchmark (all threads share the same instance) or a single thread (each thread has its own copy). In any case, remember that a specific state object is always instantiated by one of the benchmark threads that will later have access to it.

In my example, both states use Scope.Benchmark, since my test will be executed by a single thread. As far as the Parser is concerned, I’ve created a container class named BenchmarkParser carrying a @State annotation, while in the case of the CharBuffer I’ve marked the benchmark class itself as a state object, since this lets me reference its fields as usual.
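
To make the difference between the two scopes concrete, here is a minimal sketch modeled on the official JMH state example (the class and field names are mine): with Scope.Benchmark all threads operate on the same instance, while with Scope.Thread each thread gets a private copy.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

public class StateScopesSketch
{
    // All benchmark threads share this single instance.
    @State(Scope.Benchmark)
    public static class SharedState
    {
        volatile double x = Math.PI;
    }

    // Each benchmark thread gets its own private instance.
    @State(Scope.Thread)
    public static class PerThreadState
    {
        volatile double x = Math.PI;
    }

    @Benchmark
    public double measureShared(SharedState state)
    {
        return state.x++; // all threads update the same field
    }

    @Benchmark
    public double measureUnshared(PerThreadState state)
    {
        return state.x++; // each thread updates its own copy
    }
}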

Also notice that in my code example I’ve used some fixture methods, annotated with @Setup and @TearDown, in order to initialize and reset state values during the lifetime of the benchmark. Since time spent in fixture methods does not count towards the performance metrics, they are the right place for time-expensive work such as warming a cache, loading data from slow devices (e.g. hard drives, databases, etc.) or cleaning up resources. Remember that fixture methods accept three different levels that control when they run (see the sketch after this list):

  • Trial: before or after the entire benchmark run (the sequence of iterations);
  • Iteration: before or after the benchmark iteration (the sequence of invocations);
  • Invocation: before or after the benchmark method invocation.
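
Here is a minimal sketch showing all three levels on a single state class (the class and method names are mine; the comments state where each hook fits):

import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Benchmark)
public class FixtureLevelsSketch
{
    @Setup(Level.Trial)
    public void beforeRun()
    {
        // Runs once before the whole sequence of iterations:
        // a good place for expensive loading (files, databases, ...).
    }

    @Setup(Level.Iteration)
    public void beforeIteration()
    {
        // Runs before each iteration (a timed sequence of invocations).
    }

    @Setup(Level.Invocation)
    public void beforeInvocation()
    {
        // Runs before every single benchmark method call. Use sparingly:
        // JMH warns this level can disturb timing for very short methods.
    }

    @TearDown(Level.Trial)
    public void afterRun()
    {
        // Runs once after the whole run: resource cleanup goes here.
    }
}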

In the BenchmarkLocalTime class, I create the Parser instance and initialize my char buffer (wrapping the string argument) at the beginning of the trial, so that both stay the same across all iterations. Notice that the buffer initialization involves the @Param annotation, which is used to parameterize a family of tests in which only the input configuration varies, not the benchmark algorithms. Finally, my per-invocation work consists only of resetting the buffer’s position to its first character.
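
As an aside, listing several values in the annotation makes JMH repeat the whole benchmark once per value, producing a separate result line for each (method, arg) pair. A hypothetical variation of the arg field declared above (the extra time strings are arbitrary examples):

    // JMH repeats the entire benchmark for each listed value,
    // reporting a separate result line per (method, arg) pair.
    @Param({"11:22:33", "00:00:00", "23:59:59"})
    public String arg;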

Create a parser class with two different algorithms

In the following Parser class I create two parsing algorithms that use different strategies to convert a String representing a time in the standardized ISO format into a LocalTime object. The first method, named parseLocalTimeSlow, uses the public static parse method defined on LocalTime, which accepts an immutable DateTimeFormatter for interpreting the time specified by the input. This is a common way to create a LocalTime from a text input, as explained in the JDK reference documentation (see https://docs.oracle.com/javase/tutorial/datetime/iso/format.html). The second method, named parseLocalTimeFast, uses a more low-level approach: it parses each pair of characters standing for hours, minutes and seconds into integer values (skipping the colon marks) and then uses a specific LocalTime factory method to build the instance. Obviously the latter is more verbose and more error-prone than the former, but in this case I want to find out which one is faster, regardless of any other characteristic.

Hence, the code called by the previously defined @Benchmark-annotated methods is the following, and it has to be saved in a package called data:

package data;

import java.nio.CharBuffer;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/**
 * @author Luca Merello
 */
public final class Parser
{
    private static final char CHAR_ZERO = '0';

    /**
     * Initialization on Demand Holder idiom.
     */
    private static final class ParserHolder
    {
        private static final Parser INSTANCE = new Parser();

        private ParserHolder() { }
    }

    /**
     * Concurrent lazy-loaded singleton.
     * @return
     */
    public static final Parser getInstance()
    {
        return ParserHolder.INSTANCE;
    }

    /**
     * Private constructor (use singleton instead).
     */
    private Parser() { }

    /**
     * Get LocalTime from a char buffer formatted as ISO_LOCAL_TIME ('HH:mm:ss').
     *
     * @param _buffer
     * @return
     */
    public final LocalTime parseLocalTimeSlow(final CharBuffer _buffer)
    {
        return LocalTime.parse(_buffer, DateTimeFormatter.ISO_LOCAL_TIME);
    }

    /**
     * Get LocalTime by hand-parsing the 'HH:mm:ss' digit pairs
     * (no syntax validation beyond the digit check).
     *
     * @param _buffer
     * @return
     */
    public final LocalTime parseLocalTimeFast(final CharBuffer _buffer)
    {
        final int hh = parseInt(_buffer, 2);
        skipChars(_buffer, 1); // skip ':'
        final int mm = parseInt(_buffer, 2);
        skipChars(_buffer, 1); // skip ':'
        final int ss = parseInt(_buffer, 2);

        return LocalTime.of(hh, mm, ss);
    }

    private static int parseInt(final CharBuffer _buffer, final int _length)
    {
        int number = 0;

        for( int n = 0; n < _length; n++ )
        {
            final int digit = (int) _buffer.get() - (int) CHAR_ZERO; // subtract the code of '0' to get the digit value
            if( digit < 0 || digit > 9 )
            {
                throw new NumberFormatException("Cannot parse number of length <" + _length + "> from char buffer, since found a non-numeric value!");
            }

            number *= 10;
            number += digit;
        }

        return number;
    }

    private static void skipChars(final CharBuffer _buffer, final int _charsToSkip)
    {
        _buffer.position(_buffer.position() + _charsToSkip);
    }
}

Create benchmark launcher

Finally, we need to create a launcher class for the benchmarks, named BenchmarkLauncher.java, saved into the previously created benchmark package:

package benchmark;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

/**
 * @author Luca Merello
 */
public final class BenchmarkLauncher
{
    /**
     * Private constructor.
     */
    private BenchmarkLauncher() {}

    /**
     * Main method to start micro-benchmark tests.
     *
     * @param _args
     * @throws RunnerException
     */
    public static void main(final String[] _args)
        throws RunnerException
    {
        Options opt = new OptionsBuilder()
                         .include(".*" + "Benchmark" + ".*")
                         .warmupTime(TimeValue.seconds(1))
                         .warmupIterations(5)
                         .measurementTime(TimeValue.seconds(1))
                         .measurementIterations(5)
                         .threads(1)
                         .forks(1)
                         .shouldFailOnError(true)
                         .shouldDoGC(true)
                         .jvmArgs("-server")
                         .build();

        new Runner(opt).run();
    }
}

The meaning of the above options is the following (an annotation-based alternative is sketched after the list):

  • perform 5 warmup iterations, each lasting one second, to allow the JIT optimizations to settle;
  • then perform 5 measurement iterations, each lasting one second, collecting the performance metrics;
  • always fork the tests, avoiding mixing the JVM’s profile-guided optimizations of different tests together;
  • if a benchmark method throws an exception, end the run abruptly, preventing JMH from proceeding further;
  • run the garbage collector between tests, cleaning memory and minimizing the chance of a GC pause during measurement;
  • use the server JVM profile.
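
Note that the same run parameters can also be declared directly on the benchmark class via annotations, with the OptionsBuilder or the command line still able to override them. A minimal sketch of that alternative (the class and method are hypothetical):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

// Defaults declared on the class itself; OptionsBuilder or command-line
// settings take precedence when given.
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 1, jvmArgs = "-server")
@Threads(1)
public class AnnotatedDefaultsSketch
{
    @Benchmark
    public double sample()
    {
        return Math.log(System.nanoTime());
    }
}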

If everything has been done as expected, your built project tree will look like this:

JMH project classes

Run the benchmark

Now it’s time to gather some measurements in order to declare which algorithm is the fastest. The JMH console output provides detailed results with metrics and confidence intervals for each benchmarked method, as well as a useful recap of the configuration and extensive information about each iteration:

# VM invoker: C:\Program Files\Java\jdk1.8.0_20\jre\bin\java.exe
# VM options: -server
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: benchmark.BenchmarkLocalTime.parseLocalTimeFast
# Parameters: (arg = 11:22:33)

# Run progress: 0,00% complete, ETA 00:00:20
# Fork: 1 of 1
# Warmup Iteration   1: 1,378 us/op
# Warmup Iteration   2: 1,314 us/op
# Warmup Iteration   3: 1,314 us/op
# Warmup Iteration   4: 1,323 us/op
# Warmup Iteration   5: 1,311 us/op
Iteration   1: 1,325 us/op
Iteration   2: 1,314 us/op
Iteration   3: 1,310 us/op
Iteration   4: 1,320 us/op
Iteration   5: 1,317 us/op


Result: 1,317 ±(99.9%) 0,021 us/op [Average]
  Statistics: (min, avg, max) = (1,310, 1,317, 1,325), stdev = 0,006
  Confidence interval (99.9%): [1,295, 1,338]


# VM invoker: C:\Program Files\Java\jdk1.8.0_20\jre\bin\java.exe
# VM options: -server
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: benchmark.BenchmarkLocalTime.parseLocalTimeSlow
# Parameters: (arg = 11:22:33)

# Run progress: 50,00% complete, ETA 00:00:14
# Fork: 1 of 1
# Warmup Iteration   1: 4,846 us/op
# Warmup Iteration   2: 2,448 us/op
# Warmup Iteration   3: 2,432 us/op
# Warmup Iteration   4: 2,403 us/op
# Warmup Iteration   5: 2,398 us/op
Iteration   1: 2,404 us/op
Iteration   2: 2,419 us/op
Iteration   3: 2,421 us/op
Iteration   4: 2,447 us/op
Iteration   5: 2,414 us/op


Result: 2,421 ±(99.9%) 0,061 us/op [Average]
  Statistics: (min, avg, max) = (2,404, 2,421, 2,447), stdev = 0,016
  Confidence interval (99.9%): [2,359, 2,482]


# Run complete. Total time: 00:00:29

Benchmark                                     (arg)  Mode  Samples  Score   Error  Units
b.BenchmarkLocalTime.parseLocalTimeFast    11:22:33  avgt        5  1,317 ± 0,021  us/op
b.BenchmarkLocalTime.parseLocalTimeSlow    11:22:33  avgt        5  2,421 ± 0,061  us/op

Obviously, as the method names suggest, the fast one really is faster: it cuts the average time by about 45%! But why? The reason is quite straightforward: the LocalTime parse method is intended for general use and performs several broad-coverage syntax checks that my elementary piece of code does not!

Well, this post used a very simple example to show (and hopefully explain!) only a small part of the features that the JMH library offers … now it’s up to you to continue exploring the potential of this wonderful tool, enjoy! 😀
