Benchmarks
Because JRuby runs on the JVM and has various settings for different optimizations, benchmarking it requires a bit more care than benchmarking non-optimizing implementations of Ruby. This document tries to describe the basics of getting good benchmark numbers with JRuby.
- Always try to use the most recent version of the JVM. Newer versions generally perform better and include additional optimizations.
- JRuby should be benchmarked with invokedynamic enabled, either by passing `-Xcompile.invokedynamic` to JRuby (or via the `JRUBY_OPTS` environment variable) or by passing `-Djruby.compile.invokedynamic` to the JVM.
- Run sufficient iterations for the application to warm up and results to level off (see the example after this list).
- Try other GC modes on the JVM, such as the parallel collector, enabled with `-XX:+UseParallelGC` passed to the JVM (or `-J-XX:+UseParallelGC` passed to JRuby). The Parallel GC currently provides the best throughput on JRuby.
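The sketch below ties these recommendations together: a small harness that separates warmup from measurement, invoked with the invokedynamic and Parallel GC flags from the list above. It assumes the benchmark-ips gem (our choice for illustration; any harness with an explicit warmup phase works), and the script name `bench.rb` is a placeholder.

```ruby
# A minimal warmup-aware harness, assuming the benchmark-ips gem is installed
# (gem install benchmark-ips). A possible invocation using the flags above:
#
#   jruby -Xcompile.invokedynamic -J-XX:+UseParallelGC bench.rb
#
require "benchmark/ips"

Benchmark.ips do |x|
  # Give the JIT time to warm up before measurements are taken.
  x.config(warmup: 10, time: 10)

  # Arbitrary workload chosen only for illustration.
  x.report("string building") do
    (1..100).inject(+"") { |buf, i| buf << i.to_s }
  end
end
```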
You can also monitor the JRuby JIT to ensure it is compiling code by passing `-Xjit.logging` to JRuby. Methods that do not JIT compile will perform significantly worse. Such compilation failures should be reported to the JRuby team.
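As a rough illustration, the script below calls one method enough times for it to become a candidate for JIT compilation. The method body and iteration counts are arbitrary, and the log output format depends on your JRuby version.

```ruby
# Run with JIT logging enabled to watch whether hot methods compile:
#
#   jruby -Xjit.logging jit_demo.rb
#
# The goal is simply to call a method often enough that it becomes hot.
def hot_method(n)
  (1..n).reduce(0) { |sum, i| sum + i }
end

10_000.times { hot_method(100) }
puts hot_method(100)
```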
Other guides on the web can provide additional recommendations for tuning and benchmarking JVM-based applications.
The JVM has many mechanisms for monitoring the garbage collector. The simplest of these – for HotSpot-based JVMs like the OpenJDK builds maintained by Oracle, Red Hat, Amazon, Twitter, Microsoft, and Azul – is to pass `-XX:+PrintGCDetails` to the JVM (prefixed with `-J` if passed to JRuby).
Excessive GC may indicate a problem or missing optimization in JRuby, or it may indicate an area of excess allocation in your application.
Note also that the JVM will try to use a large amount of memory to give its GC room to work, so a direct comparison of JRuby's memory footprint without GC tuning will be misleading. You can use JVM flags like `-Xmx<size>` to set a smaller maximum heap, but smaller heaps may spend more time in GC.
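A quick way to see both of these flags in action is to run an allocation-heavy script with GC logging enabled and a capped heap. The sketch below is illustrative only: the 512m limit, the script name, and the workload are arbitrary choices, and on JDK 9 and later the same GC information is also available via unified logging (`-Xlog:gc*`).

```ruby
# Run with GC logging and a deliberately small heap to observe collector
# behavior; 512m and gc_demo.rb are arbitrary illustrative choices:
#
#   jruby -J-XX:+PrintGCDetails -J-Xmx512m gc_demo.rb
#
# Allocate many short-lived objects so GC activity shows up in the output.
100_000.times do
  Array.new(1_000) { |i| "object #{i}" }
end
puts "done"
```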
It is also important to avoid heavy IO (reading/writing to files, sockets, or console) unless you're actually benchmarking IO or it's necessary for the code you are benchmarking. IO skews execution performance tremendously and can produce results that vary based on many system-level factors. IO is also sometimes slower in JRuby, so benchmarks of execution performance will be inaccurate when heavy IO is involved.
Excessive throwing of exceptions will greatly reduce performance on JRuby, due to the high cost of generating a stack trace on the JVM. You can monitor stack trace generation for exceptions and `Kernel#caller` by passing `-Xlog.backtraces` to JRuby.
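To get a feel for that cost, a comparison like the one below (again assuming the benchmark-ips gem) pits exception-based control flow against an ordinary return; adding `-Xlog.backtraces` to the jruby invocation shows where the backtraces come from.

```ruby
# A rough comparison of exception-based vs. return-based control flow,
# assuming the benchmark-ips gem; the hash lookup is an arbitrary example.
require "benchmark/ips"

def lookup_with_raise(h, k)
  h.fetch(k)        # raises KeyError when the key is missing
rescue KeyError
  nil
end

def lookup_with_default(h, k)
  h.fetch(k, nil)   # returns nil without raising
end

Benchmark.ips do |x|
  h = { present: 1 }
  x.report("raise/rescue")  { lookup_with_raise(h, :missing) }
  x.report("default value") { lookup_with_default(h, :missing) }
  x.compare!
end
```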
Fibers can also hinder performance on pre-Loom JVMs due to JRuby's use of native threads. Excessive fiber creation is usually the problem when a process has hundreds of idle threads, and thread-based fibers are slower to start up and context-switch than fibers based on Loom's virtual threads. Virtual threads will be used automatically on a Loom-based JVM with virtual threading enabled (such as via the `--enable-preview` JVM flag).
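The pattern to watch for looks roughly like the sketch below, where many fibers are created and left suspended. The fiber count is arbitrary; the point is that on a pre-Loom JVM each suspended fiber occupies a native thread, while a Loom-enabled JVM can back the same fibers with virtual threads.

```ruby
# Illustrates the fiber-heavy pattern described above; the count is arbitrary.
# On a pre-Loom JVM each suspended fiber below keeps a native thread alive.
fibers = Array.new(1_000) do |i|
  Fiber.new { Fiber.yield i }
end

fibers.each(&:resume)   # each fiber yields once and remains suspended
puts "#{fibers.size} fibers created and suspended"
```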
JRuby is capable of utilizing all CPU cores in your system thanks to the JVM's excellent support for threads. For benchmarks where CRuby would use multiple processes, try to utilize multiple threads in the same JRuby/JVM instance.
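For instance, where a CRuby benchmark might fork one process per core, a JRuby version can start one thread per core in the same process. The sketch below uses `Etc.nprocessors` to size the pool and a toy workload purely for illustration.

```ruby
# A sketch of using one thread per core instead of multiple processes.
# Etc.nprocessors and the dummy workload are illustrative choices.
require "etc"

items   = (1..1_000_000).to_a
workers = Etc.nprocessors

results = items.each_slice((items.size / workers.to_f).ceil).map do |slice|
  Thread.new { slice.sum { |n| n * n } }
end.map(&:value)   # join each thread and collect its result

puts results.sum
```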
The JRuby team is standing by to help diagnose performance problems! If your application does not perform as well on JRuby as you expect, let the team know with a message or issue and we will help you find the problem.