I'm confused by some of the numbers I've seen for JRuby versus MRI in the benchmark results.
Your numbers show MRI 2.3 being the fastest in non-optimized mode at 23 FPS. JRuby 1.7.24 is the fastest production-usable implementation at 19 FPS. JRuby 9k (9.0.5.0) has lower perf than 1.7.24.
My own numbers confirm some parts of this and not others.
Invokedynamic not used
First, JRuby (stock) versus Ruby 2.3.0:
jruby 1.7.24 (1.9.3p551) 2016-01-20 bd68d85 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 +jit [darwin-x86_64]
fps: 13.333334275233959
checksum: 59662
ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-darwin14]
fps: 16.10170636312445
checksum: 59662
So JRuby is a bit slower, matching your results. But this is without invokedynamic, which is normally turned off in JRuby because of its longer startup/warmup time. With invokedynamic enabled:
jruby 1.7.24 (1.9.3p551) 2016-01-20 bd68d85 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 +indy +jit [darwin-x86_64]
fps: 16.100178451396733
checksum: 59662
This puts JRuby 1.7 at about the speed of MRI 2.3.0 on the normal version of this code.
JRuby 9k
Your results show a perf degradation from 1.7.24 to 9.0.5, which we have confirmed and made a number of fixes for in JRuby 9.1.
Note that we have not yet made homogeneous case/when O(1).
jruby 9.0.5.0 (2.2.3) 2016-01-26 7bee00d Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +indy +jit [darwin-x86_64]
fps: 12.010045906189871
checksum: 59662
jruby 9.1.0.0-SNAPSHOT (2.3.0) 2016-03-24 d851678 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +indy +jit [darwin-x86_64]
fps: 30.92613156164269
checksum: 59662
This puts JRuby 9.1 nearly 2x faster than MRI 2.3.0 on the normal code.
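To make the comparisons above easier to check, here is a small sketch that computes each implementation's speed relative to MRI 2.3.0. The FPS values are copied from the runs quoted above and rounded to two decimals; the constant and method names are only for illustration and are not part of optcarrot.

```ruby
# FPS figures copied from the runs quoted above, rounded to two decimals.
FPS = {
  "MRI 2.3.0"           => 16.10,
  "JRuby 1.7.24"        => 13.33,
  "JRuby 1.7.24 +indy"  => 16.10,
  "JRuby 9.0.5.0 +indy" => 12.01,
  "JRuby 9.1.0.0 +indy" => 30.93
}.freeze

BASELINE = "MRI 2.3.0"

# Returns a hash of name => FPS ratio against the baseline entry.
# each_with_object is used so this also runs in 1.9-compatible Rubies.
def relative_speeds(fps, baseline)
  base = fps.fetch(baseline)
  fps.each_with_object({}) do |(name, value), ratios|
    ratios[name] = (value / base).round(2)
  end
end

relative_speeds(FPS, BASELINE).each do |name, ratio|
  puts format("%-20s %.2fx", name, ratio)
end
```

These are single runs on one machine, so the ratios are only a rough guide.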
It's also interesting to note that 9.1 does not appear to require any of the compatibility stubs.
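On the case/when note above: as I read it, "homogeneous" means a case/when whose when clauses are all literals of the same type, such as all Integers. Semantically a case/when is a chain of === tests, so without a special optimization the cost grows with the number of branches. The sketch below contrasts that shape with a Hash lookup, which is one common way to get constant-time dispatch. The opcode values and handler names are hypothetical and not taken from optcarrot, and this is not a description of how JRuby implements or plans to implement the optimization.

```ruby
# A homogeneous case/when: every when clause is an Integer literal.
# Without optimization this is evaluated as successive === tests.
def dispatch_case(opcode)
  case opcode
  when 0x00 then :brk
  when 0x4c then :jmp
  when 0xa9 then :lda
  when 0xea then :nop
  else :unknown
  end
end

# The same mapping as a Hash, which is looked up in roughly constant time.
DISPATCH_TABLE = {
  0x00 => :brk,
  0x4c => :jmp,
  0xa9 => :lda,
  0xea => :nop
}.freeze

def dispatch_table(opcode)
  DISPATCH_TABLE.fetch(opcode, :unknown)
end

puts dispatch_case(0xa9)
puts dispatch_table(0xa9)
```

Both methods return the same result for every input; only the lookup strategy differs.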
Optimized code
I can confirm that JRuby does not like the optimized code, most likely because very large bodies of code usually do not JIT in JRuby, or if they do JIT to JVM bytecode the JVM itself may not do further optimization on them. However, I thought I'd try it in JRuby anyway.
jruby 9.1.0.0-SNAPSHOT (2.3.0) 2016-03-24 d851678 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +indy +jit [darwin-x86_64]
fps: 1.649363068304025
checksum: 59662
This is a roughly 19x performance reduction (30.9 FPS down to 1.6 FPS) from using the same "optimized" code that speeds up MRI by almost 3x.
If I turn on JIT logging (-Xjit.logging) I can see it fails to compile two generated pieces of code for PPU.run and CPU.run:
...
2016-03-31T21:49:06.096-05:00: JITCompiler: done jitting: DMC Optcarrot::APU::DMC.sample at /Users/headius/projects/optcarrot/lib/optcarrot/apu.rb:770
2016-03-31T21:49:06.411-05:00: JITCompiler: Could not compile; passes run: []: <anon class> Optcarrot::PPU.run at (generated PPU core):0 because of: "Could not compile org.jruby.internal.runtime.methods.MixedModeIRMethod@6b72f764; instruction count 204813 exceeds threshold of 2000"
2016-03-31T21:49:07.168-05:00: JITCompiler: done jitting: <block> poke_2007_CLOSURE_1.poke_2007_CLOSURE_1 at /Users/headius/projects/optcarrot/lib/optcarrot/ppu.rb:481
2016-03-31T21:49:07.181-05:00: JITCompiler: done jitting: PPU Optcarrot::PPU.update_scroll_address_line at /Users/headius/projects/optcarrot/lib/optcarrot/ppu.rb:297
...
This produces a method that requires 204813 instructions in our IR, and most of our IR instructions compile to many JVM bytecodes. The JVM also limits a single method to 64k bytes of bytecode, so even if we tried to force JRuby to JIT this code, it would be far too large for the JVM to accept unless we broke it into smaller pieces.
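A back-of-the-envelope sketch of that size problem follows. The instruction count and threshold come from the JIT log above, and the JVM limit is 65535 bytes of bytecode per method. The bytes-per-IR-instruction values are pure assumptions for illustration, since I have not measured the real expansion factor.

```ruby
IR_INSTRUCTIONS  = 204_813 # from the JIT log above
JIT_THRESHOLD    = 2_000   # from the JIT log above
JVM_METHOD_LIMIT = 65_535  # maximum bytes of bytecode in one JVM method

# Minimum number of JVM methods the body would have to be split into,
# given an ASSUMED average number of bytecode bytes per IR instruction.
def pieces_needed(bytes_per_instruction)
  estimated_bytes = IR_INSTRUCTIONS * bytes_per_instruction
  (estimated_bytes + JVM_METHOD_LIMIT - 1) / JVM_METHOD_LIMIT # ceiling division
end

puts "over JIT threshold by #{IR_INSTRUCTIONS / JIT_THRESHOLD}x"

# Hypothetical expansion factors; the real value is not known here.
[1, 5, 10].each do |bytes|
  puts "#{bytes} byte(s) per IR instruction: at least #{pieces_needed(bytes)} methods"
end
```

Even under the unrealistically generous assumption of one byte per IR instruction, the method would not fit in a single JVM method.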
Our interpreter-only mode is only slightly slower than the JIT when running with --opt. For the moment, this optimization does not fit JRuby's model of execution.