Split Prism::Loader#load_node in one lambda per node type #1710

eregon · 2023-10-18T14:57:27Z

Otherwise load_node is too big to compile and is forced to run in interpreter: Slower than CRuby while parsing files through YARP/Prism gem oracle/truffleruby#3293 (comment)
For the benchmark at Slower than CRuby while parsing files through YARP/Prism gem oracle/truffleruby#3293 (comment)
TruffleRuby Native 23.1.0: Before: 10.574041 After: 5.592436
JRuby 9.4.3.0: Before: 7.037780 After: 3.995317
JRuby 9.4.3.0 -Xcompile.invokedynamic=true: Before: 7.047832 After: 2.269294

I did not measure on CRuby since the FFI backend is not default there.
EDIT:
I measured on CRuby 3.2 too and it's a little bit slower but almost the same:
Without YJIT: Before: 5.284203 After: 5.357027
With YJIT: Before: 3.333703 After: 3.516235
I think that's fine, given that on CRuby it's using the C extension for YARP.parse and that's faster (2s, 1.75s with YJIT).

cc @enebo

enebo · 2023-10-18T15:07:25Z

@eregon I half wonder how naming methods for each type with a send would work? Not just because it may be more expensive for us to run blocks but because it would make debugging the gem give a much nicer backtrace when there is a problem.

eregon · 2023-10-18T15:21:24Z

@enebo That would be a megamorphic send (megamorphic in lookup since the method name is one of many, impossible to inline cache that lookup), which is a lot more expensive than a block call with a non-constant block "call target" (which has no lookup).

enebo · 2023-10-18T16:14:41Z

@eregon yeah I am aware of that but block dispatch is quite a bit more expensive than method dispatch so I think that single hash lookup will still be cheaper. I can understand why you may not like it.

eregon · 2023-10-18T16:19:29Z

Feel free to experiment, I'd like to merge this first in any case.
For TruffleRuby I'm pretty sure that send approach will be significantly slower, after all it's an extra hashmap lookup + getting the right symbol from an index.
The cost to call a block is very similar to calling a method on TruffleRuby for the calling part itself, it's just writing a few hidden arguments in the arguments array and calling the CallTarget (with an IndirectCallNode here since the target differs).

enebo · 2023-10-18T16:24:09Z

@eregon yeah definitely land it as it makes a big difference for both our impls.

kddnewton · 2023-10-23T15:27:56Z

I'm seeing about a 10% slow-down on CRuby for this. I'd rather not impose that penalty unless we have to. Could you check the platform that you're on when you're templating out the file? That would allow both implementations to still be quick.

kddnewton

Marking as request changes

eregon · 2023-10-23T18:30:25Z

I'm seeing about a 10% slow-down on CRuby for this. I'd rather not impose that penalty unless we have to.

That's more than I saw in my measurements in the description.

Could you check the platform that you're on when you're templating out the file? That would allow both implementations to still be quick.

Unfortunately that cannot work since the same templates/lib/prism/serialize.rb in a .gem is used on both CRuby and TruffleRuby.

I can however emit both code paths and do if RUBY_ENGINE == checks.

I'm not sure it's worth it given Prism.load is anyway much slower on CRuby than Prism.parse, but it's easy enough, I'll do it.

@enebo One thing that is worth a try is to have an array of Method objects, then the megamorphic lookup would be avoided and it would be Method#call instead of Proc#call. Not sure if that's better.

* Otherwise load_node is too big to compile and is forced to run in interpreter: oracle/truffleruby#3293 (comment) * For the benchmark at oracle/truffleruby#3293 (comment) TruffleRuby Native 23.1.0: Before: 10.574041 After: 5.592436 JRuby 9.4.3.0: Before: 7.037780 After: 3.995317 JRuby 9.4.3.0 -Xcompile.invokedynamic=true: Before: 7.047832 After: 2.269294

eregon · 2023-10-26T14:14:56Z

Rebased and now the exact same code as before is used on CRuby.

eregon requested a review from kddnewton October 18, 2023 14:57

kddnewton requested changes Oct 23, 2023

View reviewed changes

eregon force-pushed the optimize branch from 83c326e to a592ec3 Compare October 26, 2023 14:13

eregon requested a review from kddnewton October 26, 2023 14:15

kddnewton approved these changes Oct 26, 2023

View reviewed changes

kddnewton merged commit a010c3f into ruby:main Oct 26, 2023
44 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split Prism::Loader#load_node in one lambda per node type #1710

Split Prism::Loader#load_node in one lambda per node type #1710

eregon commented Oct 18, 2023 •

edited

Loading

enebo commented Oct 18, 2023

eregon commented Oct 18, 2023 •

edited

Loading

enebo commented Oct 18, 2023

eregon commented Oct 18, 2023

enebo commented Oct 18, 2023

kddnewton commented Oct 23, 2023

kddnewton left a comment

eregon commented Oct 23, 2023

eregon commented Oct 26, 2023

Split Prism::Loader#load_node in one lambda per node type #1710

Split Prism::Loader#load_node in one lambda per node type #1710

Conversation

eregon commented Oct 18, 2023 • edited Loading

enebo commented Oct 18, 2023

eregon commented Oct 18, 2023 • edited Loading

enebo commented Oct 18, 2023

eregon commented Oct 18, 2023

enebo commented Oct 18, 2023

kddnewton commented Oct 23, 2023

kddnewton left a comment

Choose a reason for hiding this comment

eregon commented Oct 23, 2023

eregon commented Oct 26, 2023

eregon commented Oct 18, 2023 •

edited

Loading

eregon commented Oct 18, 2023 •

edited

Loading