-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slower than CRuby while parsing files through YARP/Prism gem #3293
Comments
Thank you for the report, we'll investigate. |
Over there i was running the benchmarks manually so I was parsing each file 5 times, to see how big of a time difference there is, hence the 60 seconds |
Here are the results I got, discourse=5d632fd30a50747ec9d1d1f0940428a15df781a2. source "https://rubygems.org"
gem 'parallel'
gem 'prism' require 'parallel'
require 'prism'
require 'benchmark'
puts RUBY_DESCRIPTION
GLOB = "#{Dir.home}/code/discourse/**/*.rb"
def bench
Parallel.each(Dir.glob(GLOB), in_threads: 8) do |file|
Prism.parse_file(file)
end
end
3.times do
puts Benchmark.measure { bench }
end
My CRuby results are close to yours. Do you have 8 physical cores? I guess probably 4 which could explain that last difference. Talking of parallelism, I also ran the benchmark without require 'parallel'
require 'prism'
require 'benchmark'
puts RUBY_DESCRIPTION
GLOB = "#{Dir.home}/code/discourse/**/*.rb"
def bench
Dir.glob(GLOB) do |file|
Prism.parse_file(file)
end
end
3.times do
puts Benchmark.measure { bench }
end and that gives:
We see basically no difference on CRuby, as expected due to the GIL. That leaves us with why this workload is quite a bit slower on TruffleRuby, I'll keep digging, it's probably something related to deserialization in Prism. |
The most obvious thing there is we see 35% of the time is spent in |
Something else we see when pressing
So that's a big bottleneck, as the interpreter is much slower than JITed code, and also it allocates a lot more. And TruffleRuby does not currently optimize I'll try a different approach in Prism and see if that helps. |
Using the simplified single-threaded benchmark (no require 'prism'
require 'benchmark'
puts RUBY_DESCRIPTION
GLOB = "#{Dir.home}/code/discourse/**/*.rb"
def bench
Dir.glob(GLOB) do |file|
Prism.parse_file(file)
end
end
N = Integer(ARGV.first || 3)
N.times do
puts Benchmark.measure { bench }
end and using prism master.
(basically the same as above) And by changing load_node to be a lambda per node type I get:
Almost a 2x speedup! |
I'm curious as to how this helps. Does it help us get around this limitation?:
|
* Otherwise load_node is too big to compile and is forced to run in interpreter: oracle/truffleruby#3293 (comment) * For the benchmark at oracle/truffleruby#3293 (comment) Before: 10.574041 After: 5.592436
I've pushed my changes so far to ruby/prism@main...eregon:yarp:tweaks.
Yes, since the lambdas are in a Array so that's just O(1) Array#[]. About require 'prism'
require 'benchmark'
puts RUBY_DESCRIPTION
GLOB = "#{Dir.home}/code/discourse/**/*.rb"
def bench
Dir.glob(GLOB) do |file|
code = File.binread(file)
Prism::Source.new(code)
end
end
N = Integer(ARGV.first || 3)
N.times do
puts Benchmark.measure { bench }
end But that is much faster and does not take 60% of the ~5.7 seconds/iteration above:
So I wonder if the profiler is misleading there. |
Not sure if this is helps, but if I just create new |
It definitely takes slightly longer, when I use
and if I simply use the require 'parallel'
require 'prism'
require 'benchmark'
puts RUBY_DESCRIPTION
GLOB = "#{Dir.home}/oss/discourse/**/*.rb"
def bench_prism
Parallel.map(Dir.glob(GLOB), in_threads: 4) do |file|
Prism.parse_file(file)
end
end
def bench_prism_thread
threads = []
Dir.glob(GLOB).each do |file|
threads << Thread.new { Thread.current[:tree] = Prism.parse_file(file) }
end
threads.each(&:join)
end
n = Integer(ARGV[0] || 3)
n.times do
puts Benchmark.measure { bench_prism_thread }
end results:
Following is similar to the above but just without
|
I seem to be getting similar times in all 3 versions, when using require 'parallel'
require 'prism'
require 'benchmark'
require 'concurrent'
puts RUBY_DESCRIPTION
GLOB = "#{Dir.home}/oss/discourse/**/*.rb"
def thread_pool
pool = Concurrent::FixedThreadPool.new(4)
Dir.glob(GLOB).each do |file|
pool.post { Prism.parse_file(file) }
end
pool.shutdown
end
n = Integer(ARGV[0] || 3)
n.times do
puts Benchmark.measure { thread_pool }
end
Is this legit or am I exiting before all the execution can complete? |
I think shutdown will not wait for the threads or tasks to be done: https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/ThreadPoolExecutor.html#shutdown-instance_method |
* Otherwise load_node is too big to compile and is forced to run in interpreter: oracle/truffleruby#3293 (comment) * For the benchmark at oracle/truffleruby#3293 (comment) TruffleRuby Native 23.1.0: Before: 10.574041 After: 5.592436 JRuby 9.4.3.0: Before: 7.037780 After: 3.995317 JRuby 9.4.3.0 -Xcompile.invokedynamic=true: Before: 7.047832 After: 2.269294
* Otherwise load_node is too big to compile and is forced to run in interpreter: oracle/truffleruby#3293 (comment) * For the benchmark at oracle/truffleruby#3293 (comment) TruffleRuby Native 23.1.0: Before: 10.574041 After: 5.592436 JRuby 9.4.3.0: Before: 7.037780 After: 3.995317 JRuby 9.4.3.0 -Xcompile.invokedynamic=true: Before: 7.047832 After: 2.269294
* Otherwise load_node is too big to compile and is forced to run in interpreter: oracle/truffleruby#3293 (comment) * For the benchmark at oracle/truffleruby#3293 (comment) TruffleRuby Native 23.1.0: Before: 10.574041 After: 5.592436 JRuby 9.4.3.0: Before: 7.037780 After: 3.995317 JRuby 9.4.3.0 -Xcompile.invokedynamic=true: Before: 7.047832 After: 2.269294 ruby/prism@a592ec346a
From ruby/prism#2402 (comment)
Such a big speedup was unexpected, so I investigated. Changing the code in Prism ffi.rb to avoid using that Ruby String for pm_serialize_parse for the Prism.parse_file case it looks a lot better.
Still 24aae7aa51ce2877adc0718cbe6cedca532f01fe but with the change in ffi.rb:
CRuby 3.3:
|
I checked and this is necessary for the FFI It also needs changes in Prism so that Prism uses :string instead of :pointer for |
Argh that idea sort of falls apart because PRISM_EXPORTED_FUNCTION void pm_serialize_parse(pm_buffer_t *buffer, const uint8_t *source, size_t size, const char *data); So both |
With this change in Prism: diff --git a/lib/prism/ffi.rb b/lib/prism/ffi.rb
index 3418d787..639185f3 100644
--- a/lib/prism/ffi.rb
+++ b/lib/prism/ffi.rb
@@ -19,14 +19,23 @@ module Prism
# Convert a native C type declaration into a symbol that FFI understands.
# For example:
#
- # const char * -> :pointer
+ # const char * -> :string
+ # pm_buffer_t *-> :pointer
# bool -> :bool
# size_t -> :size_t
# void -> :void
#
- def self.resolve_type(type)
+ def self.resolve_type(type, return_type)
type = type.strip
- type.end_with?("*") ? :pointer : type.delete_prefix("const ").to_sym
+ if !return_type && (type == "const char *" || type == "const uint8_t *")
+ # This type avoids extra copies when passing Ruby Strings arguments to native code.
+ # We don't use it for the return type because we need the pointer e.g. for pm_string_source().
+ :string
+ elsif type.end_with?("*")
+ :pointer
+ else
+ type.delete_prefix("const ").to_sym
+ end
end
# Read through the given header file and find the declaration of each of the
@@ -56,10 +65,14 @@ module Prism
# Resolve the type of the argument by dropping the name of the argument
# first if it is present.
- arg_types.map! { |type| resolve_type(type.sub(/\w+$/, "")) }
+ arg_types.map! { |type| resolve_type(type.sub(/\w+$/, ""), false) }
+
+ if name == 'pm_serialize_parse'
+ arg_types = [:pointer, :string, :size_t, :pointer]
+ end
# Attach the function using the FFI library.
- attach_function name, arg_types, resolve_type(return_type)
+ attach_function name, arg_types, resolve_type(return_type, true)
end
# If we didn't find all of the functions, raise an error.
And fixes in TruffleRuby to optimize
(vs ~2.5 before, this is on jvm-ee) |
…f a FFI call * See #3293 (comment) and ffi/ffi#1080 * As that could cause extra conversions to managed later on.
Using the single-thread benchmark from #3293 (comment) and ruby/prism#2426
So it's about the same performance as CRuby 3.3.0 single-threaded. Now with the parallel benchmark from #3293 (comment):
So that's about 3.5x-5x as fast! (with 8 threads) Out of curiosity I also tried with |
…f a FFI call * See #3293 (comment) and ffi/ffi#1080 * As that could cause extra conversions to managed later on.
I've got a file where I'm trying to parallely parse all the Ruby files in the discourse codebase.
I am also using the Prism and Parallel gem to do so. Here's my Gemfile.lock:
I ran this code using CRuby 3.3.0-preview1, truffleruby-23.0.0 (native) truffleruby-23.1.0 (native). Here are the results:
I threw in the GraalVM version for good measure and it seemed to be JIT compiling pretty well but is still slower than CRuby.
TruffleRuby seems considerably slower than CRuby in these benchmarks. Is this a bug?
The text was updated successfully, but these errors were encountered: