Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidekiq is slow on TruffleRuby #1584

Open
eregon opened this issue Feb 16, 2019 · 12 comments
Open

Sidekiq is slow on TruffleRuby #1584

eregon opened this issue Feb 16, 2019 · 12 comments
Assignees

Comments

@eregon
Copy link
Member

eregon commented Feb 16, 2019

From @nateberkopec https://twitter.com/nateberkopec/status/1096880039491133441
Instructions to reproduce at https://gist.github.com/nateberkopec/a612498e9719af6b8775ecdcb8afaa18 (time to process 20 000 empty jobs with Sidekiq).

We should investigate why it's slow.
hiredis does not work yet (#1546).

@nateberkopec Does it matter whether the benchmark is run with the Rails app, or just running Sidekiq after the script created the jobs is enough? That would simplify the setup.

@nateberkopec
Copy link

nateberkopec commented Feb 16, 2019

I don't think it matters whether or not this is run in a Rails app.

EDIT: I don't have the numbers ready to copypaste, but for me, Truffle native completed the benchmark in about a minute, and MRI was about 40 seconds.

@bjfish
Copy link
Contributor

bjfish commented May 6, 2020

I was running this today and here's what I found:
I also saw pauses between batches of jobs, although short ones, for native but not jvm. Using the --vm.XX:+PrintGC I could confirm these were pauses for GC.

I did run with RAILS_ENV=production as run in development will do many checks for file updates.

I saw these values:
MRI 2.6.5
~ 10 seconds

TruffleRuby native
~ 65 seconds
and then with some configuration:
TRUFFLERUBYOPT='--vm.Xmx16G --vm.Xmn4G'
~ 34 seconds
TRUFFLERUBYOPT='--vm.Xmx16G --vm.Xmn4G --experimental-options --engine.Mode=latency'
~ 26 seconds

With TruffleRuby I saw some delay in startup ( ~10-15 seconds of requiring/loading) which is an area I have been working on improving recently.

@gogainda
Copy link
Contributor

latest results are Truffleruby head (23 sec) vs MRI Ruby 2.7 (23 sec)

@eregon
Copy link
Member Author

eregon commented Aug 16, 2021

On par is progress (thanks for rerunning!), but I think TruffleRuby should be faster here, especially if it's allowed to process jobs in parallel.
Seems worth investigating once the new CPUSampler flamegraph support is in.

@eregon
Copy link
Member Author

eregon commented Jan 7, 2022

New numbers from @mperham: https://twitter.com/eregontp/status/1479534410546483201
We should investigate what's slower.
Those results seem to be with toxiproxy.

@mperham
Copy link

mperham commented Jan 7, 2022

Thanks. I appreciate you investigating this.

@eregon
Copy link
Member Author

eregon commented Jan 24, 2022

I tried running the benchmark but quickly run into non-fatal errors which happen regularly.
It also happens on CRuby although less often as CRuby runs the benchmark for less long.
sidekiq/sidekiq#5137

There is also another type of error when running on TruffleRuby:

2022-01-24T12:33:58.904Z pid=138840 tid=fzk ERROR: Error fetching job: Interrupted system call
2022-01-24T12:33:58.905Z pid=138840 tid=fts WARN: Errno::EINTR: Interrupted system call
2022-01-24T12:33:58.905Z pid=138840 tid=ftc WARN: exception.c:61:in `rb_syserr_fail'
exception.c:61:in `rb_syserr_fail'
exception.c:72:in `rb_sys_fail'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/hiredis-0.6.3/ext/hiredis_ext/connection.c:423:in `__get_reply'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/hiredis-0.6.3/ext/hiredis_ext/connection.c:458:in `connection_read'
/home/eregon/.rubies/truffleruby-dev/lib/truffle/truffle/cext_ruby.rb:41:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/hiredis-0.6.3/lib/hiredis/ext/connection.rb:19:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/connection/hiredis.rb:55:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:280:in `block in read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:268:in `io'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:279:in `read'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:131:in `block in call'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:248:in `block (2 levels) in process'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:389:in `ensure_connected'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:238:in `block in process'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:325:in `logging'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:237:in `process'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:131:in `call'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:226:in `block in call_with_timeout'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:300:in `with_socket_timeout'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis/client.rb:225:in `call_with_timeout'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:1224:in `block in _bpop'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:69:in `block in synchronize'
/home/eregon/.rubies/truffleruby-dev/lib/mri/monitor.rb:218:in `mon_synchronize'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:69:in `synchronize'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:1221:in `_bpop'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/redis-4.2.5/lib/redis.rb:1266:in `brpop'
/home/eregon/code/sidekiq/lib/sidekiq/fetch.rb:47:in `block in retrieve_work'
/home/eregon/code/sidekiq/lib/sidekiq.rb:100:in `block in redis'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/connection_pool-2.2.5/lib/connection_pool.rb:63:in `block (2 levels) in with'
<internal:core> core/thread.rb:94:in `handle_interrupt'
<internal:core> core/thread.rb:94:in `handle_interrupt'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/connection_pool-2.2.5/lib/connection_pool.rb:62:in `block in with'
<internal:core> core/thread.rb:94:in `handle_interrupt'
<internal:core> core/thread.rb:94:in `handle_interrupt'
/home/eregon/code/sidekiq/vendor/bundle/truffleruby/3.0.2.9/gems/connection_pool-2.2.5/lib/connection_pool.rb:59:in `with'
/home/eregon/code/sidekiq/lib/sidekiq.rb:97:in `redis'
/home/eregon/code/sidekiq/lib/sidekiq/fetch.rb:47:in `retrieve_work'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:83:in `get_one'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:95:in `fetch'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:77:in `process_one'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:68:in `run'
/home/eregon/code/sidekiq/lib/sidekiq/processor.rb:67:in `run'
/home/eregon/code/sidekiq/lib/sidekiq/util.rb:56:in `watchdog'
/home/eregon/code/sidekiq/lib/sidekiq/util.rb:65:in `block in safe_thread'

This might be a bug in hiredis to not handle EINTR correctly (EINTR should always be retried).
UPDATE: actually it's an issue of rb_thread_fd_select() in TruffleRuby => GR-36556.
That might not happen on CRuby due to Thread.handle_interrupt(Exception => :never), not sure.
And this not only creates errors but also prevents profiling, because profiling needs to know the thread backtraces (like Thread#backtrace) and that needs to interrupt system calls with SIGVTALRM, very similar to what CRuby does for e.g. Ctrl+C.

@eregon eregon self-assigned this Jan 24, 2022
@gogainda
Copy link
Contributor

FYI, in my recent tests not using hiredis improves a bit overall performance. Not sure if using it brings any benefits to Redis on Truffleruby

@gogainda
Copy link
Contributor

gogainda commented Mar 18, 2022

I see different numbers now: 45s vs 21s . Still much slower but difference is not x4 anymore, which is good
UPDATE: I rerun it without hiredis and saw similar results 45s vs 14s, so in case of TruffleRuby it doesn't make much difference

@eregon
Copy link
Member Author

eregon commented Mar 18, 2022

Right, we should look at this again. I was focused on Ruby 3 keyword arguments and other things but that's done now, and the fix has been merged in sidekiq (sidekiq/sidekiq@1efbadd).

Actually on master sidekiqload now doesn't use hiredis, so I think we should start with that, hiredis likely only adds complexity.
For instance there is an issue with interruption of rb_thread_fd_select() in TruffleRuby, only happens with hiredis (GR-36556), and that prevents profiling (obviously we should fix that, but good if it doesn't block this, and it doesn't seem so easy).

@aardvark179
Copy link
Contributor

I've been taking a look at this./ TruffleRUby is faster at the JSON generation required for creating jobs,. but it's slower at IO for a variety of reasons. The JSON generation could likely be made faster by optimising Float#to_s which is most of the time spent on the JSON generation, and I'm looking at using poll instead of select for IO waiting. to reduce the overheads in that part of the code.

@aardvark179
Copy link
Contributor

We also appear to be much slower at parsing the job json, so validating the job by dumping it to json and parsing it back is not working so well on TruffleRuby.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants