Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate issue with Ruby 3.1/3.2 and worker fibers #162

Open
cretz opened this issue Oct 7, 2024 · 5 comments
Open

Investigate issue with Ruby 3.1/3.2 and worker fibers #162

cretz opened this issue Oct 7, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@cretz
Copy link
Member

cretz commented Oct 7, 2024

Describe the bug

Currently fiber-based workers hang in Ruby 3.1/3.2 but work in Ruby 3.3+. Initial investigation is showing that pushing to a Queue from a separate thread may not be waking up a fiber. As a (temporary?) workaround, we are only allowing Fiber-based workers in Ruby 3.3+.

@cretz cretz added the bug Something isn't working label Oct 7, 2024
@cretz cretz mentioned this issue Oct 7, 2024
@bdchauvette
Copy link

You might've already looked into this, but any chance this is related to the introduction of M:N scheduling in 3.3? 🤔

https://bugs.ruby-lang.org/issues/19842

@cretz
Copy link
Member Author

cretz commented Oct 8, 2024

I do suspect it is related to that. Basically what we are finding pre-3.3 is that in some cases when we have one Ruby thread that pushes to a queue, a fiber that is waiting on queue pop is not woken up. Due to how we need to use Rust, we need to be able to wake up fibers from another thread and queues were the logical way to do it. Note, this does not happen all the time. In many cases a queue push from another thread does resume a fiber waiting on queue pop.

Maybe it's something we are just not doing right, or maybe there's some other way to wake up a fiber from another thread. We may end up reworking our callback logic to not use queue from other thread, but instead do a captured_fiber_scheduler.fiber { queue.push } hoping that will fix it.

This is causing our fiber-based tests to not work in < 3.3. We need to do a lot more investigating here, but for now so we could get the activity worker PR out, we created this separate issue.

@cretz
Copy link
Member Author

cretz commented Oct 8, 2024

The other obvious question becomes, how important is pre-3.3 fiber support? 3.1 will be EOL'd in the spring and 3.2 a year after that. And of course threads work across all versions. It just becomes a question of how much work to investigate/solve vs benefit.

@mjameswh
Copy link
Contributor

The other obvious question becomes, how important is pre-3.3 fiber support? ... It just becomes a question of how much work to investigate/solve vs benefit.

Agree, that would be a reasonable compromise.

Only thing: are you sure this issue is purely a "fiber on pre-3.3" thing, and not hiding something bigger that could haunt us later? It might be worth investigating just a little bit more to confirm your initial suspicion before dismissing.

@cretz
Copy link
Member Author

cretz commented Oct 10, 2024

are you sure this issue is purely a "fiber on pre-3.3" thing, and not hiding something bigger that could haunt us later? It might be worth investigating just a little bit more to confirm your initial suspicion before dismissing.

I am not sure because I don't know the exact cause. I agree would like to investigate (or delegate the investigation). I did basic investigation showing literally as I used asdf to switch Ruby versions, a test was not working in 3.1/3.2 and was in 3.3, and tracked it down to a queue.pop in a fiber not waking up to a queue.push from another thread (a Ruby thread, but from Rust code). But I could not replicate with a trivial thread.new + queue.push and fiber.schedule + queue.pop, hence the opening of this ticket for investigating more deeply.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants