Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix can't resume a running fiber #14128

Conversation

ysbaddaden
Copy link
Contributor

The changes in #14098 require that the fiber be associated to a thread once, instead of doing it on every swapcontext, yet enqueueing the fiber didn't do it while associating the fiber to a thread, which led to further enqueues to associate the fiber to any other thread, in which case a thread B was trying to resume a fiber that was still running on thread A (it didn't swap context, yet).

closes #14124

The changes in crystal-lang#14098 require that the fiber be associated to a thread
once, instead of doing it on every swapcontext, yet enqueueing the fiber
didn't do it while associating the fiber to a thread, which led to
further enqueues to associate the fiber to any other thread, in which
case a thread B was trying to resume a fiber that was still running on
thread A (it didn't swap context, yet).

closes crystal-lang#14124
@ysbaddaden
Copy link
Contributor Author

Note: this is the theory. I didn't try it (yet), though I didn't get any bug on the fix/mt-issues branch which was behaving as this patch does.

src/crystal/scheduler.cr Outdated Show resolved Hide resolved
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
@straight-shoota straight-shoota added kind:bug topic:stdlib:concurrency topic:multithreading kind:regression Something that used to correctly work but no longer works labels Dec 21, 2023
@straight-shoota straight-shoota merged commit a155936 into crystal-lang:master Dec 21, 2023
56 checks passed
@ysbaddaden
Copy link
Contributor Author

This has already been merged (I had issues with my linux linking an invalid executable when building the std specs only), but: I can reproduce the crash 100% on master and can't reproduce it anymore with this patch.

CI may be less affected because there are less vCPU available than on our local laptops, so less parallelism to trigger the bug.

@ysbaddaden ysbaddaden deleted the fix/14124/cant-resume-running-fiber branch December 21, 2023 13:39
@straight-shoota
Copy link
Member

straight-shoota commented Dec 21, 2023

Yeah, I had verified this patch locally as well. The error reproduced 0% while on master it was 100%.

We're merging trivial regression fixes quickly in order to continue operations with a stable master.

@straight-shoota straight-shoota added this to the 1.11.0 milestone Dec 21, 2023
@straight-shoota straight-shoota changed the title Fix: can't resume a running fiber Fix "can't resume a running fiber" Jan 2, 2024
@straight-shoota straight-shoota changed the title Fix "can't resume a running fiber" Fix can't resume a running fiber Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug kind:regression Something that used to correctly work but no longer works topic:multithreading topic:stdlib:concurrency
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Regression: can't resume a running fiber
2 participants