Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process: ensure uvfinalize and _uv_close_cb are synchronized #44476

Merged
merged 2 commits into from
Mar 8, 2022
Merged

Conversation

vtjnash
Copy link
Sponsor Member

@vtjnash vtjnash commented Mar 5, 2022

There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

Fixes #44460

(added backport labels, since in quick testing, it seems to apply cleanly to those branches)

There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

Fixes #44460
@vtjnash vtjnash added backport 1.8 Change should be backported to release-1.8 backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Mar 5, 2022
@KristofferC KristofferC mentioned this pull request Mar 7, 2022
47 tasks
…ronized

The uv_return_spawn callback also needs to be synchronized with the
constructor, since we might have arrived there before we finished
allocating the Process struct here, leading to missed exit events.
@vtjnash vtjnash merged commit c591bf2 into master Mar 8, 2022
@vtjnash vtjnash deleted the jn/44460 branch March 8, 2022 20:31
KristofferC pushed a commit that referenced this pull request Mar 11, 2022
There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

The uv_return_spawn callback also needs to be synchronized with the
constructor, since we might have arrived there before we finished
allocating the Process struct here, leading to missed exit events.

Fixes #44460

(cherry picked from commit c591bf2)
@KristofferC KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Mar 15, 2022
@KristofferC KristofferC mentioned this pull request Mar 15, 2022
50 tasks
@KristofferC
Copy link
Sponsor Member

Needs a manual backport towards backports-release-1.6.

@KristofferC KristofferC mentioned this pull request Apr 19, 2022
40 tasks
KristofferC pushed a commit that referenced this pull request Apr 19, 2022
There appeared to be a possibility they could race and the data field
might already be destroyed before we reached the close callback,
from looking at the state of the program when reproducing #44460.

This is because the uv_return_spawn set the handle to NULL, which later
can cause the uvfinalize to exit early (if the finalizer gets run on
another thread, since we have disabled finalizers on our thread). Then
the GC can reap the julia Process object prior to uv_close cleaning up
the object. We solve this by calling disassociate_julia_struct before
dropping the reference to the handle. But then we also fully address any
remaining race condition by having uvfinalize acquire a lock also.

The uv_return_spawn callback also needs to be synchronized with the
constructor, since we might have arrived there before we finished
allocating the Process struct here, leading to missed exit events.

Fixes #44460

(cherry picked from commit c591bf2)
@KristofferC KristofferC mentioned this pull request May 16, 2022
45 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.6 Change should be backported to release-1.6
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Occasional segfaults when running with @threads
2 participants