Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread safety in atexit(f): Lock access to atexit_hooks #49774

Merged
merged 1 commit into from
May 15, 2023

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented May 11, 2023

  • atexit(f) mutates global shared state.
  • atexit(f) can be called anytime by any thread.
  • Accesses & mutations to global shared state must be locked if they can be accessed from multiple threads.

Fixes #49746

@NHDaly NHDaly requested review from kpamnany and d-netto May 11, 2023 19:27
base/initdefs.jl Outdated Show resolved Hide resolved
Copy link
Contributor

@kpamnany kpamnany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

I'm fine with the one-liner.

Guessing you checked whether the test fails on current master?

base/initdefs.jl Outdated Show resolved Hide resolved
base/initdefs.jl Outdated Show resolved Hide resolved
@NHDaly
Copy link
Member Author

NHDaly commented May 11, 2023

Guessing you checked whether the test fails on current master?

@kpamnany: Yes, and boy do they fail!...

julia> using Test

julia> @testset "atexit thread safety" begin
           f = () -> nothing
           before_len = length(Base.atexit_hooks)
           @sync begin
               for _ in 1:1_000_000
                   Threads.@spawn begin
                       atexit(f)
                   end
               end
           end
           @test length(Base.atexit_hooks) == before_len + 1_000_000
           @test all(hook -> hook === f, Base.atexit_hooks[1 : 1_000_000])

           # cleanup
          # Base.@lock Base._atexit_lock begin
          #     deleteat!(Base.atexit_hooks, 1:1_000_000)
          # end
       end

[57702] signal (11.2): Segmentation fault: 11
in expression starting at REPL[2]:1
sweep_big_list at /Users/nathandaly/src/julia/src/gc.c:1094 [inlined]
sweep_big at /Users/nathandaly/src/julia/src/gc.c:1116 [inlined]
gc_sweep_other at /Users/nathandaly/src/julia/src/gc.c:1608 [inlined]
_jl_gc_collect at /Users/nathandaly/src/julia/src/gc.c:3315
ijl_gc_collect at /Users/nathandaly/src/julia/src/gc.c:3486
maybe_collect at /Users/nathandaly/src/julia/src/gc.c:954 [inlined]
jl_gc_pool_alloc_inner at /Users/nathandaly/src/julia/src/gc.c:1317
ijl_gc_pool_alloc at /Users/nathandaly/src/julia/src/gc.c:1368
jfptr_iterate_34613 at /Users/nathandaly/src/julia/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at /Users/nathandaly/src/julia/src/gf.c:0 [inlined]

... here's how it failed on 1.8.2 😂:

signal (11): Segmentation fault: 11

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1
_ZNK5dyld311MachOLoaded17findClosestSymbolEyPPKcPy at /usr/lib/dyld (unknown line)
Allocations: 2961458 (Pool: 2959697; Big: 1761); GC: 3

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1
write at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 2961458 (Pool: 2959697; Big: 1761); GC: 3

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1
_platform_strlen at /usr/lib/system/libsystem_platform.dylib (unknown line)
Allocations: 2961458 (Pool: 2959697; Big: 1761); GC: 3

...

lol there were like 30 segfaults reported 😂

- atexit(f) mutates global shared state.
- atexit(f) can be called anytime by any thread.
- Accesses & mutations to global shared state must be locked if they can
  be accessed from multiple threads.

Add unit test for thread safety of adding many atexit functions in
parallel
@NHDaly NHDaly force-pushed the nhd-atexit-thread-safety branch from 98d72e8 to 9566db2 Compare May 12, 2023 18:46
@KristofferC KristofferC merged commit 9dd3090 into master May 15, 2023
@KristofferC KristofferC deleted the nhd-atexit-thread-safety branch May 15, 2023 14:09
@KristofferC KristofferC added the backport 1.9 Change should be backported to release-1.9 label May 15, 2023
@KristofferC KristofferC mentioned this pull request May 19, 2023
51 tasks
KristofferC pushed a commit that referenced this pull request May 19, 2023
- atexit(f) mutates global shared state.
- atexit(f) can be called anytime by any thread.
- Accesses & mutations to global shared state must be locked if they can
  be accessed from multiple threads.

Add unit test for thread safety of adding many atexit functions in
parallel

(cherry picked from commit 9dd3090)
kpamnany pushed a commit to RelationalAI/julia that referenced this pull request May 24, 2023
…ang#49774)

- atexit(f) mutates global shared state.
- atexit(f) can be called anytime by any thread.
- Accesses & mutations to global shared state must be locked if they can
  be accessed from multiple threads.

Add unit test for thread safety of adding many atexit functions in
parallel
KristofferC pushed a commit that referenced this pull request May 27, 2023
- atexit(f) mutates global shared state.
- atexit(f) can be called anytime by any thread.
- Accesses & mutations to global shared state must be locked if they can
  be accessed from multiple threads.

Add unit test for thread safety of adding many atexit functions in
parallel

(cherry picked from commit 9dd3090)
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label May 28, 2023
NHDaly added a commit to RelationalAI/julia that referenced this pull request Jun 7, 2023
Fixes JuliaLang#49841.

Follow-up to JuliaLang#49774.

This PR makes two changes:
1. It locks `atexit_hooks` while iterating the hooks during execution of `_atexit()` at shutdown.
    - This prevents any data races if another Task is registering a new atexit hook while the hooks are being evaluated.
2. It defines semantics for what happens if another Task attempts to register another atexit hook _after all the hooks have finished_, and we've proceeded on to the rest of shutdown.
    - Previously, those atexit hooks would be _ignored,_ which violates the user's expectations and violates the "atexit" contract.
    - Now, the attempt to register the atexit hook will **throw an exception,** which ensures that we never break our promise, since the user was never able to register the atexit hook at all.
    - This does mean that users will need to handle the thrown exception and likely do now whatever tear down they were hoping to delay until exit.
kpamnany pushed a commit that referenced this pull request Jun 21, 2023
- atexit(f) mutates global shared state.
- atexit(f) can be called anytime by any thread.
- Accesses & mutations to global shared state must be locked if they can
  be accessed from multiple threads.

Add unit test for thread safety of adding many atexit functions in
parallel

(cherry picked from commit 9dd3090)
NHDaly added a commit that referenced this pull request Jul 19, 2023
Fixes #49841.

Follow-up to #49774.

This PR makes two changes:
1. It locks `atexit_hooks` while iterating the hooks during execution of `_atexit()` at shutdown.
    - This prevents any data races if another Task is registering a new atexit hook while the hooks are being evaluated.
2. It defines semantics for what happens if another Task attempts to register another atexit hook _after all the hooks have finished_, and we've proceeded on to the rest of shutdown.
    - Previously, those atexit hooks would be _ignored,_ which violates the user's expectations and violates the "atexit" contract.
    - Now, the attempt to register the atexit hook will **throw an exception,** which ensures that we never break our promise, since the user was never able to register the atexit hook at all.
    - This does mean that users will need to handle the thrown exception and likely do now whatever tear down they were hoping to delay until exit.
NHDaly added a commit to RelationalAI/julia that referenced this pull request Jul 21, 2023
* Lock the atexit_hooks during execution of the hooks on shutdown.

Fixes JuliaLang#49841.

Follow-up to JuliaLang#49774.

This PR makes two changes:
1. It locks `atexit_hooks` while iterating the hooks during execution of `_atexit()` at shutdown.
    - This prevents any data races if another Task is registering a new atexit hook while the hooks are being evaluated.
2. It defines semantics for what happens if another Task attempts to register another atexit hook _after all the hooks have finished_, and we've proceeded on to the rest of shutdown.
    - Previously, those atexit hooks would be _ignored,_ which violates the user's expectations and violates the "atexit" contract.
    - Now, the attempt to register the atexit hook will **throw an exception,** which ensures that we never break our promise, since the user was never able to register the atexit hook at all.
    - This does mean that users will need to handle the thrown exception and likely do now whatever tear down they were hoping to delay until exit.

* Fix merge conflict
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reproducible Memory Corruption in atexit_hooks (on MacOS) leading to GC corruption and/or segfault
3 participants