Concurrency violation detected on multithread config #229

Closed
krynju opened this issue Jul 15, 2021 · 5 comments · Fixed by #234
krynju (Member) commented Jul 15, 2021

I noticed this in a multi-threaded environment. I had been working in a single-threaded environment for a while and hadn't seen it before. It happens both on the branch from PR #223 and on master:

➜  Dagger.jl git:(master) julia -t4
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.1 (2021-04-23)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

(@v1.6) pkg> activate .
  Activating environment at `/tmp/Dagger.jl/Project.toml`

julia> using Dagger

julia> f = (x) -> Dagger.@spawn sum(1:x)
#1 (generic function with 1 method)

julia> f(100)
EagerThunk (running)

julia> Error in eager scheduler:
Thunk(eager_thunk, ()) (id 2) failure:
concurrency violation detected
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] concurrency_violation()
    @ Base ./condition.jl:8
  [3] assert_havelock
    @ ./condition.jl:25 [inlined]
  [4] assert_havelock
    @ ./condition.jl:48 [inlined]
  [5] assert_havelock
    @ ./condition.jl:72 [inlined]
  [6] notify(c::Condition, arg::Any, all::Bool, error::Bool)
    @ Base ./condition.jl:126
  [7] #notify#515
    @ ./condition.jl:124 [inlined]
  [8] notify (repeats 2 times)
    @ ./condition.jl:124 [inlined]
  [9] (::Dagger.Sch.var"#43#44"{DataType, UInt64, UInt64})()
    @ Dagger.Sch /tmp/Dagger.jl/src/sch/eager.jl:38
 [10] lock(f::Dagger.Sch.var"#43#44"{DataType, UInt64, UInt64}, l::ReentrantLock)
    @ Base ./lock.jl:187
 [11] adjust_pressure!(h::Dagger.Sch.SchedulerHandle, proctype::Type, pressure::UInt64)
    @ Dagger.Sch /tmp/Dagger.jl/src/sch/eager.jl:36
 [12] eager_thunk()
    @ Dagger.Sch /tmp/Dagger.jl/src/sch/eager.jl:70
 [13] macro expansion
    @ /tmp/Dagger.jl/src/processor.jl:154 [inlined]
 [14] (::Dagger.var"#51#52"{typeof(Dagger.Sch.eager_thunk), Tuple{}, NamedTuple{(:sch_uid, :sch_handle, :processor, :utilization), Tuple{UInt64, Dagger.Sch.SchedulerHandle, Dagger.ThreadProc, UInt64}}})()
    @ Dagger ./threadingconstructs.jl:169
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] fetch
   @ ./task.jl:337 [inlined]
 [3] execute!(::Dagger.ThreadProc, ::Function)
   @ Dagger /tmp/Dagger.jl/src/processor.jl:157
 [4] do_task(to_proc::Dagger.ThreadProc, extra_util::UInt64, thunk_id::Int64, f::Function, data::Tuple{}, send_result::Bool, persist::Bool, cache::Bool, meta::Bool, options::Dagger.Sch.ThunkOptions, ids::Vector{Union{}}, ctx_vars::NamedTuple{(:log_sink, :profile), Tuple{Dagger.NoOpLog, Bool}}, sch_handle::Dagger.Sch.SchedulerHandle, uid::UInt64)
   @ Dagger.Sch /tmp/Dagger.jl/src/sch/Sch.jl:907
 [5] macro expansion
   @ /tmp/Dagger.jl/src/sch/Sch.jl:821 [inlined]
 [6] (::Dagger.Sch.var"#104#105"{Dagger.ThreadProc, Distributed.RemoteChannel{Channel{Any}}, Tuple{UInt64, Int64, typeof(Dagger.Sch.eager_thunk), Tuple{}, Bool, Bool, Bool, Bool, Dagger.Sch.ThunkOptions, Vector{Union{}}, NamedTuple{(:log_sink, :profile), Tuple{Dagger.NoOpLog, Bool}}, Dagger.Sch.SchedulerHandle, UInt64}})()
   @ Dagger.Sch ./task.jl:411
Stacktrace:
 [1] compute_dag(ctx::Context, d::Thunk; options::Dagger.Sch.SchedulerOptions)
   @ Dagger.Sch /tmp/Dagger.jl/src/sch/Sch.jl:443
 [2] compute(ctx::Context, d::Thunk; options::Dagger.Sch.SchedulerOptions)
   @ Dagger /tmp/Dagger.jl/src/compute.jl:31
 [3] (::Dagger.Sch.var"#41#42"{Context})()
   @ Dagger.Sch ./task.jl:411
julia>
krynju changed the title to "Concurrency violation detected on multithread config" on Jul 15, 2021
jpsamaroo (Member) commented:

Yeah, this is basically because the notify call here uses a Base.Condition, which isn't thread-safe. I have some changes locally that change this to a Threads.Condition, which eliminates this issue.
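
For context, a minimal standalone sketch (not Dagger's actual scheduler code) of the failure mode and of the fix: a Base.Condition belongs to the task that created it, so notifying it from a task running on another thread trips Base.concurrency_violation(), as in the trace above, while a Threads.Condition is safe across threads as long as its lock is held around both wait and notify. The names below (cond, done, waiter, notifier) are illustrative only. Run with julia -t2 or more:

cond = Threads.Condition()   # thread-safe; a plain Base.Condition() here would not be
done = Ref(false)            # guarded by cond's lock, so a notify can't be lost

waiter = Threads.@spawn begin
    lock(cond)
    try
        while !done[]
            wait(cond)       # atomically releases the lock while blocked
        end
        println("notified without a concurrency violation")
    finally
        unlock(cond)
    end
end

notifier = Threads.@spawn begin
    lock(cond)               # Threads.Condition requires holding the lock to notify
    try
        done[] = true
        notify(cond)
    finally
        unlock(cond)
    end
end

wait(waiter); wait(notifier)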

ExpandingMan (Contributor) commented:

Not sure if this matters, but I'm also seeing this when using multiple separate worker processes, e.g. julia -p3.

jpsamaroo (Member) commented:

@ExpandingMan is it the same stack trace, coming from the same location?

jpsamaroo linked a pull request on Jul 17, 2021 that will close this issue

ExpandingMan (Contributor) commented Jul 17, 2021

This is on the latest master, on Julia 1.6:

◖0▮expandingman@asus-desk-2▮~▮◗ julia -p3
julia> using Dagger: @spawn

julia> t = @spawn sin(π)
EagerThunk (running)

julia> Error in eager scheduler:
Thunk(eager_thunk, ()) (id 2) failure:
concurrency violation detected
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] concurrency_violation()
    @ Base ./condition.jl:8
  [3] assert_havelock
    @ ./condition.jl:25 [inlined]
  [4] assert_havelock
    @ ./condition.jl:48 [inlined]
  [5] assert_havelock
    @ ./condition.jl:72 [inlined]
  [6] notify(c::Condition, arg::Any, all::Bool, error::Bool)
    @ Base ./condition.jl:126
  [7] #notify#515
    @ ./condition.jl:124 [inlined]
  [8] notify (repeats 2 times)
    @ ./condition.jl:124 [inlined]
  [9] (::Dagger.Sch.var"#43#44"{DataType, UInt64, UInt64})()
    @ Dagger.Sch ~/.julia/dev/Dagger/src/sch/eager.jl:38
 [10] lock(f::Dagger.Sch.var"#43#44"{DataType, UInt64, UInt64}, l::ReentrantLock)
    @ Base ./lock.jl:187
 [11] adjust_pressure!(h::Dagger.Sch.SchedulerHandle, proctype::Type, pressure::UInt64)
    @ Dagger.Sch ~/.julia/dev/Dagger/src/sch/eager.jl:36
 [12] eager_thunk()
    @ Dagger.Sch ~/.julia/dev/Dagger/src/sch/eager.jl:70
 [13] macro expansion
    @ ~/.julia/dev/Dagger/src/processor.jl:154 [inlined]
 [14] (::Dagger.var"#51#52"{typeof(Dagger.Sch.eager_thunk), Tuple{}, NamedTuple{(:sch_uid, :sch_handle, :processor, :utilization), Tuple{UInt64, Dagger.Sch.SchedulerHandle, Dagger.ThreadProc, UInt64}}})()
    @ Dagger ./threadingconstructs.jl:169
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] fetch
   @ ./task.jl:337 [inlined]
 [3] execute!(::Dagger.ThreadProc, ::Function)
   @ Dagger ~/.julia/dev/Dagger/src/processor.jl:157
 [4] do_task(to_proc::Dagger.ThreadProc, extra_util::UInt64, thunk_id::Int64, f::Function, data::Tuple{}, send_result::Bool, persist::Bool, cache::Bool, meta::Bool, options::Dagger.Sch.ThunkOptions, ids::Vector{Union{}}, ctx_vars::NamedTuple{(:log_sink, :profile), Tuple{Dagger.NoOpLog, Bool}}, sch_handle::Dagger.Sch.SchedulerHandle, uid::UInt64)
   @ Dagger.Sch ~/.julia/dev/Dagger/src/sch/Sch.jl:919
 [5] macro expansion
   @ ~/.julia/dev/Dagger/src/sch/Sch.jl:833 [inlined]
 [6] (::Dagger.Sch.var"#104#105"{Dagger.ThreadProc, RemoteChannel{Channel{Any}}, Tuple{UInt64, Int64, typeof(Dagger.Sch.eager_thunk), Tuple{}, Bool, Bool, Bool, Bool, Dagger.Sch.ThunkOptions, Vector{Union{}}, NamedTuple{(:log_sink, :profile), Tuple{Dagger.NoOpLog, Bool}}, Dagger.Sch.SchedulerHandle, UInt64}})()
   @ Dagger.Sch ./task.jl:411
Stacktrace:
 [1] compute_dag(ctx::Dagger.Context, d::Dagger.Thunk; options::Dagger.Sch.SchedulerOptions)
   @ Dagger.Sch ~/.julia/dev/Dagger/src/sch/Sch.jl:451
 [2] compute(ctx::Dagger.Context, d::Dagger.Thunk; options::Dagger.Sch.SchedulerOptions)
   @ Dagger ~/.julia/dev/Dagger/src/compute.jl:31
 [3] (::Dagger.Sch.var"#41#42"{Dagger.Context})()
   @ Dagger.Sch ./task.jl:411

Looks the same at a glance.

jpsamaroo (Member) commented:

Yeah, that's one of the other places I'd expect this to occur. Well, hopefully it's now fixed for good in the latest release 🙂
