Condition/RecursiveLock: add ability to handle threads #30061

vtjnash · 2018-11-16T19:52:50Z

This extends Condition and RecursiveLock to assert that they may only be used in the single-threaded case (co-operatively scheduled), and then adds thread-safe versions of the same (ConditionMT and RecursiveLockMT). Additionally, the constructor for ConditionMT lets you pass in an existing RecursiveLockMT to allow having multiple notifications off of a single lock.

Unlike the existing thread-safe primitives (Threads.AsyncCondition, Threads.SpinLock, Threads.Mutex), these new types integrate with the Task system also, and thus should typically be preferred to those for user code. Unlike the existing task-only primitives, these work in all situations, but require the user to write explicit lock code annotating the protected code regions.

In version 2.0, we can consider switching the default for Condition from ConditionST to ConditionMT, but that would be a breaking change. (ConditionMT requires the user to hold a lock before calling wait, while that lock is currently implicit in Condition under the assumption of cooperative tasking—as explicitly represented in this PR by the NotALock type.)

Implement and close #30026

StefanKarpinski

This *MT naming convention seems a bit unfortunate. Is it correct to use Condition when Julia is multithreaded? Or ConditionMT when Julia is single-threaded?

base/event.jl

JeffBezanson · 2018-11-17T15:28:17Z

This *MT naming convention seems a bit unfortunate. Is it correct to use Condition when Julia is multithreaded? Or ConditionMT when Julia is single-threaded?

I agree --- code should be independent of the number of threads as much as possible. I think these can be seen as different ways of using conditions, rather than different kinds of conditions: in one case, you just want to be woken up and don't need to test a predicate, and in the other case you need to test a predicate and so need to continue holding a lock when wait returns. Maybe that could be an option passed to wait. Another way to look at it is that code that waits and then tests a predicate without holding the lock is not thread-safe and may need to change, but that is local to the caller of wait and doesn't need to affect the code that allocates the Condition.

vtjnash · 2018-11-18T17:26:03Z

This *MT naming convention seems a bit unfortunate. Is it correct to use Condition when Julia is multithreaded? Or ConditionMT when Julia is single-threaded?

It's a statement of the algorithm being used (whether concurrent modifications are supported), not the configuration of Julia. You can always use a thread-safe variant of any object (hence in 2.0, we might decide to make that the default) by just accepting the performance hit, but the ability to avoid the extra overhead of a lock while using the same code might remain desirable.

code should be independent of the number of threads as much as possible.

The only way to do that is to make everything very slow. I strongly disagree with this.

that is local to the caller of wait and doesn't need to affect the code that allocates the Condition.

I don't think a caller can be local, since something external ("the caller" could be anyone on the callstack) is, by definition, non-local to the callee.

doesn't need to affect the code that allocates the Condition.

Um, that's what C does, and is precisely what we just said we don't want to do (#30026)

JeffBezanson · 2018-11-19T02:45:26Z

No, what I'm saying is that we should try to skip ahead to making all of these thread-safe by default. I think it might be possible to arrange the API to enable that. For example, Condition() allocates a lock for itself, and wait(::Condition) acquires it first. Then the only remaining difference is whether the code calling wait expects to be holding a lock when it returns. For backwards compatibility, wait would have to release the lock. Then code that tests a predicate after wait, and that wants to become thread-safe, needs to call e.g. wait(c, locked=true) instead. That seems like an easier upgrade path to me than changing Condition to ConditionMT and adding a lock acquire and release to each use.

I feel more strongly about not having, or strongly discouraging using, a thread-unsafe lock. That seems like a logical place to be thread-safe by default. If it's truly necessary to have a task-only lock for performance tuning, that can be added later when needed and hidden away somewhere, possibly not even exported.

vtjnash · 2018-11-19T03:09:22Z

No. I don’t think you understand what’s going on here. I’ve been thinking about this a lot for the past couple months, and haven’t been able to find a substitute (sadly) and hence am finally turning this into a PR.

If you aren’t already holding the lock when you call ‘wait’, your code is broken. It’s absolutely useless to break the API just to return with the lock held (I do it here because it’s convenient and makes the implementation a bit more powerful around RecursiveLock—but it’s absolutely just optional code).

The options I’m aware of are: (a) break backwards compatibility and require you to hold the lock first (b) declare in the constructor whether correct locking is mandatory (this PR) or (c) declare / detect in the wait call whether you previously acquired the lock, and branch to the appropriate implementation at runtime

JeffBezanson · 2018-11-19T17:39:38Z

(c) declare / detect in the wait call whether you previously acquired the lock, and branch to the appropriate implementation at runtime

I think that's fairly close to what I'm proposing.

Condition is tricky, but I think we can at least get away with having only the thread-safe flavor of lock. Will be nice not to have both lock.jl and locks.jl...

vtjnash · 2018-11-19T18:21:04Z

Yes, I was actively testing the branch with the switch to change the default lock type to MT-safe, and that seems to build and work. Since those have the same API, your proposal seems like an obvious improvement.

I could switch Condition to ConditionMT, but (for backwards compatibility), that requires removing the concurrency violation errors in some places, and replacing them with some logic to bypass thread-safety instead when we detect that case. It's not my favorite option (Since I like that with the current state of the PR, the user states. via the choice of constructor, whether their code is updated to be thread-safe. And I think it's a bit easier to assert than to generally handle this case.). However, I don't foresee any actual blockers in getting the test-suite to pass with that option. (Implementation-wise, this might mean adding a flag to RecursiveLock of whether to disable error checking, or perhaps just handling it specially inside notify and wait)

vtjnash · 2018-11-20T02:19:18Z

Actually, in talking with Kiran today, there actually is another option (d). In this PR, I'm using the word Condition in the same sense as pthreads/C/Java/TBB/Winnt/etc. of a queue of threads waiting for an event. But we could also use the word in the sense of the simple notification trigger itself, or what might be called an Event in some other contexts (such as kevent/epoll/Winnt/Julia).

This would be an intermediate approach that alters some results but doesn't outright break all usage. In exchange, we might be able to gain partial MT-safety for the existing Condition object. In this option, we would instead deprecate Event and upgrade Condition to implement a notification trigger. Similar to kevent and epoll, it would be configurable as both a level-triggered (was Event) and edge-triggered (was Condition) variant (via a mutable runtime flag).

The trade-off here (pros and cons, in no particular order) is that

the first wake-up might now be spurious (and/or we might be able to provide an API to clear a stale event trigger), which might be considered breaking anyways;
it is not MT-safe for multiple simultaneous waiting threads (some threads may miss the wakeup notification), although it would sometimes be safe for the notify wakeup to happen asynchronously;
it seems like this might be a bit less orthogonal to the other APIs (it seems to do a little bit of everything—blocking Channels, EventMT, notify, atomics);
it might be faster someday than the MT-safe ConditionMT (since we might someday be able to avoid taking a lock, and that might save an atomic operation);
it also may be better representation of a repeated event such as UDP traffic, ZMQ pub/sub, or fast interval timers, and other similar cases where we don't care about missing some event notifications.

vtjnash · 2018-11-27T18:17:22Z

I've now added another commit demonstrating the ability for the Condition type to branch at runtime to ST or MT mode, based on whether we specify in the constructor that it requires—and thus will obey—the additional stipulations of the thread-safe API. (per off-line conversations with Jeff et al.)

As stated in the commit description, this is a fully functional prototype, but not necessarily a final implementation. As such, it still leaves in the machinery to define, use, and experiment with the *ST and *MT aliases. Once we settle on a design, I'll go back through and remove any content and flexibility that we didn't end up wanting.

NHDaly · 2018-12-05T22:56:22Z

We talked some last week about other options besides what is currently implemented in this PR. Is what we discussed then basically equivalent to option (c) above?

If so, if I remember correctly, the tradeoff was essentially:
PRO: automatically upgrading code in-place to be thread-safe wrt to the Condition's internals (even if it's leaving the surrounding code potentially thread un-safe)
CON: lose the ability to throw assertions in code that is not yet upgraded for thread safety.

Is that more or less right? If so, it seems to me that this tradeoff is not worth it.

Am I characterizing the discussion correctly? I'm sorry that I don't remember all of the details. Did we come to a consensus that I'm not remembering?

JeffBezanson · 2018-12-06T22:07:51Z

@vtjnash brought up the point that in the future there might be multi-threaded and single-threaded versions of various things, like HashMap vs. ConcurrentHashMap. So we might want to think about this from the perspective of having a general naming convention, which we can then also use here. Some possibilities:

Condition, ConditionMT
Condition, Threads.Condition
Condition, ConcurrentCondition

?

vtjnash · 2018-12-07T17:25:37Z

Talking more with Jeff yesterday, we proposed the following actions:

deprecate Condition (silently for now, removal targeted for v2.0)
introduce Threads.Condition with thread-safe API (name TBD, as mentioned above)
implement the behavior of the current Condition as an option for Event (future work, since we aren't using either right now)
everything else here (ReentrantLock, Channel, Event) to become thread-safe by default

NHDaly · 2018-12-07T17:33:55Z

+10 I think that all sounds really great to me!

I agree that the current behavior of Condition belongs with Event, and that nicely solves the naming problems. +1 and thanks for thinking hard about this! :)

JeffBezanson · 2018-12-07T17:35:46Z

Another option for resolving the Condition name conflict is to use more of the full term "condition variable", e.g. ConditionVariable, ConditionVar, CondVar, ...

JeffBezanson · 2018-12-12T20:19:56Z

👍 Looks good. We'll have to wait for the 1.1 news to be moved aside then rebase.

base/event.jl

NHDaly · 2018-12-13T22:07:23Z

Yay! Thanks for the hard work on this, Jameson! It looks really good! :))

This extends Condition to assert that it may only be used in the single-threaded case (co-operatively scheduled), and then adds a thread-safe version of the same: Threads.Condition. Additionally, it also upgrades ReentrantLock, etc. to be thread-safe.

StefanKarpinski · 2019-01-02T13:56:56Z

base/event.jl

+assert_havelock(l::AbstractLock, tid::Nothing) = error("concurrency violation detected")
+
+"""
+    AlwaysLockedST


This seems to be the only case where the original *ST and *MT naming has remained in the merged PR. Was that intentional or was this just overlooked?

Well, there aren't any MT and this type is only meant to be used internally. We could rename it though.

Right, that's what I was thinking—just calling it AlwaysLocked would be good.

In Julia 1.2, the error that occurs when releasing a semaphore with mismatched release/acquire counts is now an ErrorException, whereas previously it was an AssertionError. See JuliaLang/julia#30061

NHDaly · 2019-09-04T14:44:17Z

@vtjnash I wonder if it was a mistake to allow the old-style, non-thread-safe Conditions to be locked/unlocked.

This makes it somewhat easy to hit the following gotcha (which happened to me today):

julia> c = Condition()
Base.GenericCondition{Base.AlwaysLockedST}(Base.InvasiveLinkedList{Task}(nothing, nothing), Base.AlwaysLockedST(1))

julia> lock(c)

julia> Threads.@spawn begin lock(c) ;@info "HI" ;unlock(c) end
[ Info: HI
Task (done) @0x000000011ce3ead0

I forgot that I was using Condition instead of Threads.Condition in my code, so i was seeing a race-condition I didn't expect. It took me a while to finally narrow it down to the above MWE, and that's when I finally remembered that there is a new Threads.Condition I'm supposed to be using, and that lock(::Condition) is a no-op.

I think it would be better to have lock(::Condition) throw an error, rather than be a no-op, because there is simply no good reason to use the old-style Conditions with locks. If you're using locks, you almost certainly mean to be using the new, thread-safe, Threads.Condition.

Can we throw an error to enforce that, so that users aren't caught by this surprise like I was? :) What do you think? Thanks! :)

JeffBezanson · 2019-09-04T15:07:03Z

I believe Condition throws an error if another thread tries to use it. What was the race condition?

NHDaly · 2019-09-04T17:22:17Z

It seems that the base Condition does not throw an error, as shown in the code I posted above. That right there is a race-condition, because two threads are proceeding at the same time when I thought they would be locked. And it didn't throw an error like I would want it to.

The full situation I was in was trying to support Conditions in https://github.com/NHDaly/Select.jl (this is how I discovered the second problem in NHDaly/Select.jl#2).

When I was experimenting with it, i was accidentally using the wrong kind of Condition and didn't notice, and I was really confused by the behavior. I managed to finally figure it out by adding print-statements to this toy example:

julia> using Test; using Base.Threads: @spawn

julia> @sync begin
           coordinator = Channel()

           c1 = Condition()
           c2 = Condition()

           wait_on_conds() = begin
               @info "lock(c1)"
               lock(c1)
               @info "lock(c2)"
               lock(c2)
               put!(coordinator, 0)  # Notify main task that we're waiting. (Note that it will block on lock until we're actually asleep.)
               @async begin wait(c1); @info "c1" end
               @async begin wait(c2); @info "c2" end
               unlock(c2)
               unlock(c1)
           end

           t1 = @async wait_on_conds()
           t2 = @async wait_on_conds()

           take!(coordinator); take!(coordinator)  # Wait for both tasks to start waiting

           # Now, everyone is ready, so notifying c1 will wake up _both_ tasks
           lock(c1)
           @test notify(c1) == 2   # ERROR: This returns `1` instead, because the `lock` doesn't actually block until `t2` has waited like I expected.
           unlock(c1)
       end

(In-fact, I finally realized that this test wasn't going to work anyway, because wait(c) on the new task doesn't release the lock on the parent task, so this wouldn't work. (And if you use actual Threads.Conditions here, you get an error about it.) But in this case, I didn't notice that problem because the Base.Conditions hid the error, and my code proceeds to the @test before both tasks have started waiting, which is a race-condition_.)

JeffBezanson · 2019-09-04T17:48:55Z

I get

julia> c = Condition()
Base.GenericCondition{Base.AlwaysLockedST}(Base.InvasiveLinkedList{Task}(nothing, nothing), Base.AlwaysLockedST(1))

julia> lock(c)

julia> Threads.@spawn begin lock(c) ;@info "HI" ;unlock(c) end
Task (failed) @0x00007f56fe78e710
concurrency violation detected
error(::String) at ./error.jl:33
concurrency_violation() at ./condition.jl:8
assert_havelock at ./condition.jl:26 [inlined]
assert_havelock at ./condition.jl:49 [inlined]
lock at ./condition.jl:50 [inlined]
lock(::Base.GenericCondition{Base.AlwaysLockedST}) at ./condition.jl:74

I suppose the issue is that Base.Condition only really exists for backwards compatibility, and it never used to require or even allow locking. So yes, I can see the case for disallowing locking on them.

NHDaly · 2019-09-04T17:52:31Z

Weeirrrrd. It happily continues in parallel for me on both 1.3 and my 1-day old master 1.4.

julia> versioninfo()
Julia Version 1.3.0-rc1.0
Commit 768b25f6a8* (2019-08-18 00:04 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

julia> versioninfo()
Julia Version 1.4.0-DEV.79
Commit d3250fe005* (2019-09-02 19:07 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

NHDaly · 2019-09-04T17:56:59Z

I suppose the issue is that Base.Condition only really exists for backwards compatibility, and it never used to require or even allow locking. So yes, I can see the case for disallowing locking on them.

But yeah, exactly. That's my thinking as well. I think having the two types can lead to surprises like this, so any amount of failing early would be much appreciated! ❤️ Erroring on lock seems like one nice way to do that, but we should do that everywhere we can.

Ideallllly, it seems like it would be nice to consider renaming Base.Condition to something like Base.SingleThreadedCondition, find/replacing all of base to use that name, and then changing the default, unqualified Condition to be the new one and just say "SORRY" to any user code which will now throw an error that it needs to be locked. It's an easy error to fix, i think, and will be a nice reminder to start thinking about thread-safety. And I'd imagine there aren't too many uses of Condition in packages, anyway.

NHDaly · 2019-09-04T18:06:20Z

This is kind of a corny solution, but I've opened #33162 as a straw-person

EDIT: 👍 Thanks, this was merged. :) That should address my concerns

This extends Condition to assert that it may only be used in the single-threaded case (co-operatively scheduled), and then adds a thread-safe version of the same: `Threads.Condition`. Additionally, it also upgrades ReentrantLock, etc. to be thread-safe.

StefanKarpinski reviewed Nov 16, 2018

View reviewed changes

base/event.jl Outdated Show resolved Hide resolved

base/event.jl Outdated Show resolved Hide resolved

ararslan added the multithreading Base.Threads and related functionality label Nov 16, 2018

vtjnash force-pushed the jn/condition-mt branch from 1323511 to e0e8391 Compare November 19, 2018 18:19

vtjnash force-pushed the jn/condition-mt branch from e0e8391 to c2ef8fe Compare November 27, 2018 18:03

vtjnash added the DO NOT MERGE Do not merge this PR! label Nov 27, 2018

vtjnash mentioned this pull request Nov 28, 2018

Channel: add ability to handle threads #30186

Merged

vtjnash added the triage This should be discussed on a triage call label Dec 6, 2018

vtjnash force-pushed the jn/condition-mt branch 4 times, most recently from 324c2d5 to 06caab5 Compare December 11, 2018 19:12

vtjnash removed DO NOT MERGE Do not merge this PR! triage This should be discussed on a triage call labels Dec 11, 2018

vtjnash force-pushed the jn/condition-mt branch from 06caab5 to fef54a7 Compare December 11, 2018 19:12

vtjnash force-pushed the jn/condition-mt branch from fef54a7 to eaa8cf1 Compare December 12, 2018 20:32

NHDaly reviewed Dec 13, 2018

View reviewed changes

base/event.jl Outdated Show resolved Hide resolved

NHDaly reviewed Dec 13, 2018

View reviewed changes

base/event.jl Show resolved Hide resolved

vtjnash force-pushed the jn/condition-mt branch 3 times, most recently from b0c2e18 to a6ea3ef Compare December 17, 2018 02:48

vtjnash force-pushed the jn/condition-mt branch from a6ea3ef to 515e5c1 Compare December 17, 2018 20:06

vtjnash merged commit a0b7a76 into master Dec 18, 2018

vtjnash deleted the jn/condition-mt branch December 18, 2018 02:23

NHDaly mentioned this pull request Dec 20, 2018

Nit: Fix links in NEWS.md by running NEWS-update.jl #30474

Merged

StefanKarpinski reviewed Jan 2, 2019

View reviewed changes

KristofferC mentioned this pull request May 15, 2019

Backports for 1.2-RC1 #31727

Closed

58 tasks

NHDaly mentioned this pull request Sep 4, 2019

Have threading functions (lock, unlock, ...) throw Error for old Base.Condition #33162

Merged

inkydragon mentioned this pull request Nov 23, 2021

Fix typo: RecursiveLock to ReentrantLock #43192

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Condition/RecursiveLock: add ability to handle threads #30061

Condition/RecursiveLock: add ability to handle threads #30061

vtjnash commented Nov 16, 2018

StefanKarpinski left a comment

JeffBezanson commented Nov 17, 2018

vtjnash commented Nov 18, 2018

JeffBezanson commented Nov 19, 2018

vtjnash commented Nov 19, 2018

JeffBezanson commented Nov 19, 2018

vtjnash commented Nov 19, 2018

vtjnash commented Nov 20, 2018

vtjnash commented Nov 27, 2018

NHDaly commented Dec 5, 2018

JeffBezanson commented Dec 6, 2018

vtjnash commented Dec 7, 2018

NHDaly commented Dec 7, 2018

JeffBezanson commented Dec 7, 2018

JeffBezanson commented Dec 12, 2018

NHDaly commented Dec 13, 2018

StefanKarpinski Jan 2, 2019

JeffBezanson Jan 2, 2019

StefanKarpinski Jan 2, 2019

NHDaly commented Sep 4, 2019

JeffBezanson commented Sep 4, 2019

NHDaly commented Sep 4, 2019

JeffBezanson commented Sep 4, 2019

NHDaly commented Sep 4, 2019 •

edited

Loading

NHDaly commented Sep 4, 2019

NHDaly commented Sep 4, 2019 •

edited

Loading

Condition/RecursiveLock: add ability to handle threads #30061

Condition/RecursiveLock: add ability to handle threads #30061

Conversation

vtjnash commented Nov 16, 2018

StefanKarpinski left a comment

Choose a reason for hiding this comment

JeffBezanson commented Nov 17, 2018

vtjnash commented Nov 18, 2018

JeffBezanson commented Nov 19, 2018

vtjnash commented Nov 19, 2018

JeffBezanson commented Nov 19, 2018

vtjnash commented Nov 19, 2018

vtjnash commented Nov 20, 2018

vtjnash commented Nov 27, 2018

NHDaly commented Dec 5, 2018

JeffBezanson commented Dec 6, 2018

vtjnash commented Dec 7, 2018

NHDaly commented Dec 7, 2018

JeffBezanson commented Dec 7, 2018

JeffBezanson commented Dec 12, 2018

NHDaly commented Dec 13, 2018

StefanKarpinski Jan 2, 2019

Choose a reason for hiding this comment

JeffBezanson Jan 2, 2019

Choose a reason for hiding this comment

StefanKarpinski Jan 2, 2019

Choose a reason for hiding this comment

NHDaly commented Sep 4, 2019

JeffBezanson commented Sep 4, 2019

NHDaly commented Sep 4, 2019

JeffBezanson commented Sep 4, 2019

NHDaly commented Sep 4, 2019 • edited Loading

NHDaly commented Sep 4, 2019

NHDaly commented Sep 4, 2019 • edited Loading

NHDaly commented Sep 4, 2019 •

edited

Loading

NHDaly commented Sep 4, 2019 •

edited

Loading