
remove threadedregion and move jl_threading_run to julia #32477

Closed · wants to merge 1 commit

Conversation

JeffBezanson
Member

@JeffBezanson JeffBezanson commented Jul 2, 2019

Putting this up to see if we can find any adversarial test cases.

Resolves #32970

@JeffBezanson JeffBezanson added the multithreading Base.Threads and related functionality label Jul 2, 2019
@staticfloat
Member

Wow, this is great! Using the benchmarks from #32181, we get:

julia-master with 1 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.241 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(28.450 μs)
julia-master with 2 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(25.977 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(17.003 μs)

Versus:

julia-threadedregion with 1 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.241 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(25.486 μs)
julia-threadedregion with 2 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.250 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(13.774 μs)

Not only did you significantly shrink the overhead amount, you sped up the multithreaded case!

@staticfloat
Member

The test failures were all timeouts, so I've restarted them to get a feel for how reliable those timeouts are. I've also found some threading lock-ups in my explorations and opened an issue here: #32511

ccall(:jl_threading_run, Cvoid, (Any,), threadsfor_fun)
in_threaded_loop[] = false
end
threading_run(threadsfor_fun)

Because we are never invoking threadsfor_fun with an argument, should the onethread argument be removed?
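[Editor's note] The diff above replaces a `ccall(:jl_threading_run, ...)` with a Julia-level `threading_run`, which is the core of this PR. As a rough, hypothetical sketch (not the PR's exact code), such a function could spawn one sticky task per thread and join them:

```julia
# Hypothetical sketch of a Julia-level threading_run. Assumes the C entry
# point :jl_set_task_tid exists (as referenced elsewhere in this thread)
# and takes a 0-based thread id.
function threading_run(fun)
    n = Threads.nthreads()
    tasks = Vector{Task}(undef, n)
    for i in 1:n
        t = Task(fun)
        t.sticky = true  # keep each task pinned to its assigned thread
        ccall(:jl_set_task_tid, Cvoid, (Any, Cint), t, i - 1)
        tasks[i] = t
        schedule(t)
    end
    for t in tasks
        wait(t)  # join all workers; wait rethrows any task failure
    end
end
```

The `wait` loop is what makes errors propagate to the caller instead of silently disabling threads, which relates to the linked issue "Threads are disabled if an error occurs".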

@vtjnash
Member

vtjnash commented Jul 6, 2019

Wow, this is great!

One possible cause is that this also deletes the IO-capable scheduler and always uses the multithreading-optimized scheduler instead. Currently we may incur a small lag when switching between schedulers, whereas this change instead adds a large lag (latency) to any actual IO work.

@JeffBezanson
Member Author

while this just instead adds a large lag (latency) for any actual IO work.

Right; part of the goal of this PR was to motivate a benchmark for that and then try to address it. So, do your worst! :)

@JeffBezanson
Member Author

Wait, we are still polling jl_process_events in the task_get_next loop, so I don't think there is any extra I/O latency (just slightly slower task switching).

@JeffBezanson JeffBezanson changed the title WIP: remove threadedregion and move jl_threading_run to julia remove threadedregion and move jl_threading_run to julia Jul 15, 2019
@JeffBezanson JeffBezanson force-pushed the jb/threadedregion branch 3 times, most recently from f5cf4be to d76ac5d Compare August 7, 2019 22:16
@oschulz
Contributor

oschulz commented Aug 8, 2019

Earlier, I saw some code ccall(:jl_set_task_tid, ...) in this PR. Would it maybe be useful to expose an official method like

function schedule(t::Task, tid)
    @assert !istaskstarted(t)
    t.sticky = true  # prevent the scheduler from migrating the task later
    # tid is 1-based on the Julia side; the C runtime uses 0-based ids
    ccall(:jl_set_task_tid, Cvoid, (Any, Cint), t, tid-1)
    schedule(t)
    return t
end

to the user, for applications that require more direct control over what's run on which thread?

@oschulz
Contributor

oschulz commented Aug 8, 2019

A possible application for such a schedule(t::Task, tid) would be benchmarks across NUMA domains (with thread-pinning), etc.

@JeffBezanson
Member Author

Yes, that's possible to add. My thinking was just that it's not a good default for @threads, since it's not as composable.

@oschulz
Contributor

oschulz commented Aug 8, 2019

My thinking was just that it's not a good default for @threads, since it's not as composable.

Oh sure — I was really thinking about specialized use cases. Certainly not something the average user will require, but HPC applications may sometimes need it, e.g. for benchmarking and performance analysis. For such cases, I think it would be helpful to have an "official" API like schedule(t::Task, tid) — assuming :jl_set_task_tid, or an equivalent low-level function to power it, will continue to exist.

@mbauman
Member

mbauman commented Apr 28, 2020

I'd really love to see this merged — or otherwise make the "default/easy/obvious" for loop macro do this in some way. As noted by Jacob in the competing PR, it's unfortunately breaking in the wild.

#35003 (comment)

I think it's been mentioned elsewhere, but just wanted to add a voice that this would indeed be breaking; a few of us rely on the current behavior in order to start worker tasks on specific threads, such as the code here. I'd say as long as we have either: 1) a way to reserve a main thread where tasks couldn't be spawned, or 2) a way to still spawn a threaded task on a specific thread (e.g. Threads.@spawn 2 expr), then we could still get the behavior we're after.

@JeffBezanson
Member Author

Is the appealing part the dynamic schedule? We could get that in a less invasive way just by flipping the sticky bit within the current implementation.
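[Editor's note] "Flipping the sticky bit" here refers to the `Task.sticky` field. A minimal, illustrative sketch of what that less invasive variant could look like (hypothetical names; not code from the PR):

```julia
# Illustrative only: run one task per thread, but leave tasks non-sticky
# so the scheduler may migrate them dynamically across threads.
function threading_run_nonsticky(fun)
    tasks = [Task(fun) for _ in 1:Threads.nthreads()]
    for t in tasks
        t.sticky = false  # opt in to dynamic scheduling
        schedule(t)
    end
    foreach(wait, tasks)  # join all tasks, rethrowing any failure
end
```

This would keep the existing C-side region machinery while still getting a dynamic schedule, which is the trade-off being discussed.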

@mbauman
Member

mbauman commented Apr 28, 2020

The main thing is that I want to be able to throw @threads for inside of libraries and not worry about oversubscribing if an app also uses @threads for.

@JeffBezanson
Member Author

There won't be any oversubscription either way (except via a native library), but I assume the real problem is the schedule --- we don't want @threads to pin tasks to specific threads by default?

@mbauman
Member

mbauman commented Apr 28, 2020

Right, of course, I wasn't thinking straight. Yes, the scheduling won't be as ideal, but I also have it in my head that nested @threads loops are crashy and not the best idea. Is that better these days?

@JeffBezanson
Member Author

Should not be crashy.

@OkonSamuel
Contributor

@JeffBezanson Just wanted to clarify: I used to be of the opinion that nesting @threads for calls would oversubscribe the CPU. Are you saying that, apart from the dynamic scheduling issue, @threads for won't oversubscribe?

@JeffBezanson
Member Author

Correct, @threads will never oversubscribe your CPUs. The number of threads we use is fixed at startup, and @threads and @spawn just create tasks to run on those threads.
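[Editor's note] This can be checked directly: spawning far more tasks than threads never creates new OS threads, so the set of thread ids observed by the tasks stays bounded by the pool fixed at startup. A small sketch (the helper name `observed_thread_ids` is illustrative, not an API):

```julia
using Base.Threads

# Spawn ntasks tasks and record which thread each one ran on.
function observed_thread_ids(ntasks)
    ids = zeros(Int, ntasks)
    @sync for i in 1:ntasks
        @spawn (ids[i] = threadid())
    end
    unique(ids)
end

# However large ntasks is, the tasks are multiplexed onto the fixed pool:
# length(observed_thread_ids(1000)) <= nthreads()
```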

@ranocha
Member

ranocha commented Jan 26, 2021

Are there any plans for when this will be merged? It would be really nice to get the speed-up when using Threads.@threads for ... with a single thread, cf. https://discourse.julialang.org/t/overhead-of-threads-threads/53964.

@vtjnash vtjnash closed this Jan 27, 2021
@vtjnash
Member

vtjnash commented Jan 27, 2021

This was moved to Julia (and so this PR now has conflicts).

Labels
multithreading Base.Threads and related functionality

Successfully merging this pull request may close these issues.

Threads are disabled if an error occurs
7 participants