
Fast-track @threads when nthreads() == 1 #32181

Closed
wants to merge 1 commit

Conversation

@staticfloat (Sponsor Member) commented May 29, 2019

This avoids overhead when threading is disabled. Example benchmark:

using BenchmarkTools, Base.Threads, Test

function func(val, N)
    sums = [0*(1 .^ val) for thread_idx in 1:nthreads()]  # one accumulator per thread, with a zero of the same type as val
    for idx in 1:N
        sums[threadid()] += idx.^val
    end
    return sum(sums)
end

function func_threaded(val, N)
    sums = [0*(1 .^ val) for thread_idx in 1:nthreads()]
    @threads for idx in 1:N
        sums[threadid()] += idx.^val
    end
    return sum(sums)
end

# Ensure they all get the same answer
@test func(2.0, 1<<10) == func_threaded(2.0, 1<<10)

@show @benchmark func(2.0, 1<<10)
@show @benchmark func_threaded(2.0, 1<<10)

I run the benchmarks as:

for JULIA in julia-master ./julia; do
	for T in 1 2; do
		echo "$JULIA with $T threads:"
		JULIA_NUM_THREADS=$T $JULIA speedtest.jl
	done
done

Before this PR:

julia-master with 1 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.243 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(28.331 μs)
julia-master with 2 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.239 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(17.019 μs)

After this PR:

./julia with 1 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.254 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(24.257 μs)
./julia with 2 threads:
#= /Users/sabae/src/julia/speedtest.jl:22 =# @benchmark(func(2.0, 1 << 10)) = Trial(24.263 μs)
#= /Users/sabae/src/julia/speedtest.jl:23 =# @benchmark(func_threaded(2.0, 1 << 10)) = Trial(17.008 μs)

@staticfloat added the domain:multithreading (Base.Threads and related functionality) and performance (Must go faster) labels on May 29, 2019
@yuyichao (Contributor)

This check should not happen at macro expansion time.
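
A minimal, self-contained sketch of the distinction (the helper names and the chunking are illustrative, and Threads.@spawn stands in for the internal runtime handoff; this is not the actual Base implementation). The nthreads() check lives in the emitted code, so it is evaluated every time the loop runs rather than once while the macro expands:

```julia
using Base.Threads

function run_partitioned(body, range)
    # stand-in for the closure @threads generates: run `body` over one chunk of `range`
    function run_chunk(chunk, nchunks)
        len = length(range)
        lo  = div(len * (chunk - 1), nchunks) + 1
        hi  = div(len * chunk, nchunks)
        for i in lo:hi
            body(range[i])
        end
    end

    if nthreads() == 1
        run_chunk(1, 1)        # runtime fast path: no tasks, no handoff to the runtime
    else
        tasks = map(1:nthreads()) do c
            Threads.@spawn run_chunk(c, nthreads())   # stand-in for the threaded dispatch
        end
        foreach(wait, tasks)
    end
    return nothing
end

# usage
acc = Threads.Atomic{Int}(0)
run_partitioned(i -> Threads.atomic_add!(acc, i), 1:100)
acc[]   # 5050
```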

@raminammour (Contributor)

Hello,

The only way I have found to work around #15276 is to make sure that the code with @threads and nthreads() == 1 runs almost as fast as the non-threaded code. I even suspect that is the reason for the slowdown (@code_warntype shows a Core.Box) in your example.

Would this not make it harder to detect that the closure bug is preventing speedup in multi-threaded code?

Cheers!

@staticfloat (Sponsor Member, Author)

@raminammour while #15276 can be triggered by code like this (and is, in this case), this PR fixes something independent of that. Yes, the Box slows this down, and after changing this PR according to the feedback above we still trigger the Box problem and therefore don't get quite the same speedup; but we still get a decent amount of speedup, because we are eliminating a different source of slowdown.

If you want to test for #15276 style problems, I suggest you do a more direct test than relying on the fact that @threads introduces a closure that induces boxing; that may not be true forever (indeed I hope it's not!).
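
For instance, one rough way to check for the boxing directly is to look for Core.Box-typed slots in the typed code (a sketch only: it assumes code_typed populates slottypes, and the introspection details differ between Julia versions). Using the func/func_threaded definitions from the benchmark above:

```julia
function has_boxed_locals(f, argtypes)
    # CodeInfo for the first matching method; boxed captures show up as Core.Box slots
    ci = first(code_typed(f, argtypes)).first
    slots = ci.slottypes === nothing ? Any[] : ci.slottypes
    return any(T -> T === Core.Box, slots)
end

has_boxed_locals(func, (Float64, Int))            # expected: false
has_boxed_locals(func_threaded, (Float64, Int))   # expected: true while #15276 bites here
```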

I've updated the original PR message with new performance metrics and a slightly updated benchmark script.

@raminammour (Contributor)

Just FYI, the benchmarks on my system are different:


using BenchmarkTools, Base.Threads

function func(val)
    local sum = 0*(1 .^ val)
    for idx in 1:100
        sum += idx.^val
    end
    return sum
end

function func_threaded_let(val)
    local sum = 0*(1 .^ val)
    @threads for idx in 1:100
        let sum = sum
            sum += idx.^val
        end
    end
    return sum
end

function func_threaded(val)
    local sum = 0*(1 .^ val)
    @threads for idx in 1:100
        sum += idx.^val
    end
    return sum
end

@show @benchmark func(2.0)
@show @benchmark func_threaded(2.0)
@show @benchmark func_threaded_let(2.0);
versioninfo()

#= In[134]:30 =# @benchmark(func(2.0)) = Trial(2.333 μs)
#= In[134]:31 =# @benchmark(func_threaded(2.0)) = Trial(4.551 μs)
#= In[134]:32 =# @benchmark(func_threaded_let(2.0)) = Trial(2.568 μs)
Julia Version 1.0.2
Commit d789231e99 (2018-11-08 20:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

Hope this helps.

@jebej (Contributor) commented May 30, 2019

It seems most of the slowdown is due to the closure bug; I get:

#= REPL[11]:1 =# @benchmark(func(2.0)) = Trial(129.937 ns)

#= REPL[12]:1 =# @benchmark(func_threaded(2.0)) = Trial(3.716 μs)

#= REPL[13]:1 =# @benchmark(func_threaded_let(2.0)) = Trial(442.848 ns)

On v1.1.1.

@staticfloat (Sponsor Member, Author)

The let version that @raminammour posted does not calculate the same thing; it never stores the sum value:

julia> func(2.0)
338350.0

julia> func_threaded(2.0)
338350.0

julia> func_threaded_let(2.0)
0.0

If you look at the @code_native of func_threaded_let() versus func_threaded(), you will see that the _let() variant contains less than half as much code; because creating a new binding within the @threads block means the output is never used, the optimizer is able to get rid of a lot of work.
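
The shadowing is easy to see in isolation (a minimal REPL example):

```julia
julia> x = 1.0;

julia> let x = x      # new binding, initialized from the outer x
           x += 10    # updates only the inner binding
       end
11.0

julia> x              # the outer binding is untouched
1.0
```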

@raminammour (Contributor)

Sorry, my bad; here is a Ref version that does the right thing and is fast, as it avoids #15276. (I suspect the time would have been much faster if the whole calculation in the let version had been optimized away.) I may still be confused though :)


function func(val)
    sum = 0*(1 .^ val)
    for idx in 1:100
        sum += idx.^val
    end
    return sum
end

function func_threaded_ref(val)
    sum = Ref(0*(1 .^ val))
    @threads for idx in 1:100
        sum[] += idx.^val
    end
    return sum[]
end

function func_threaded(val)
    sum = 0*(1 .^ val)
    @threads for idx in 1:100
        sum += idx.^val
    end
    return sum
end

@btime func(2.0)
@btime func_threaded(2.0)
@btime func_threaded_ref(2.0);
versioninfo()

  2.520 μs (0 allocations: 0 bytes)
  4.623 μs (203 allocations: 3.20 KiB)
  2.440 μs (2 allocations: 64 bytes)
Julia Version 1.0.2
Commit d789231e99 (2018-11-08 20:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

@staticfloat (Sponsor Member, Author) commented May 30, 2019

Yes, that version does indeed remove some of the leftover performance gap; I'll include that in my benchmark above. But notice that you're still getting MUCH slower timings even in the serial case because you're using Julia 1.0; use Julia 1.1, or better yet master, and you'll see a large difference in performance. Your Ref change doesn't quite eliminate the speed difference, although it does get it closer:

Before this PR:

#= /Users/sabae/src/julia/speedtest.jl:27 =# @benchmark(func(2.0)) = Trial(81.144 ns)
#= /Users/sabae/src/julia/speedtest.jl:28 =# @benchmark(func_threaded(2.0)) = Trial(8.739 μs)
#= /Users/sabae/src/julia/speedtest.jl:29 =# @benchmark(func_threaded_ref(2.0)) = Trial(6.890 μs)

After this PR:

#= /Users/sabae/src/julia/speedtest.jl:27 =# @benchmark(func(2.0)) = Trial(81.176 ns)
#= /Users/sabae/src/julia/speedtest.jl:28 =# @benchmark(func_threaded(2.0)) = Trial(3.775 μs)
#= /Users/sabae/src/julia/speedtest.jl:29 =# @benchmark(func_threaded_ref(2.0)) = Trial(1.674 μs)

@staticfloat (Sponsor Member, Author)

@JeffBezanson, @yuyichao any further comments? I'm unsure why @threads with a single thread retains such a slowdown (I assume because of inference goblins), but the speedup here is not insignificant on its own.

@staticfloat (Sponsor Member, Author)

Pinging @JeffBezanson and @yuyichao again to see if there are any further comments; if not, I think we should merge this, as it's a straight performance win when using @threads with only one thread.

@yuyichao (Contributor)

How much speedup do you get if you use https://github.com/JuliaLang/julia/pull/21452/files#diff-7198cded2577e0bdeb563f0f2713347bR69 instead?

Also, that branch was somehow changed to use the latest world in #30838, presumably to be consistent with the threading branch. If that's the case and is intended, then this should do the same, which means it cannot be compiled / inlined ahead of time.

The threading branch was changed to use the latest world in https://github.com/JuliaLang/julia/pull/31398/files#diff-5a6699a5aa7cf07be50461e3c7f68262L693, and in particular the code was added in d9d8d4c. Why is that? It seems fairly different from both the previous semantics and the semantics of a normal for loop. Is it needed for something in particular? Otherwise, I don't really see the point of making this change.

Another reason calling the function is preferred is to make sure the semantics of the loop are actually the same. This means that the single-threaded @threads loop should have the same scope rules (in a function) and the same limitations as a normal one. This way you can actually test it with a single thread and be fairly sure that the code could run with multiple threads (short of other true thread-related bugs).

Also, is the nested thread loop hack still needed? Isn't the point of partr to get rid of it? (That was what I had in mind when adding that code anyway...)

@yuyichao (Contributor)

@vtjnash ^^^

@vtjnash (Sponsor Member) commented Jun 19, 2019

Yeah, the jl_threading_run function is now just a statically compiled copy of some Julia code. That's pretty slow (and awkward / hard to maintain), so we should move that function into Julia.
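
A rough sketch of that direction (assuming task-based scheduling via Threads.@spawn from later Julia versions; the real Base code pins tasks to threads and differs in detail), just to show the fan-out/join expressed in Julia rather than in a statically compiled C-side copy:

```julia
using Base.Threads

function threading_run_sketch(fun)
    # spawn one task per thread and wait for them all; stand-in for jl_threading_run
    tasks = [Threads.@spawn(fun()) for _ in 1:nthreads()]
    foreach(wait, tasks)
    return nothing
end
```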

@vchuravy (Sponsor Member) commented Jun 19, 2019

I agree that we shouldn't performance-hack the nthreads() == 1 case. I have found it pretty important to debug issues early on by being able to ask: is this threaded loop slow on a single thread? What happens if I remove it?

@staticfloat (Sponsor Member, Author)

How much speedup do you get if you use /pull/21452/files#diff-7198cded2577e0bdeb563f0f2713347bR69 instead?

I think you're asking what happens if I remove the invokelatest; the answer is that there is essentially no difference; it's lost in measurement noise. Nothing like the speedup in this PR.

The latest benchmarks:

master:

#= /home/sabae/src/julia/thread_test.jl:28 =# @benchmark(func(2.0)) = Trial(96.508 ns)
#= /home/sabae/src/julia/thread_test.jl:29 =# @benchmark(func_threaded(2.0)) = Trial(5.926 μs)
#= /home/sabae/src/julia/thread_test.jl:30 =# @benchmark(func_threaded_ref(2.0)) = Trial(4.131 μs)

sf/threads_fasttrack:

#= /home/sabae/src/julia/thread_test.jl:28 =# @benchmark(func(2.0)) = Trial(93.319 ns)
#= /home/sabae/src/julia/thread_test.jl:29 =# @benchmark(func_threaded(2.0)) = Trial(2.199 μs)
#= /home/sabae/src/julia/thread_test.jl:30 =# @benchmark(func_threaded_ref(2.0)) = Trial(523.874 ns)

I agree that we shouldn't performance hack the tid=1 case. I have found it pretty important to debug issues early on by being able to ask: is this threaded loop slow on a single thread? What happens if I remove it?

I think this is a very "internals perspective" viewpoint; you are using your deep knowledge of the compiler and its idiosyncrasies to debug code that interacts with these hidden parts of the compiler. From a user perspective, though, imposing a 40x slowdown (and that's while dodging the Ref/capturing issues; without dodging those it's an even worse slowdown) is pretty unacceptable.

This performance slowdown is the single reason NNlib had all of its @threads removed: when running single-threaded (which is the default for Julia), our networks run significantly faster without them. It's not worth punishing single-threaded users with a slow NNlib so that multi-threaded users can get a multicore speedup. It might be worth it if the overhead were something like 2x or 3x for these small loops, but 40x is just too much of a slowdown. (In NNlib we of course have loops with more work inside them than this benchmark, but the overhead is still far too large.)

This means that the single-threaded @threads loop should have the same scope rules (in a function) and the same limitations as a normal one. This way you can actually test it with a single thread and be fairly sure that the code could run with multiple threads (short of other true thread-related bugs).

I've updated this PR to more closely match the scope semantics of the other logic branches.

@yuyichao (Contributor)

I think you're asking what happens if I remove the invokelatest; the answer is that there is essentially no difference; it's lost in measurement noise. Nothing like the speedup in this PR.

No, I mean if you remove the invokelatest and use that branch.

@yuyichao (Contributor)

I've updated this PR to more closely match the scope semantics of the other logic branches.

It seems that it still doesn't have the same limitation as the other branches.

All I'm saying is that you should just use the invokelatest branch. I doubt the performance of that will be good enough as-is, but it's the only way you get identical semantics.
Then there's the question of whether the invokelatest is needed, which is what I'm asking @vtjnash about. If it is, then there's no way you can get better performance. If it isn't, then both the Julia code and the C code should switch away from it.

@staticfloat (Sponsor Member, Author) commented Jun 19, 2019

That's a good idea @yuyichao; I didn't quite realize that the nested-threading case was essentially the same as my "single-thread" case. Changing this to just use that branch when nthreads() == 1 gets the same level of performance. Eliminating invokelatest() does have an impact, but not much: it lowers the func_threaded_ref() time from 550 ns to 510 ns, a difference of ~8%.

@yuyichao (Contributor)

In that case this looks good enough as is and the necessity of the invokelatest is just a separate issue.

@vchuravy (Sponsor Member)

I think this is a very "internals perspective" viewpoint; you are using your deep knowledge of the compiler and its idiosyncrasies to debug code that interacts with these hidden parts of the compiler. From a user perspective, though, imposing a 40x slowdown (and that's while dodging the Ref/capturing issues; without dodging those it's an even worse slowdown) is pretty unacceptable.

I think it is exactly the other way around. I, as someone with an internals background and sufficient experience, can guess at why a @threads loop is slow when nthreads > 1.
How is someone without the knowledge supposed to figure out that they have a performance bottleneck? Also, it is not a 40x slowdown, since this is a constant start-up cost.

function func_threaded_ref(val, N)
    sum = Ref(0*(1 .^ val))
    @threads for idx in 1:N
        sum[] += idx.^val
    end
    return sum[]
end

function func_ref(val, N)
    sum = Ref(0*(1 .^ val))
    for idx in 1:N
        sum[] += idx.^val
    end
    return sum[]
end
julia> @btime func_ref(1, 1)
  1.268 ns (0 allocations: 0 bytes)
1

julia> @btime func_threaded_ref(1, 1)
  4.096 μs (10 allocations: 848 bytes)
1

julia> @btime func_threaded_ref(1, 10000)
  20.275 μs (10 allocations: 848 bytes)
50005000

julia> @btime func_ref(1, 10000)
  21.107 μs (0 allocations: 0 bytes)
50005000

Oh no! 4000x slow-down. For me, one of the strong suits of Julia is performance predictability, and I feel that by micro-optimizing this case we make the performance model of a threaded loop (that there is a start-up cost to pay) more opaque and less user-friendly.

@chethega (Contributor) commented Jul 5, 2019

This avoids overhead when threading is disabled. Example benchmark:

FWIW, this is not a good example, since for nthreads>1 it (a) produces wrong results and (b) is slow. Both are for the same reason: All threads try to read and write to the same memory location concurrently. This gives unpredictable (i.e. wrong) results and also makes your poor CPU weep when trying to synchronize caches between cores.

Single thread:

julia> @btime func_ref(1, 1<<20)
  3.876 ms (0 allocations: 0 bytes)
549756338176

julia> @btime func_threaded_ref(1, 1<<20)
  3.887 ms (10 allocations: 848 bytes)
549756338176

Two threads:

julia> @btime func_ref(1, 1<<20)
  3.876 ms (0 allocations: 0 bytes)
549756338176

julia> @btime func_threaded_ref(1, 1<<20)
  7.546 ms (17 allocations: 1.56 KiB)
239044531651

julia> @btime func_threaded_ref(1, 1<<20)
  7.531 ms (17 allocations: 1.56 KiB)
137439215616

Do we have an actually correct (deterministic) example with significant (not O(1)) slowdown due to @threads?

@staticfloat (Sponsor Member, Author)

@chethega Sure, let's push the example farther toward reality. I've updated the benchmarks at the top to (a) have a slightly larger workload (1024 items), (b) actually compute something correctly no matter how many threads are assigned to it, and (c) remove the Ref workaround for #15276, since it's not needed anymore now that I'm storing things within the sums array.

How is someone without the knowledge supposed to figure out that they have a performance bottleneck?

I think when someone is chasing performance, it's okay to expect them to do a little reading. We should not avoid fast paths just because a slow path exists; ideally we would simply have extremely fast work-division code and the 4 μs of constant overhead would not exist. Unfortunately, it does.

Also it is not a 40x slowdown since this is a constant start-up cost. ..... Oh no! 4000x slow-down.

I take your point ;) and I should not have used that kind of comparison when I'm explicitly talking about very small problem sizes. These arise in things like NNlib, where it's equally likely that I'm running a loop over 10M elements as it is that I'm running a loop over 10. In general, I agree that a better approach would be to have a way to have @threads conditionally execute based on problem size, but since this is such an obvious quick performance win (with a 1-line diff that changes no semantics), I don't see why it's controversial.
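
A sketch of the caller-level version of that idea, applied to the benchmark above (the threshold and the helper name are made up for illustration; no such switch exists in @threads itself):

```julia
using Base.Threads

function sum_powers(val, N; threshold = 10_000)
    sums = [zero(val) for _ in 1:nthreads()]   # one accumulator per thread
    if N < threshold || nthreads() == 1
        for idx in 1:N                         # small problem: skip the threading machinery
            sums[threadid()] += idx^val
        end
    else
        @threads for idx in 1:N                # large problem: the startup cost is amortized
            sums[threadid()] += idx^val
        end
    end
    return sum(sums)
end

sum_powers(2.0, 1 << 10)   # 3.584384e8 on either path
```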

@JeffBezanson (Sponsor Member)

Try #32477?

@chethega (Contributor) commented Jul 6, 2019

but since this is such an obvious quick performance win (with a 1-line diff that changes no semantics), I don't see why it's controversial.

Fair enough. Even with #32477, I see no objection to the current variant. I think @yuyichao's comment referred to a previous version that you force-pushed away.

@vchuravy (Sponsor Member) commented Jul 6, 2019

In general, I agree that a better approach would be to have a way to have @threads conditionally execute based on problem size, but since this is such an obvious quick performance win (with a 1-line diff that changes no semantics), I don't see why it's controversial.

I briefly thought that would be a good idea, but you can't make that judgement as part of the macro, since my problem size of 4 might be as work-intensive as your problem of size 10k.

I think when someone is chasing performance, it's okay to expect them to do a little reading. We should not avoid fast paths just because a slow path exists;

When I originally fixed #24688, it took a year and a half from when we first noticed some weirdness going on to when we nailed down the issue. If we had simply short-circuited the semantic behaviour of @threads (i.e. outlining this thunk), we would have simply shrugged and said: "threading performance is bad, we just need more cores". Nowadays the awareness of that issue is much higher, but that is kinda beside the point.

It looks like #32477 brings the overhead down to 1 μs instead of 4 μs? From my perspective, adding @threads should not be a free action, since it drastically changes the semantics of the code and you will have to rewrite the surrounding code to have the right semantics.

I can probably live with having a trigger/switch in the macro that enables the old behaviour. Then at least I can tell people: is your code with @threads force=true and nthreads() == 1 still slow?
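
A sketch of what such a call-site switch could look like (entirely hypothetical; neither this macro nor a force flag exists in Base.Threads):

```julia
using Base.Threads

macro threads_opt(force::Bool, loop)
    if force
        # always go through the full threading machinery, even with one thread,
        # so "is it still slow on a single thread?" remains a meaningful question
        return esc(:(Threads.@threads $loop))
    else
        return esc(loop)    # plain serial loop, for comparison
    end
end

# usage
acc = zeros(Int, nthreads())
@threads_opt true for i in 1:100
    acc[threadid()] += i
end
sum(acc)   # 5050
```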

@staticfloat (Sponsor Member, Author)

Even with #32477, I see no objection to the current variant.

With #32477 this optimization doesn't apply anymore; Jeff has removed the branch. I am content with only 1 μs of overhead; that is below my arbitrary threshold of performance anxiety. :)

@staticfloat closed this on Jul 6, 2019
@jebej (Contributor) commented Jul 23, 2019

Not sure where to put this, but I wanted to try the last benchmark functions, and surprisingly I do not get a speedup in the multi-threaded case (with 4 threads, on an i7-3770K, on Windows 7).

Julia 1.1.1

julia> @benchmark func(2.0, 1<<10)
BenchmarkTools.Trial:
  memory estimate:  112 bytes
  allocs estimate:  1
  --------------
  minimum time:     7.161 μs (0.00% GC)
  median time:      7.234 μs (0.00% GC)
  mean time:        7.448 μs (0.00% GC)
  maximum time:     31.932 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     4

julia> @benchmark func_threaded(2.0, 1<<10)
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  2
  --------------
  minimum time:     7.307 μs (0.00% GC)
  median time:      14.321 μs (0.00% GC)
  mean time:        15.243 μs (0.00% GC)
  maximum time:     7.044 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

I also tried this on the new alpha; allocations are much higher there:

Julia 1.3.0 alpha

julia> @benchmark func(2.0, 1<<10)
BenchmarkTools.Trial:
  memory estimate:  112 bytes
  allocs estimate:  1
  --------------
  minimum time:     7.161 μs (0.00% GC)
  median time:      7.234 μs (0.00% GC)
  mean time:        7.470 μs (0.00% GC)
  maximum time:     19.656 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     4

julia> @benchmark func_threaded(2.0, 1<<10)
BenchmarkTools.Trial:
  memory estimate:  3.56 KiB
  allocs estimate:  30
  --------------
  minimum time:     11.399 μs (0.00% GC)
  median time:      18.413 μs (0.00% GC)
  mean time:        18.701 μs (0.00% GC)
  maximum time:     838.554 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
