wrap range variable in a let block for at_threads #24688
Working around #15276
The newly introduced "let" makes my threaded code two times slower between 0.6.2 and 0.6.3, even when running with JULIA_NUM_THREADS=1. I have trouble debugging what happens here: code_warntype shows me nothing weird, and profiling my code in 0.6.2 versus 0.6.3 doesn't show a change in hotspots. How can I find out what's happening? Just for my sanity I reverted this change in 0.6.3, and then the performance is as it was. It may also be useful to note that if I remove at_threads from my loop altogether, I get the performance in 0.6.3 that I would expect from a single-threaded loop (which is the same as in 0.6.2 with at_threads and JULIA_NUM_THREADS=1).
Can you post a MWE? The problem is that code_warntype doesn't see the
closure used for the thread function.
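For anyone following along, one way to look at a closure directly is to call code_warntype on the closure object itself rather than on the enclosing function. The sketch below assumes Julia 1.x (where STDOUT became stdout and code_warntype lives in InteractiveUtils). make_closure and its body are hypothetical stand-ins, not the actual threadsfor_fun from Base.

```julia
using InteractiveUtils  # provides code_warntype on Julia 1.x (assumption: not 0.6)

# Hypothetical stand-in for a function that builds a closure internally.
function make_closure(n)
    r = 1:n
    if n < 0
        r = 1:0          # second assignment to a variable the closure captures
    end
    return () -> begin
        s = 0
        for i in r
            s += i
        end
        s
    end
end

f = make_closure(10)

# Inspect the closure body itself, writing the report into a buffer
# so it can be printed or searched afterwards.
buf = IOBuffer()
code_warntype(buf, f, Tuple{})
report = String(take!(buf))
println(report)
```

Running code_warntype on make_closure alone would only describe the code that constructs the closure. The loop body lives in the closure's own method, which is why it has to be inspected separately.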
I'm struggling to minimize it, but I attached something that runs by itself: standalone.zip. Timings on my laptop: For 0.6.3: For 0.6.2: I used at_code_warntype on threadsfor_fun and indeed it looks much cleaner with the let-block in 0.6.3. I just don't see what ruins the performance (in my experience this usually means something is ruining the cache). Any pointers are appreciated.
Do you have the same performance drop if you just remove
Without Without
OK, I think I found it. It looks like without the let in Julia 0.6.2, because of all the Any types in threadsfor_fun, my state variable in the threaded loop was passed as a pointer (which is good).
But due to the let and the type propagation, it looks like a copy of state is made somewhere, which hurts the cache friendliness of my code. I fixed it by wrapping state into a Ref.
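The Ref workaround described above might look roughly like the following. This is a minimal sketch assuming Julia 1.x. State, scale_all! and the weights field are hypothetical names made up for illustration and are not taken from the attached standalone.zip. It only shows the pattern of reading an immutable struct through a Ref inside a threaded loop, and makes no claim about reproducing the slowdown.

```julia
using Base.Threads

# Hypothetical immutable state type, large enough that copying it is not free.
struct State
    weights::NTuple{8,Float64}
end

function scale_all!(out, xs, state::State)
    # Wrap the immutable value in a Ref so the loop body refers to one
    # heap location instead of carrying the struct around by value.
    sref = Ref(state)
    @threads for i in eachindex(xs)
        w = sref[].weights              # dereference inside the loop
        out[i] = xs[i] * w[(i - 1) % 8 + 1]
    end
    return out
end

xs = [1.0, 2.0, 3.0, 4.0]
out = zeros(4)
state = State(ntuple(k -> Float64(k), 8))
scale_all!(out, xs, state)
```

Each iteration writes to a distinct index of out, so the loop needs no locking.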
This works around performance issues I have been seeing due to #15276.
In particular, this function showed no performance scaling when I recently introduced threading to students:
These numbers are for single threaded performance and where my_sum is simplethreaded_sum without @threads.
Adding code_warntype(STDOUT, threadsfor_fun, (Bool,)) at julia/base/threadingconstructs.jl, line 65 in 52d81b0, shows that the example function, which is type-stable without @threads, becomes type-unstable: https://gist.github.com/vchuravy/e60e0e63b79de59f92c95a5864c66f37.
The change in this PR introduces a let block for the range variable, eliminating almost all type-instability (https://gist.github.com/vchuravy/0a4203674d02c92afee9b10b58756bbe), and drastically improves performance.
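To illustrate the general idea behind the let block, here is a minimal sketch assuming Julia 1.x. sum_captured and sum_let are hypothetical names, and this is not the code in base/threadingconstructs.jl. When a variable is assigned more than once and also captured by a closure, the closure sees it through a box and its concrete type is lost (#15276). Rebinding it with let gives the closure a fresh variable that is assigned exactly once.

```julia
# Hypothetical helpers; a sketch of the pattern, not the macro expansion in Base.

# `r` is assigned twice and captured, so the closure accesses it through a
# box and cannot infer its concrete type.
function sum_captured(xs, skip_first::Bool)
    r = 1:length(xs)
    if skip_first
        r = 2:length(xs)
    end
    body = () -> begin
        s = zero(eltype(xs))
        for i in r
            s += xs[i]
        end
        s
    end
    return body()
end

# Same logic, but the closure captures a `let`-bound copy of `r` that is
# assigned exactly once, so its type stays concrete inside the closure.
function sum_let(xs, skip_first::Bool)
    r = 1:length(xs)
    if skip_first
        r = 2:length(xs)
    end
    body = let r = r
        () -> begin
            s = zero(eltype(xs))
            for i in r
                s += xs[i]
            end
            s
        end
    end
    return body()
end

println(sum_captured([1, 2, 3, 4], false))
println(sum_let([1, 2, 3, 4], true))
```

Both functions return the same values. The difference is only in what the closure captures, which can be compared by running code_warntype on each closure.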