-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: fix slowness of randn introduced by 84b6aec6 #8941
Conversation
Do you think that if we simplify the fast code path in Cc: @andreasnoack @vtjnash for the inlining sensitivity. |
d337b38
to
12420d2
Compare
I don't know of any reason that inlining should be making something slower. Although, in general, I don't recommend scattering |
I will try to see if |
BTW, I am comparing the |
I compared master vs. master with 84b6aec reverted (or this PR). On my machine, the scalar (resp. array) version is about 55% (resp. 35%) slower on master, tested with those functions:
With the equivalent functions working with the global RNG, the timings are slightly different but of the same order. |
While preparing another PR, I found another fix for the slowdown: get the random values from the |
Could you elaborate a little bit on what this means? I don't fully understand the downsides of reading from end of array, and why it may be faster. I had only tested the global RNG, where I found timing to be the same, which is the most common case. Good to know that we both have the same results there. Focussing on the rand(m) case now, it would be good to understand why the inlining is slow and the non-inlining version is faster. @vtjnash can perhaps help us here. Also, let's remove as many of the Modifying hardcoded values in test/random.jl is perfectly ok. |
Oh sorry I was unclear: the timings using the following functions are almost the same than with those I posted above using explicitly an RNG, which means that even with the global RNG there is a slowdown of about 30% and 50% (array/scalar versions) fixed by this PR.
WRT the other fix I mentioned, I really don't understand what happens, and it is a very fragile solution too. Also, I found that this fix could decrease slightly performances of other functions. I'will post later a small patch which can be discussed. |
The sus-mentioned patch is against https://github.com/JuliaLang/julia/blob/rf/rand-fast-array/base/random.jl#L29, by changing all
|
WIP: fix slowness of randn introduced by 84b6aec
I still don't see a change pre/post this PR in |
@rfourquet I just reverted this one as I was unable to see better performance after merging it. Let's verify this issue again after the other open PRs have been merged. |
The timings I get are, for
With this PR:
|
With the current
|
Yes my laptop is so slow (Core 2 Duo, 1.20GHz)... But what are your numbers with this PR (12420d2)? |
With 12420d2 I get the following:
|
OK, so this PR does not seem particularly useful in this case. Do you think the larger speed differences I observe could be due to a smaller cache on my computer? |
Can't really tell. I think we can keep the codebase clean for now, and keep an eye out for other regressions. |
All credits to @ViralBShah (cf. #8941 and #9126). This change probably allows better inlining.
All credits to @ViralBShah (cf. #8941 and #9126). This change probably allows better inlining.
Quoting @rfourquet:
@ViralBShah Great! I remember having tested inlined rand() at some point (before #8854 I think, and without other
@inlines
of this commit) but noticing degraded performances somewhere else (I think for the array version rand(n::Int)). What I find now is that this commit slows down randn[!] methods by up to 50%, do you confirm?I just pushed a branch where randmtzig_randn uses a non-inlined version of rand which seem to restore previous perfs of randn, but this starts to make me feel uneasy with all those
@inline
tweaks. Should I open a PR? (In this branch, I also revert/modify part of this commit because I think rand_close_open is dSFMT specific so I prefer to keep rand(r::AbstractRNG) = rand(r, Float64) which I find more self-documenting, do you agree?).