Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: fix slowness of randn introduced by 84b6aec6 #8941

Merged
merged 1 commit into from
Nov 10, 2014

Conversation

ViralBShah
Copy link
Member

Quoting @rfourquet:

@ViralBShah Great! I remember having tested inlined rand() at some point (before #8854 I think, and without other @inlines of this commit) but noticing degraded performances somewhere else (I think for the array version rand(n::Int)). What I find now is that this commit slows down randn[!] methods by up to 50%, do you confirm?

I just pushed a branch where randmtzig_randn uses a non-inlined version of rand which seem to restore previous perfs of randn, but this starts to make me feel uneasy with all those @inline tweaks. Should I open a PR? (In this branch, I also revert/modify part of this commit because I think rand_close_open is dSFMT specific so I prefer to keep rand(r::AbstractRNG) = rand(r, Float64) which I find more self-documenting, do you agree?).

@ViralBShah
Copy link
Member Author

Do you think that if we simplify the fast code path in randn and @inline that while moving all the slow code paths to a different function, we will get good performance for randn? I was trying that, but didn't quite finish the exploration.

Cc: @andreasnoack @vtjnash for the inlining sensitivity.

@ViralBShah ViralBShah added the performance Must go faster label Nov 8, 2014
@vtjnash
Copy link
Member

vtjnash commented Nov 8, 2014

I don't know of any reason that inlining should be making something slower. Although, in general, I don't recommend scattering @inbounds everywhere (it shouldn't be needed)

@rfourquet
Copy link
Member

I will try to see if randn performances can be improved when I get back to my fast computer within 2 weeks.

@ViralBShah
Copy link
Member Author

BTW, I am comparing the randn performance on master vs. 0.3.2, and I am only seeing at most a 10% difference.

@rfourquet
Copy link
Member

I compared master vs. master with 84b6aec reverted (or this PR). On my machine, the scalar (resp. array) version is about 55% (resp. 35%) slower on master, tested with those functions:

function scalN(n)
    m = MersenneTwister()
    tic()
    for i in 1:n
        randn(m)
    end
    toc()
end

function arrN(n, r=1)
    a = Array(Float64, n)
    m = MersenneTwister()
    tic()
    for i in 1:r
        randn!(m, a)
    end
    toc()
end

With the equivalent functions working with the global RNG, the timings are slightly different but of the same order.

@rfourquet
Copy link
Member

While preparing another PR, I found another fix for the slowdown: get the random values from the MersenneTwister.vals field starting from the end of the array. If this fix is prefered, I can include it in this upcoming PR (note that this implies that many hardcoded values in test/random.jl have to be modified accordingly).

@ViralBShah
Copy link
Member Author

Could you elaborate a little bit on what this means? I don't fully understand the downsides of reading from end of array, and why it may be faster. I had only tested the global RNG, where I found timing to be the same, which is the most common case. Good to know that we both have the same results there.

Focussing on the rand(m) case now, it would be good to understand why the inlining is slow and the non-inlining version is faster. @vtjnash can perhaps help us here. Also, let's remove as many of the @inbounds and @inlines as we can to get it to the minimal, before we make more changes.

Modifying hardcoded values in test/random.jl is perfectly ok.

@ViralBShah ViralBShah changed the title fix slowness of randn introduced by 84b6aec6 WIP: fix slowness of randn introduced by 84b6aec6 Nov 10, 2014
@rfourquet
Copy link
Member

Oh sorry I was unclear: the timings using the following functions are almost the same than with those I posted above using explicitly an RNG, which means that even with the global RNG there is a slowdown of about 30% and 50% (array/scalar versions) fixed by this PR.

function scalNG(n)
    tic()
    for i in 1:n
        randn()
    end
    toc()
end

function arrNG(n, r=1)
    a = Array(Float64, n)
    tic()
    for i in 1:r
        randn!(a)
    end
    toc()
end

WRT the other fix I mentioned, I really don't understand what happens, and it is a very fragile solution too. Also, I found that this fix could decrease slightly performances of other functions. I'will post later a small patch which can be discussed.

@rfourquet
Copy link
Member

The sus-mentioned patch is against https://github.com/JuliaLang/julia/blob/rf/rand-fast-array/base/random.jl#L29, by changing all mt_* functions:

@inline mt_avail(r::MersenneTwister) = r.idx
@inline mt_empty(r::MersenneTwister) = r.idx == 0
@inline mt_setfull!(r::MersenneTwister) = r.idx = length(r.vals)
@inline mt_setempty!(r::MersenneTwister) = r.idx = 0
@inline mt_pop!(r::MersenneTwister) = (@inbounds f = r.vals[r.idx]; r.idx -= 1; f)

ViralBShah added a commit that referenced this pull request Nov 10, 2014
WIP: fix slowness of randn introduced by 84b6aec
@ViralBShah ViralBShah merged commit 4b4655e into master Nov 10, 2014
@ViralBShah
Copy link
Member Author

I still don't see a change pre/post this PR in randn() or randn(m) for either scalar/array case. Could you post what timings you are seeing in all the cases?

@ViralBShah
Copy link
Member Author

@rfourquet I just reverted this one as I was unable to see better performance after merging it. Let's verify this issue again after the other open PRs have been merged.

@rfourquet
Copy link
Member

The timings I get are, for fun(n) = (scalN(n); scalNG(n); arrN(n); arrNG(n) ); fun(10^7):
On 3 days old master (64aa84d):

elapsed time: 1.153464712 seconds
elapsed time: 1.156994426 seconds
elapsed time: 1.293826568 seconds
elapsed time: 1.369713691 seconds

With this PR:

elapsed time: 0.733746595 seconds
elapsed time: 0.738688603 seconds
elapsed time: 0.927415479 seconds
elapsed time: 1.017022933 seconds 

@ViralBShah
Copy link
Member Author

With the current master on my 2.4GHz Core i5 mac, I get numbers that are orders of magnitude faster than yours.

julia> fun(n) = (scalN(n); scalNG(n); arrN(n); arrNG(n) ); fun(10^7)
elapsed time: 0.188870645 seconds
elapsed time: 0.183882291 seconds
elapsed time: 0.218336373 seconds
elapsed time: 0.216428959 seconds

@rfourquet
Copy link
Member

Yes my laptop is so slow (Core 2 Duo, 1.20GHz)... But what are your numbers with this PR (12420d2)?

@ViralBShah
Copy link
Member Author

With 12420d2 I get the following:

julia> fun(n) = (scalN(n); scalNG(n); arrN(n); arrNG(n) ); fun(10^7)
elapsed time: 0.170618011 seconds
elapsed time: 0.172811678 seconds
elapsed time: 0.179000492 seconds
elapsed time: 0.204780084 seconds

@rfourquet
Copy link
Member

OK, so this PR does not seem particularly useful in this case. Do you think the larger speed differences I observe could be due to a smaller cache on my computer?

@ViralBShah
Copy link
Member Author

Can't really tell. I think we can keep the codebase clean for now, and keep an eye out for other regressions.

@ViralBShah ViralBShah added the randomness Random number generation and the Random stdlib label Nov 22, 2014
rfourquet added a commit that referenced this pull request Nov 24, 2014
All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.
rfourquet added a commit that referenced this pull request Nov 24, 2014
All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants