rand of AbstractArray #8649

rfourquet · 2014-10-10T20:25:13Z

This undoes revert of #8309, and re-enables rand of long ranges, which was impossible after #8309 (hence the revert).
In this new PR, rand(typemin(Int):typemax(Int)) remains valid, but there are new possibilities, like e.g. rand(typemin(Int):1:typemax(Int)).

It also seems that rand is now slightly faster in some cases (see for example small benchmark in #8563, i.e. @time for i = 1:10000000; rand(1:4); end).

The reason I didn't implement this in #8309 is that I don't think I'm the qualified person to introduce the needed functionalities, which is here provided by cendof and cget: these work like endof and getindex respectively but with 0-based index (like in C, hence the name; I found the name cgetindex too ugly as it has to be used in code, unlike getindex which has a nice sugar syntax).
The idea is that currently a range like r=typemin(Int):typemax(Int) is such that r[endof(r)] raises an OverflowError. So we need a (lower-level) way to request the element at any position of such ranges. With the new functions, cget(r, 0) == typemin(Int) and cget(r, cendof(r)) == typemax(Int) with cendof(r) == typemax(Uint).
I expect this PR to be rejected but hope that in this case someone will implement alternative functions.

cc @ivarne (BTW, I removed the T type paramater, cf. your comment #8309 (comment)).

This reverts commit 2abb59b. This reverts commit 6117328.

ivarne · 2014-10-11T05:32:36Z

base/abstractarray.jl

@@ -374,6 +375,8 @@ imag{T<:Real}(x::AbstractArray{T}) = zero(x)

 getindex(t::AbstractArray, i::Real) = error("indexing not defined for ", typeof(t))

+cget(t::AbstractArray, i::Real) = @inbounds return t[i+1]


You'll not get this use of @inbounds past Jeff. #8227 is still open about being able to propagate @inbounds declarations trough functions in a meaningful way.

Thanks, but then I'm not sure what to do: rename this method to cget_unsafe instead? (in my mind, the "c" in "cget" was already a warning, and this is meant for internal use only). Or remove @inbounds and hope that there is no/little overhead?

This is such a simple definition, I'm not sure what benefit there is to even having it if its strictly for internal use. Similarly, cendof just seems like a pointless private definition mixed in with a bunch of things that are public.

The functions cendof and cget are needed only for their specializations for ranges (where the definition is slightly less trivial). In order for this PR's extended rand functionality to work with ranges like rand(typemin(Int):typemax(Int)), there needs to be a way to index into every position of such a range (which is not particularly specific to rand). Implementing cget/cendof was the simplest solution I found, but I humbly request for help to improve the design of these functions or to find alternatives.

Stefan has a good point. In fact, cget is only used in two places, so it looks like it would be just as easy to specialize those cases for Ranges rather than introduce cget and cendof. At the very least I would move all the code for cget to random.jl, since I certainly hope we won't see increasing use/need of it.

It is better to start in the Random module in random.jl and move to Base in abstractarray.jl if we need it in more places. range.jl is also a possible place, as the problem is really that we want UnitRange to support a length of typemax(Uint) + 1, and other uses of UnitRange might also need to support full ranges.

How about the names length_0 and getindex_0 instead?

I don't think length_0 fits, as whatever the first indice is, the length remains the same, unlike the last indince; I could rename to endof_0 and getindex_0 (I first thougth of length_minus_1 or even length_1 , taking "_" as meaning "minus", but "end of" describes better what is needed).

I can certainly move the code for cget/cendof in random.jl. I didn't do that at first as rand has nothing to do with these implementations (rand needs it only not to break previously working functionality), and these could be used elsewhere also (e.g. sum(1:typemax(Uint)) works but not sum(0:typemax(Uint)) which should give the same answer, but I admit it's not a very convincing example). However I'd rather not have rand have a specialization for ranges.

Conflicts: test/ranges.jl

rfourquet · 2014-10-16T15:55:57Z

Updated PR: cget/cendof moved to random.jl and renamed to getindex_0/endof_0. If this gets to be merged, I will rebase/squash those new commits.
Should the tests for those functions be moved to test/random.jl (instead of range.jl) or be simply deleted?

ViralBShah · 2014-11-06T19:53:51Z

It would be nice to have rand on long ranges as a separate PR. Given all the work done recently on random, I think a fresh PR would be a good idea, and perhaps the endof_0 stuff can be a later PR after the basic stuff is in place.

Closing this one for now.

rfourquet · 2014-11-08T15:44:32Z

@ViralBShah I'm not sure to understand what you mean: the fact that rand on long ranges was not supported anymore is the reason why #8309 was reverted, and the endof_0 stuff is the mecanism by which they were re-enabled here.
So is your suggestion to make a PR adding rand of AbstractArray, keeping the existing implementation for ranges, and later having a PR which makes rand of ranges using the general AbstractArray version (introducing endof_0/getindex_0 to make rand on long ranges still work) ?

ViralBShah · 2014-11-08T15:51:14Z

Actually, I need to fully understand what's going on here, and I did not originally understand the PR. I'd love to see what some others have to say on this.

rfourquet · 2014-11-08T16:41:20Z

My motivation in this PR was that rand(a::Range) is a particular case of rand(a::AbstractArray): pick randomly an element from a. Unfortunately, the easy way to implement this generalization (#8309) makes rand on ranges like in rand(typemin(Int):typemax(Int)) not work anymore, because taking the length of such ranges triggers an error, so endof_0/getindex_0 were introduced to work-around this limitation.
In the process, RandIntGen API was modified/simplified (cf. also #8309 (comment)).

ViralBShah · 2014-11-12T17:08:50Z

@StefanKarpinski What is a good solution to the long range issue here? Introducing special versions of endof and getindex just doesn't feel right.

length(typemin(Int):typemax(Int)) is an overflow error

r=typemin(Int):typemax(Int)
julia> endof(r)
ERROR: OverflowError()
 in endof at abstractarray.jl:24

ivarne · 2014-11-12T21:38:14Z

As long as nobody wants to use rand(typemax(Int):-1:typemin(Int)), or have a rather weird FloatRange, this is only a potential problem for rand(r::UnitRange). We currently have a unsolved length problem for the full range objects, but we can easily write a special version for rand(r::UnitRange).

rfourquet · 2014-11-18T07:16:51Z

@ivarne : do you mean that it would be OK that rand(typemax(Int):-1:typemin(Int)) continues not working? I understand that rand(r::UnitRange) should be handled specially only because rand(typemin(Int):typemax(Int)) already works and mustn't stop working.

ivarne · 2014-11-19T20:37:55Z

As long as length() of all collections is an Int not a UInt, I consider all Ranges longer than typemax(Int) useless (or at least risky to use in generic code). Generic AbstractArray code is likely to throw OverflowError() when working with such long ranges. If we need some AbstractArray method to work with really long ranges, we should write a specific method for the ultralong type so that we can know that it actually works (and is tested!).

We have seen a specific request for rand(typemin(Int):typemax(Int)) to work, so we should abandon the view of UnitRange as a AbstractArray and write a rand(r::UnitRange) method that is careful about overflow issues. We have not yet seen a convincing use case for for StepRange (for step==-1) or a really strange FloatRange, so I don't think they are important

lindahua · 2014-11-20T00:57:15Z

I agree with @ivarne that making rand(typemax(Int):-1:typemin(Int)) work is not that important in real world. This example is primarily contrived to test whether a function works in extreme cases.

It is good if it works. If it is difficult to make it work or that making it work requires an approach that compromises performance, then I would say just leave it as it is.

lindahua · 2014-11-20T01:00:58Z

Most functions that work with AbstractArray rely on the length method, and therefore they do not work with things like typemax(Int):-1:typemin(Int). I seems it hasn't caused a lot of problems in real practice.

I don't see why rand is so special that it needs to work around with the problem of length throwing an OverflowError().

rfourquet · 2014-11-20T09:39:06Z

Thanks for the feedback @ivarne and @lindahua, so I I'll get rid of endof_0/getindex_0 for now.

This is a rewrite of part of #8309 and #8649. The goal is to be able to implement the BigInt case more efficiently.

This is another approach to the (non-merged) refactoring part of #8309 and #8649. The goal is to reduce to the minimum the different code paths taken by UnitRange and AbstractArray (UnitRange are handled differently so that those with overflowing length still work. In particular two rand! method are merged into one.

This is another approach to the (non-merged) refactoring part of #8309 and #8649. The goal is to reduce to the minimum the different code paths taken by UnitRange and AbstractArray (UnitRange ranges are handled differently so that those with overflowing length still work). In particular two rand! method are merged into one.

This is another approach to the (non-merged) refactoring part of #8309 and #8649. The goal is to reduce to the minimum the different code paths taken by UnitRange and AbstractArray (UnitRange ranges are handled differently so that those with overflowing length still work). In particular two rand! method are merged into one. Previously, RangeGenerator objects could create (scalar of array of) random values in a range a:b, taking care of creating first a random value in 0:b-a and then adding a. Choosing a random value in an AbstractArray A was then using A[rand(1:length(A))]. Here, RangeGenerator is changed to only handle the creation of random values in a range 0:k, and determining the right value of k (length(A)-1 or b-a) and picking the right element using the random value v (A[1+v] or a+v) is left to separate (and minimal) methods. Hence Range and AbstractArray are handled as uniformly as possible, given that we still want to support ranges like typemin(Int):typemax(Int) for which length overflows.

rfourquet added 3 commits October 11, 2014 01:52

add cendof, cget, cset! functions

1d929d7

re-add JuliaLang#8309 (which was reverted)

d52ebff

This reverts commit 2abb59b. This reverts commit 6117328.

re-enable "full-range" rand

b49f843

ivarne reviewed Oct 11, 2014
View reviewed changes

jiahao force-pushed the master branch from 6c7c7e3 to 1a4c02f Compare October 11, 2014 22:06

rfourquet added 3 commits October 16, 2014 16:59

Merge branch 'master' into rand_range

b3b2b64

Conflicts: test/ranges.jl

move cget/cendof to random.jl

e6fcfeb

rename endof/cget to endof_0/getindex_0

4147213

jiahao force-pushed the master branch from cdde4df to 7fdc860 Compare October 28, 2014 04:20

ViralBShah closed this Nov 6, 2014

rfourquet mentioned this pull request Nov 18, 2014

enable rand(::AbstractArray, [dims]) (take 3) #9049

Merged

ViralBShah added the randomness Random number generation and the Random stdlib label Nov 22, 2014

rfourquet added a commit that referenced this pull request Nov 22, 2014

refactor RandIntGen API

e9cfc26

This is a rewrite of part of #8309 and #8649. The goal is to be able to implement the BigInt case more efficiently.

rfourquet mentioned this pull request Dec 1, 2014

Implement rand(::Range{BigInt}) #9122

Merged

rfourquet mentioned this pull request Dec 12, 2014

refactor rand on UnitRange / AbstractArray #9324

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rand of AbstractArray #8649

rand of AbstractArray #8649

rfourquet commented Oct 10, 2014

ivarne Oct 11, 2014

rfourquet Oct 11, 2014

StefanKarpinski Oct 11, 2014

rfourquet Oct 13, 2014

JeffBezanson Oct 14, 2014

ivarne Oct 14, 2014

rfourquet Oct 16, 2014

rfourquet commented Oct 16, 2014

ViralBShah commented Nov 6, 2014

rfourquet commented Nov 8, 2014

ViralBShah commented Nov 8, 2014

rfourquet commented Nov 8, 2014

ViralBShah commented Nov 12, 2014

ivarne commented Nov 12, 2014

rfourquet commented Nov 18, 2014

ivarne commented Nov 19, 2014

lindahua commented Nov 20, 2014

lindahua commented Nov 20, 2014

rfourquet commented Nov 20, 2014

		@@ -374,6 +375,8 @@ imag{T<:Real}(x::AbstractArray{T}) = zero(x)

		getindex(t::AbstractArray, i::Real) = error("indexing not defined for ", typeof(t))

		cget(t::AbstractArray, i::Real) = @inbounds return t[i+1]

rand of AbstractArray #8649

rand of AbstractArray #8649

Conversation

rfourquet commented Oct 10, 2014

ivarne Oct 11, 2014

Choose a reason for hiding this comment

rfourquet Oct 11, 2014

Choose a reason for hiding this comment

StefanKarpinski Oct 11, 2014

Choose a reason for hiding this comment

rfourquet Oct 13, 2014

Choose a reason for hiding this comment

JeffBezanson Oct 14, 2014

Choose a reason for hiding this comment

ivarne Oct 14, 2014

Choose a reason for hiding this comment

rfourquet Oct 16, 2014

Choose a reason for hiding this comment

rfourquet commented Oct 16, 2014

ViralBShah commented Nov 6, 2014

rfourquet commented Nov 8, 2014

ViralBShah commented Nov 8, 2014

rfourquet commented Nov 8, 2014

ViralBShah commented Nov 12, 2014

ivarne commented Nov 12, 2014

rfourquet commented Nov 18, 2014

ivarne commented Nov 19, 2014

lindahua commented Nov 20, 2014

lindahua commented Nov 20, 2014

rfourquet commented Nov 20, 2014