Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rand of AbstractArray #8649

Closed
wants to merge 6 commits into from
Closed

Conversation

rfourquet
Copy link
Member

This undoes revert of #8309, and re-enables rand of long ranges, which was impossible after #8309 (hence the revert).
In this new PR, rand(typemin(Int):typemax(Int)) remains valid, but there are new possibilities, like e.g. rand(typemin(Int):1:typemax(Int)).

It also seems that rand is now slightly faster in some cases (see for example small benchmark in #8563, i.e. @time for i = 1:10000000; rand(1:4); end).

The reason I didn't implement this in #8309 is that I don't think I'm the qualified person to introduce the needed functionalities, which is here provided by cendof and cget: these work like endof and getindex respectively but with 0-based index (like in C, hence the name; I found the name cgetindex too ugly as it has to be used in code, unlike getindex which has a nice sugar syntax).
The idea is that currently a range like r=typemin(Int):typemax(Int) is such that r[endof(r)] raises an OverflowError. So we need a (lower-level) way to request the element at any position of such ranges. With the new functions, cget(r, 0) == typemin(Int) and cget(r, cendof(r)) == typemax(Int) with cendof(r) == typemax(Uint).
I expect this PR to be rejected but hope that in this case someone will implement alternative functions.

cc @ivarne (BTW, I removed the T type paramater, cf. your comment #8309 (comment)).

@@ -374,6 +375,8 @@ imag{T<:Real}(x::AbstractArray{T}) = zero(x)

getindex(t::AbstractArray, i::Real) = error("indexing not defined for ", typeof(t))

cget(t::AbstractArray, i::Real) = @inbounds return t[i+1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll not get this use of @inbounds past Jeff. #8227 is still open about being able to propagate @inbounds declarations trough functions in a meaningful way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but then I'm not sure what to do: rename this method to cget_unsafe instead? (in my mind, the "c" in "cget" was already a warning, and this is meant for internal use only). Or remove @inbounds and hope that there is no/little overhead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a simple definition, I'm not sure what benefit there is to even having it if its strictly for internal use. Similarly, cendof just seems like a pointless private definition mixed in with a bunch of things that are public.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions cendof and cget are needed only for their specializations for ranges (where the definition is slightly less trivial). In order for this PR's extended rand functionality to work with ranges like rand(typemin(Int):typemax(Int)), there needs to be a way to index into every position of such a range (which is not particularly specific to rand). Implementing cget/cendof was the simplest solution I found, but I humbly request for help to improve the design of these functions or to find alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stefan has a good point. In fact, cget is only used in two places, so it looks like it would be just as easy to specialize those cases for Ranges rather than introduce cget and cendof. At the very least I would move all the code for cget to random.jl, since I certainly hope we won't see increasing use/need of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to start in the Random module in random.jl and move to Base in abstractarray.jl if we need it in more places. range.jl is also a possible place, as the problem is really that we want UnitRange to support a length of typemax(Uint) + 1, and other uses of UnitRange might also need to support full ranges.

How about the names length_0 and getindex_0 instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think length_0 fits, as whatever the first indice is, the length remains the same, unlike the last indince; I could rename to endof_0 and getindex_0 (I first thougth of length_minus_1 or even length_1 , taking "_" as meaning "minus", but "end of" describes better what is needed).

I can certainly move the code for cget/cendof in random.jl. I didn't do that at first as rand has nothing to do with these implementations (rand needs it only not to break previously working functionality), and these could be used elsewhere also (e.g. sum(1:typemax(Uint)) works but not sum(0:typemax(Uint)) which should give the same answer, but I admit it's not a very convincing example). However I'd rather not have rand have a specialization for ranges.

@rfourquet
Copy link
Member Author

Updated PR: cget/cendof moved to random.jl and renamed to getindex_0/endof_0. If this gets to be merged, I will rebase/squash those new commits.
Should the tests for those functions be moved to test/random.jl (instead of range.jl) or be simply deleted?

@ViralBShah
Copy link
Member

It would be nice to have rand on long ranges as a separate PR. Given all the work done recently on random, I think a fresh PR would be a good idea, and perhaps the endof_0 stuff can be a later PR after the basic stuff is in place.

Closing this one for now.

@ViralBShah ViralBShah closed this Nov 6, 2014
@rfourquet
Copy link
Member Author

@ViralBShah I'm not sure to understand what you mean: the fact that rand on long ranges was not supported anymore is the reason why #8309 was reverted, and the endof_0 stuff is the mecanism by which they were re-enabled here.
So is your suggestion to make a PR adding rand of AbstractArray, keeping the existing implementation for ranges, and later having a PR which makes rand of ranges using the general AbstractArray version (introducing endof_0/getindex_0 to make rand on long ranges still work) ?

@ViralBShah
Copy link
Member

Actually, I need to fully understand what's going on here, and I did not originally understand the PR. I'd love to see what some others have to say on this.

@rfourquet
Copy link
Member Author

My motivation in this PR was that rand(a::Range) is a particular case of rand(a::AbstractArray): pick randomly an element from a. Unfortunately, the easy way to implement this generalization (#8309) makes rand on ranges like in rand(typemin(Int):typemax(Int)) not work anymore, because taking the length of such ranges triggers an error, so endof_0/getindex_0 were introduced to work-around this limitation.
In the process, RandIntGen API was modified/simplified (cf. also #8309 (comment)).

@ViralBShah
Copy link
Member

@StefanKarpinski What is a good solution to the long range issue here? Introducing special versions of endof and getindex just doesn't feel right.

length(typemin(Int):typemax(Int)) is an overflow error

r=typemin(Int):typemax(Int)
julia> endof(r)
ERROR: OverflowError()
 in endof at abstractarray.jl:24

@ivarne
Copy link
Member

ivarne commented Nov 12, 2014

As long as nobody wants to use rand(typemax(Int):-1:typemin(Int)), or have a rather weird FloatRange, this is only a potential problem for rand(r::UnitRange). We currently have a unsolved length problem for the full range objects, but we can easily write a special version for rand(r::UnitRange).

@rfourquet
Copy link
Member Author

@ivarne : do you mean that it would be OK that rand(typemax(Int):-1:typemin(Int)) continues not working? I understand that rand(r::UnitRange) should be handled specially only because rand(typemin(Int):typemax(Int)) already works and mustn't stop working.

@ivarne
Copy link
Member

ivarne commented Nov 19, 2014

As long as length() of all collections is an Int not a UInt, I consider all Ranges longer than typemax(Int) useless (or at least risky to use in generic code). Generic AbstractArray code is likely to throw OverflowError() when working with such long ranges. If we need some AbstractArray method to work with really long ranges, we should write a specific method for the ultralong type so that we can know that it actually works (and is tested!).

We have seen a specific request for rand(typemin(Int):typemax(Int)) to work, so we should abandon the view of UnitRange as a AbstractArray and write a rand(r::UnitRange) method that is careful about overflow issues. We have not yet seen a convincing use case for for StepRange (for step==-1) or a really strange FloatRange, so I don't think they are important

@lindahua
Copy link
Contributor

I agree with @ivarne that making rand(typemax(Int):-1:typemin(Int)) work is not that important in real world. This example is primarily contrived to test whether a function works in extreme cases.

It is good if it works. If it is difficult to make it work or that making it work requires an approach that compromises performance, then I would say just leave it as it is.

@lindahua
Copy link
Contributor

Most functions that work with AbstractArray rely on the length method, and therefore they do not work with things like typemax(Int):-1:typemin(Int). I seems it hasn't caused a lot of problems in real practice.

I don't see why rand is so special that it needs to work around with the problem of length throwing an OverflowError().

@rfourquet
Copy link
Member Author

Thanks for the feedback @ivarne and @lindahua, so I I'll get rid of endof_0/getindex_0 for now.

@ViralBShah ViralBShah added the randomness Random number generation and the Random stdlib label Nov 22, 2014
rfourquet added a commit that referenced this pull request Nov 22, 2014
This is a rewrite of part of #8309 and #8649.
The goal is to be able to implement the BigInt case more efficiently.
rfourquet added a commit that referenced this pull request Dec 11, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange are handled
differently so that those with overflowing length still work.
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 11, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange are handled
differently so that those with overflowing length still work.
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 11, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange ranges are handled
differently so that those with overflowing length still work).
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 12, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange ranges are handled
differently so that those with overflowing length still work).
In particular two rand! method are merged into one.
rfourquet added a commit that referenced this pull request Dec 24, 2014
This is another approach to the (non-merged) refactoring part
of #8309 and #8649.
The goal is to reduce to the minimum the different code paths
taken by UnitRange and AbstractArray (UnitRange ranges are handled
differently so that those with overflowing length still work).
In particular two rand! method are merged into one.

Previously, RangeGenerator objects could create
(scalar of array of) random values in a range a:b, taking care of
creating first a random value in 0:b-a and then adding a. Choosing a
random value in an AbstractArray A was then using A[rand(1:length(A))].
Here, RangeGenerator is changed to only handle the creation of random
values in a range 0:k, and determining the right value of k
(length(A)-1 or b-a) and picking the right element using the random
value v (A[1+v] or a+v) is left to separate (and minimal) methods.
Hence Range and AbstractArray are handled as uniformly as possible,
given that we still want to support ranges like
typemin(Int):typemax(Int) for which length overflows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants