Make rand work with AbstractArray instead of only with Range #8309

rfourquet · 2014-09-11T13:27:06Z

This is essentially two changes:

the semi-public API to get a RandIntGen object generating random integers in 1:n is via a call to randintgen(n). This is to be able in particular to extend the mechanism to BigInt for which the existing type RandIntGen doesn't fit well (cf. fix infinite loop in rand(::Range(BigInt)) #8255). In the process, RandIntGen was simplified by making the first element of the range equal to 1: this was only used with this value AFAICS, and this was duplicating indexing logic as noted by @ivarne (who found that this change should not break packages in METADATA.jl). It seems this was first implemented in commit ab97911 by @lindahua: was there a use for the first element field .a that I overlooked?
allow rand(r) to work for any indexable r, not only with ranges (e.g. rand(["head", "tail"]). It is really the continuation of commit 48f27bc, and overlaps a bit with rand vs. sample #6003. This change is so small that I think it is harmless, whatever is done wrt rand vs. sample #6003.

ps: I think the random.jl tests are not very safe as the seed is set the same at each run. Could the content of __init__ function from base/random.jl be made a public function named e.g. srand(), which could be used in the tests?

ivarne · 2014-09-11T13:49:42Z

base/random.jl

@@ -160,24 +160,24 @@ maxmultiple(k::Uint128) = div(typemax(Uint128), k + (k == 0))*k - 1
 maxmultiplemix(k::Uint64) = convert(Uint64, div((k >> 32 != 0)*0x0000000000000000FFFFFFFF00000000 + 0x0000000100000000, k + (k == 0))*k - 1)

 immutable RandIntGen{T<:Integer, U<:Unsigned}
-    a::T   # first element of the range
-    k::U   # range length or zero for full range
+    k::U   # range length (or zero for full range)
    u::U   # rejection threshold


Do we still get any benefit from having both T and U as type parameters?

The random generation logic is implemented only with 3 types (parameter U): Uint32, Uint64, Uint128, and the generation for all types (param. T) is implemented in term of those. An instance r of RandIntGen needs to know what type of number to produce, in a call like rand(r), this is given by T. It's probably possible to have only one type parameter, but I felt that was beyond the scope of this PR.

That explains it. Thanks!

ivarne · 2014-09-11T18:26:27Z

This change looks good to me. Any objections?

nolta · 2014-09-14T02:30:54Z

Why are you removing the a field from RandIntGen? It seems harmless.

rfourquet · 2014-09-14T05:50:53Z

The a field was harmless, but also appears useless to me, at least now: the only purpose of RandIntGen is to provide a random indice in 1:n to index an arbitrary range (and now any array-like collection, with the third commit of this PR). However I asked in the opening comment if there was a use for this field that I didn't catch.

rfourquet · 2014-09-14T05:55:55Z

I updated the title of this PR as it had been set automatically according to the non very descriptive branch name.

ViralBShah · 2014-09-14T06:31:47Z

Cc: @andreasnoack

rfourquet · 2014-09-14T06:57:45Z

Actually, I also wanted to restrict the possibility for the field k to be zeroed to mean "full range". This feature was not used in RandIntGen constructors accepting a range (renamed randintgen here). I will do that in a new commit in this PR, which I could remove independently if not wanted.

ivarne · 2014-09-14T11:00:48Z

The a field increases complexity and likely decreases performance. It is not used in Packages, and all uses in base ~~had a=1~~ has a prettier fix now.

I think that our rand(Type{Int}) familiy of functions removes the need for 0 = "full range" functinality.

rfourquet · 2014-09-14T12:11:18Z

Updated PR. Yes rand{Integer}(T) is the way to get "full range" numbers. So the maxmultiple functions are slightly simplified by not handling k==0. I also rewrote maxmultiplemix in a less obscure way (I can revert that) and adjusted the condition to match that of rand{T}(g::RandIntGen{T,Uint64}) (that was not a bug however).

simonster · 2014-09-14T16:27:53Z

I would like to switch the argument order for rand! so that the output comes first (#8246). Right now we can deprecate the old argument order and not break anything, but if we allow an AbstractArray as the first argument to rand! then switching the order later would break code.

ivarne · 2014-09-14T18:15:45Z

+1 for @simonster's suggestion. That will mean that we can't extend the signature for rand! to AbstractArray until 0.6, because of the deprecation period in 0.4 and a error period in 0.5 in order to ensure that everyone changes the order of arguments before we can give new meaning to the old signature.

simonster · 2014-09-14T18:34:06Z

I don't think it's likely to cause problems if we simultaneously deprecate rand!(::Range, ::AbstractArray) and extend rand! to AbstractArray with the mutated argument first. It's not possible to mutate a Range, so whether you call rand!(::Range, ::AbstractArray) or rand!(::AbstractArray, ::Range) it's clear what you mean.

rfourquet · 2014-09-15T06:00:05Z

@simonster I prefer your solution to do it now! however it's then more a breaking change forcing users to change their code all at once instead of having a depreciation warning.

So should I revert this changed rand! signature (delete, comment out, or maybe put it in a submodule to be able to use it), with a TODO note saying to apply the change after #8246 is resolved? I prefer that instead of having this PR depend on that issue.

simonster · 2014-09-15T17:34:07Z

My point is that you can still have the deprecation warning. There is no valid signature for rand! with which it would conflict.

rfourquet · 2014-09-16T09:51:00Z

OK, I missed that there would both be rand!(r::Range, A::AbstractArray) (mutates A, deprecated) and rand!(A::AbstractArray, r::AbstractArray) (mutates A).

rfourquet · 2014-09-16T11:44:30Z

I removed rand!(A::AbstractArray, r::AbstractArray) (renamed to rand!(A::AbstractArray, r::AbstractArray, ::()) so it can still be used by rand. If this is an accepted temporary solution, I can rebase this branch to avoid introducing this method in the first place.

rfourquet · 2014-09-17T04:44:26Z

Rebased.

ivarne · 2014-09-28T13:19:04Z

base/random.jl

-RandIntGen{T<:Unsigned}(r::UnitRange{T}) = isempty(r) ? error("range must be non-empty") : RandIntGen(first(r), convert(T,last(r) - first(r) + 1))
+# generator API
+# randintgen(k) returns an object generating random integers in the range 1:k
+randintgen{T<:Unsigned}(k::T) = k<1 ? error("range must be non-empty") : RandIntGen(T, k)


The error message should be more descriptive to the current situation where the range is gone.

Should the invariant (k>0) be checked in a inner constructor?

@ivarne: you are right that the message is not ideal, but I didn't know the best way to change it. The invariant in an inner constructor seems like a good idea. From the user POV, the range is gone not because there is no an a field anymore, but because she can use other things (array-like) than ranges. The message could be "the collection passed to rand/rand! must not be empty" ("collection" is used in the docs).

That's why I did not suggest what the message should be. RandIntGen doesn't know there is a collection, so that feels wrong too. How about?

error("No integers in the range [1, $k]. Did you try to pick an element from a empty collection?")

I like this solution.

ivarne · 2014-09-28T13:20:32Z

Bump! Any objections to me merging this in a few days?

ViralBShah · 2014-09-29T16:24:13Z

I think this looks good to merge. Perhaps give a day for @lindahua @andreasnoack to take a look?

rfourquet · 2014-09-30T09:14:19Z

This is not ready for merging.

I rebased to accomodate with recent @JeffBezanson changes, so please review if I didn't mess up with convert/itrunc/reinterpret.

The tests don't pass now: the problem is that length(typemin(T):typemax(T))==0 for some values of T (e.g. T=Int8). I could re-enable special handling of "full-range" to make tests pass again, but I feel this problem must be addressed more generally, hence #8531. I'll wait for input to go ahead.

rfourquet · 2014-09-30T11:50:01Z

Once the problem of length is fixed for "small" integers, my preferred solution would be to simply remove failing tests: for example the range r=typemin(Int):typemax(Int) is not fully indexable (there is no i such that r[i] returns the last element typemax(Int), and I don't think ranges should be handled in a special way in random.jl. Later on it would be possible to add a function length_wide which can always hold the length of the object (and can always index into this object if it's indexable), which would return e.g. an Int128 for Int ranges, and BigInt for Int128 ranges.

@ivarne

This is based on @ivarne idea (cf. JuliaLang#8255 (comment)), and continues commit 48f27bc in "avoiding duplicating the getindex logic". The API to get a RandIntGen object is now to call randintgen(n). This allows any Integer type to implement this function (e.g. BigInt). Previously, a call like rand(big(1:10)) caused a stack overflow, it is now a "no method matching" error, until randintgen(::BigInt) is implemented, possibly using a new type similar to RandIntGen.

mschauer · 2014-10-01T12:22:54Z

Sorry I missed the discussion, but I think

yes rand{Integer}(T) is the way to get "full range" numbers

is not the right perspective. The full range isn't "not really a range", in fact a lot of work was done to make full ranges working as other ranges. Also #5550.

rfourquet · 2014-10-01T13:35:20Z

I don't think I said that a full range is "not really a range", and I want rand(typemin(Int):typemax(Int) to work again (and I plan to do so). In spirit, this PR was about going away from range-centric random generation, and the more general functionality relies now on length. So I think the way to solve the problem is to settle on a new function which can express the length of every AbstractArray, including ranges. I see 3 possibilities at the moment: length_wide mentioned above, a length_minus_1 function which can express the length of every non-empty ranges (so it would probably be of an unsigned type), or an index function which would return a UnitRange (at least for "dense indexables") giving the first and last valid index of a collection (this would allow to not exclude indexables with indexes starting at 0 or at a negative integer). The problem of "full range" must be solved generally, not only for rand.

mschauer · 2014-10-01T14:13:39Z

I was taking about the sentiment "full ranges are not really a range", you are quoted with ">". Anyway, thanks for planning to fix this, it took me some work. The old version - you saw it - used a representation corresponding to a function length_nonempty(r) = isfull(r) ? length(r) : 0 which works together with the modulo_knuth function and was quite fast, without escaping to Int128 or even BigInt.

JeffBezanson · 2014-10-03T00:34:28Z

So if I understand, this commit makes some calls to rand on ranges that used to work, no longer work? What exactly is the benefit of that?

I'm all for generality, but removing functionality in the mean time doesn't help. This also caused a performance problem (#8563). It really looks like this should be reverted.

@rfourquet

cc @rfourquet This reverts commit 7e87797. This reverts commit c754a60. This reverts commit 38bef62. This reverts commit d9814ff. This reverts commit 91aa94f. This reverts commit 57bd2b7. This reverts commit 9d0282e. This reverts commit 6d329ce. This reverts commit e2b14a1.

ivarne · 2014-10-03T07:15:45Z

OK, I reverted this in 6117328
@rfourquet I'm really sorry how this turned out, but I agree with Jeff that it is bad to break functionality temporary. To get your work back from this revertion, just do git revert 6117328, and submitt a new PR when you are ready.

JeffBezanson · 2014-10-03T14:23:09Z

The other aspects of this change are fine, if they can be done in a way
that keeps all functionality. Nothing wrong with having special methods for
ranges; we are all about stuff like that.

rfourquet · 2014-10-05T11:31:06Z

@ivarne no I'm sorry, it's all on me!
In this PR, some calls to rand on ranges no longer worked indeed, but only in those cases where length(range) didn't work (which calls for a solution in a broader context than random generation). The advantage was new functionalities, which I thougth to be worth enough to loose rand(typemin(Int):typemax(Int)).

mschauer · 2014-10-05T12:37:59Z

Am I missing something or can't we have just both?

rfourquet · 2014-10-05T12:45:15Z

@mschauer yes of course! I'm now working on a proposition, which could also handle more ranges, like rand(typemin(Int):1:typemax(Int)) etc.

StefanKarpinski · 2014-10-05T17:25:17Z

Unfortunately, we can't just weigh functionality gained against functionality lost – they are asymmetrical since people may be depending on existing functionality. Thus we can't break things that currently work without serious consideration.

This reverts commit 2abb59b. This reverts commit 6117328.

This is a rewrite of 6d329ce, 9d0282e and d9814ff (from #8309, which was reverted).

This is a rewrite of part of #8309 and #8649. The goal is to be able to implement the BigInt case more efficiently.

This is a rewrite of 6d329ce, 9d0282e and d9814ff (from JuliaLang#8309, which was reverted).

This is another approach to the (non-merged) refactoring part of #8309 and #8649. The goal is to reduce to the minimum the different code paths taken by UnitRange and AbstractArray (UnitRange are handled differently so that those with overflowing length still work. In particular two rand! method are merged into one.

This is another approach to the (non-merged) refactoring part of #8309 and #8649. The goal is to reduce to the minimum the different code paths taken by UnitRange and AbstractArray (UnitRange ranges are handled differently so that those with overflowing length still work). In particular two rand! method are merged into one.

This is another approach to the (non-merged) refactoring part of #8309 and #8649. The goal is to reduce to the minimum the different code paths taken by UnitRange and AbstractArray (UnitRange ranges are handled differently so that those with overflowing length still work). In particular two rand! method are merged into one. Previously, RangeGenerator objects could create (scalar of array of) random values in a range a:b, taking care of creating first a random value in 0:b-a and then adding a. Choosing a random value in an AbstractArray A was then using A[rand(1:length(A))]. Here, RangeGenerator is changed to only handle the creation of random values in a range 0:k, and determining the right value of k (length(A)-1 or b-a) and picking the right element using the random value v (A[1+v] or a+v) is left to separate (and minimal) methods. Hence Range and AbstractArray are handled as uniformly as possible, given that we still want to support ranges like typemin(Int):typemax(Int) for which length overflows.

ivarne reviewed Sep 11, 2014
View reviewed changes

ivarne mentioned this pull request Sep 11, 2014

zero argument srand() #8313

Closed

rfourquet changed the title ~~Rand range~~ Make rand work with AbstractArray instead of only with Range Sep 14, 2014

rfourquet mentioned this pull request Sep 16, 2014

more methods working with a non-global RNG #8380

Closed

rfourquet force-pushed the rand_range branch from d7d89cb to e859850 Compare September 16, 2014 11:40

rfourquet force-pushed the rand_range branch 2 times, most recently from 78a1f14 to 91b58f7 Compare September 17, 2014 04:41

ivarne reviewed Sep 28, 2014
View reviewed changes

rfourquet force-pushed the rand_range branch from 91b58f7 to 1adbe3e Compare September 30, 2014 08:26

kmsquire mentioned this pull request Oct 3, 2014

Remove costly assert from RandIntGen #8563

Merged

rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014

re-add JuliaLang#8309 (which was reverted)

94d9008

This reverts commit 2abb59b. This reverts commit 6117328.

rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014

re-add JuliaLang#8309 (which was reverted)

2c7a91e

This reverts commit 2abb59b. This reverts commit 6117328.

rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014

re-add JuliaLang#8309 (which was reverted)

2894c31

This reverts commit 2abb59b. This reverts commit 6117328.

rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014

re-add JuliaLang#8309 (which was reverted)

1c78445

This reverts commit 2abb59b. This reverts commit 6117328.

rfourquet added a commit to rfourquet/julia that referenced this pull request Oct 10, 2014

re-add JuliaLang#8309 (which was reverted)

d52ebff

This reverts commit 2abb59b. This reverts commit 6117328.

rfourquet mentioned this pull request Oct 10, 2014

rand of AbstractArray #8649

Closed

rfourquet added a commit that referenced this pull request Nov 18, 2014

enable rand(::AbstractArray, [dims])

66aa1a9

This is a rewrite of 6d329ce, 9d0282e and d9814ff (from #8309, which was reverted).

rfourquet mentioned this pull request Nov 18, 2014

enable rand(::AbstractArray, [dims]) (take 3) #9049

Merged

ViralBShah added system:32-bit Affects only 32-bit systems randomness Random number generation and the Random stdlib labels Nov 22, 2014

rfourquet added a commit that referenced this pull request Nov 22, 2014

refactor RandIntGen API

e9cfc26

This is a rewrite of part of #8309 and #8649. The goal is to be able to implement the BigInt case more efficiently.

waTeim pushed a commit to waTeim/julia that referenced this pull request Nov 23, 2014

enable rand(::AbstractArray, [dims])

5862909

This is a rewrite of 6d329ce, 9d0282e and d9814ff (from JuliaLang#8309, which was reverted).

rfourquet mentioned this pull request Dec 12, 2014

refactor rand on UnitRange / AbstractArray #9324

Closed

Make rand work with AbstractArray instead of only with Range #8309

Make rand work with AbstractArray instead of only with Range #8309

Conversation

rfourquet commented Sep 11, 2014

ivarne Sep 11, 2014

Choose a reason for hiding this comment

rfourquet Sep 11, 2014

Choose a reason for hiding this comment

ivarne Sep 11, 2014

Choose a reason for hiding this comment

ivarne commented Sep 11, 2014

nolta commented Sep 14, 2014

rfourquet commented Sep 14, 2014

rfourquet commented Sep 14, 2014

ViralBShah commented Sep 14, 2014

rfourquet commented Sep 14, 2014

ivarne commented Sep 14, 2014

rfourquet commented Sep 14, 2014

simonster commented Sep 14, 2014

ivarne commented Sep 14, 2014

simonster commented Sep 14, 2014

rfourquet commented Sep 15, 2014

simonster commented Sep 15, 2014

rfourquet commented Sep 16, 2014

rfourquet commented Sep 16, 2014

rfourquet commented Sep 17, 2014

ivarne Sep 28, 2014

Choose a reason for hiding this comment

rfourquet Sep 29, 2014

Choose a reason for hiding this comment

ivarne Sep 29, 2014

Choose a reason for hiding this comment

rfourquet Sep 29, 2014

Choose a reason for hiding this comment

ivarne commented Sep 28, 2014

ViralBShah commented Sep 29, 2014

rfourquet commented Sep 30, 2014

rfourquet commented Sep 30, 2014

mschauer commented Oct 1, 2014

rfourquet commented Oct 1, 2014

mschauer commented Oct 1, 2014

JeffBezanson commented Oct 3, 2014

ivarne commented Oct 3, 2014

JeffBezanson commented Oct 3, 2014

rfourquet commented Oct 5, 2014

mschauer commented Oct 5, 2014

rfourquet commented Oct 5, 2014

StefanKarpinski commented Oct 5, 2014