-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rand(::String) #22224
add rand(::String) #22224
Conversation
test/random.jl
Outdated
(rand(Int, 100), Int)] | ||
(rand(Int, 100), Int), | ||
("qwèrtï", Char), | ||
(Base.Test.GenericString("qwèrtï"), Char)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would throw in a test for empty strings since that seems like a corner case that could easily break. (I checked this out specifically to test that – it works.)
base/random.jl
Outdated
|
||
rand_iter(rng, s, ::Base.IteratorSize) = throw(ArgumentError("iterator must have a known length")) | ||
|
||
function rand_iter(rng::AbstractRNG, s, ::Union{Base.HasShape,Base.HasLength})::eltype(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why is the return type annotation necessary here? c
is the value that will be returned and isn't that already guaranteed to have type eltype(s)
(or at least <:eltype(s)
if eltype(s)
is abstract) given that it's an element of s
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inference probably can't determine that i == pos
is guaranteed to be true for at least one value of the enumerate loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes Tony is spot on, and also initially I thought to propose this PR working generally on iterators, and the return type annotation could serve as an additional way to check whether the argument was complying to the iterator interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, nice. Thanks for the explanation, both of you.
base/random.jl
Outdated
for (i, c) in enumerate(s) | ||
i == pos && return c | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little skeptical of the rand_iter
function and AbstractString
support. People using rand
with other string types will probably not expect this function to be O(n); it seems preferable to throw a MethodError rather than to use this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, I had the same hesitation and had planned to update the doc today to warn about the complexity. Then I think that with a proper warning those convenience methods (similarly for AbstractSet
and Associative
at #22228) are not harmful and can be handy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also do rejection sampling: randomly choose and index from start(s):endof(s)
and retry if it's not a valid start of a character – i.e. the same thing you do for String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh great! for some reason, I skimmed too fast on the generic code of isvalid
and thought it was O(n).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was confused not about isvalid
, but about the fact that I thought I needed to compute the length
somehow, which is O(n) in the generic method. Thanks again @StefanKarpinski, I updated the code.
base/random.jl
Outdated
* a type: the set of values to pick from is then equivalent to `typemin(S):typemax(S)` for | ||
integers (this is not applicable to [`BigInt`](@ref)), and to ``[0, 1)`` for floating | ||
point numbers; | ||
|
||
`S` defaults to [`Float64`](@ref). | ||
|
||
Note that the complexity of `rand(rng, s::AbstractString)` is linear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really necessary, but this could be a Documenter note
block, like
!!! note
The complexity of `rand(rng, s::AbstractString)` is linear in the
length of `s` if `s` is not of type `String`. If called more than
a few times, use `rand(rng, collect(s))` instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe rand(rng, collect(s)) or rand(rng, String(s))
, just to emphasize that String
is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If called more than a few times" is a bit awkward because of the ambiguous subject of "called". Maybe "For generating many random characters, use ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but your new formulation seems to include also the case were I call rand(rng, s, dims)
which already does a collect... WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "For multiple calls, consider using ..."
base/random.jl
Outdated
return s[rand(rng, g)] | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StefanKarpinski would you mind having a look at this updated generic implementation, to confirm it complies with AbstractString
API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to do i = rand(rng, g); isvalid(s, i) && return i
since various string types are expected to implement that without the try/catch. The try/catch only exists as a last resort fall-back, and I believe it is more targeted and only catches UnicodeError
exceptions (if it's not already, it should be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks (I believe the idea being that try/catch an exception is expected to occur, and slow down the computation when the isvalid
call would generally be faster). I will update. Note that the generic isvalid
catches all exceptions (not only UnicodeError
). I'm not familiar with strings, but I can open a corresponding simple PR if it helps.
e2cefe4
to
3b97b61
Compare
I rebased (wile changing only the docstring, including "a string (considered as a collection of characters)"), and removed the try/catch block in favor of |
base/random.jl
Outdated
* an `Associative` or `AbstractSet` object, or | ||
* an indexable collection (for example `1:n` or `['x','y','z']`), | ||
* an `Associative` or `AbstractSet` object, | ||
* a string (considered as a collection of characters), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need "or" at the end of this line (or at the beginning of the next line)
Does it need a NEWS.md entry? |
Thanks all for your help! |
This should work more generally for any iterable (with
length
) in principle, but I'm hesitant to introduce a method ofrand
with unconstrained parameters:rand([rng], iter)
, which would mean that this would be the fallback method anytimerand
is called on a type for which it's not defined (so maybe this should wait till some traits for iterators are available).