Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

randstring: allow specifying chars to pick from #22222

Merged
merged 13 commits into from
Jul 4, 2017
Merged

Conversation

rfourquet
Copy link
Member

@rfourquet rfourquet commented Jun 5, 2017

This allows the API randstring([rng], [chars], len=8), e.g. randstring('a':'z') will produce a random string of length 8 of lower case letters.

I also added an alias as a method of rand: rand([rng], ::Type{String}, [chars], len=8). I'm not sure both ways should exist, hence requesting for comments. EDIT: this will be for another time (possibly with a different API).

@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Jun 5, 2017
@rfourquet rfourquet changed the title RFC: allow specifying chars to pick from in randstring (and add a rand(::Type{String}) alias?) [RFC] randstring: allow specifying chars to pick from (and add a rand(::Type{String}) alias?) Jun 5, 2017
@rfourquet rfourquet added the strings "Strings!" label Jun 5, 2017
@tkelman
Copy link
Contributor

tkelman commented Jun 5, 2017

If rand(String, args...) is going to do the same thing as randstring(args...), do we need both?

@rfourquet
Copy link
Member Author

If rand(String, args...) is going to do the same thing as randstring(args...), do we need both?

I don't think so, but this is the point of my "RFC". I would tend to think that the rand(String,...) method is enough, as it's easy for those needing the functionality often to create a shorter randstring alias themseves, and it's probably more discoverable as a rand method. On the other hand, this is a different rand API than usual, with an additional len parameter and without the possibility to create an array by passing a dims parameter (I didn't see it be useful, but I may be wrong?), so this may seen as breaking the consistency of the previous API. Now that I think more about it, it may be safer to just keep randstring for now, and think more broadly about the rand-related API: for example I had the idea to try rand([rng], Sparse(type, proba), dims...) as a replacement for sprand(rng, type, dims, proba). Similarly this could be rand([rng], StringType(chars, len), [dims]) instead of randstring (those would be quite consistent with the current rand API). Also I had proposed rand([rng], Bit, dims) instead of bitrand but thist didn't get support.

base/random.jl Outdated
"""
randstring([rng=GLOBAL_RNG], [chars::AbstractArray{<:Union{UInt8,Char}}], [len=8])

Create a random ASCII string of length `len`, consisting of characters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ASCII if chars has non-ascii in it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch, indeed not, I will update the doc.

@rfourquet
Copy link
Member Author

I decided to remove the proposed rand([rng], Type{String}) alias for now. I plan to extend the possible types for chars parameters, in particular to allow passing a string (e.g. randstring("asd")) when #22224 is merged.

@rfourquet rfourquet changed the title [RFC] randstring: allow specifying chars to pick from (and add a rand(::Type{String}) alias?) randstring: allow specifying chars to pick from Jun 8, 2017
base/random.jl Outdated
randstring(r::AbstractRNG) = randstring(r,8)
randstring(n::Int) = randstring(GLOBAL_RNG, n)
randstring() = randstring(GLOBAL_RNG)
randstring(r::AbstractRNG, chars::AbstractArray{<:Union{UInt8,Char}}=b, n::Int=8) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowing UInt8 arrays here is a little odd

(and unrelated to this PR, but anyone else think the default length of 8 is kind of arbitrary? was added without much justification in 399c541)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of Char and UInt8 is because the String constructor accepts arrays of those types, and I thougth that String(v::Vector{UInt8}) was taking ownership of v (according to the docs) so that it would be efficient, but I just did few tests and it seems that this constructor actually copies the data from the vector. Anyway, randstring is still about twice as fast when chars is UInt8 data rather than Char.

(And I also find the default length surprisingly arbitrary, not sure what to do about it though).

Copy link
Contributor

@tkelman tkelman Jun 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docs might need updating in that respect with the StringVector representation change, not sure

@rfourquet
Copy link
Member Author

I updated by not constraining the type of chars: it can be any collection of characters compatible with rand. I could use some help for the docstring (is the !!!note useful, or should it be included in the first part of the docstring?).

base/random.jl Outdated
`UInt8` (more efficient), provided [`rand`](@ref) can randomly
pick characters from it.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no empty line here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I add an empty line when justifying with emacs, and often forget to remove it :-/

test/random.jl Outdated
@test length(randstring(rng..., 20)) == 20
@test issubset(randstring(rng...), b)
for c = ['a':'z', "qwèrtï", Set(Vector{UInt8}("gcat"))],
len = [8, 20]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd indent this one level further or include it on the same line as the rest of the for, it looks like an assignment rather than a loop variable if you miss the final comma on the previous line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I meant indent this line one level further, not the contents of the block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK got it, I didn't know this style

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is how the matlab auto indenter would handle line continuations of block openers - one of the few useful things from matlab I miss

@rfourquet
Copy link
Member Author

Rebased. I will merge in a few days if no objection.

@rfourquet
Copy link
Member Author

CI failure seems unrelated: it concerns libgit2, and a similar problem appears in other unrelated PRs.

@rfourquet rfourquet merged commit c4f2737 into master Jul 4, 2017
@rfourquet rfourquet deleted the rf/rand-String branch July 4, 2017 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
randomness Random number generation and the Random stdlib strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants