Random: make RandomDevice() instances share the same /dev/urandom file #27936
bump
Bump
@rfourquet: I'm not sure what you need here... review? Who would be good to review?
I need feedback on the idea! I think it's a good one, but I wouldn't dare merge this kind of change without approval. A code review would be nice too, as I touch internals that I'm not so comfortable with.
It seems like a fine idea to me. We can always hold off on exporting it for a while in case we change our minds.
I have a couple of thoughts. I didn't like the idea of having a default open file descriptor globally, but it probably doesn't matter much in practice. Would this be frowned upon in production code for any reason? The other thing, looking at this code, is that it seems to be Unix-only, so code using it won't work on Windows. If we are going to introduce a new global, it would be much nicer if it at least worked the same on all supported operating systems. One possibility: we could provide a helper function that makes this convenient and is a no-op on Windows?
I will give context on what prompted me to implement this global The obvious solution here would be So I think that
I can't say, and don't know whom to ask!
No, in all cases
Could we just cache the random device handle once it’s been opened the first time?
We could, but I'm not very clear on what problem this solves... What are the drawbacks of unconditionally opening the pseudo-file "/dev/urandom" in each Julia session? EDIT: sorry for such a late reply!
Does anyone know if it's thread-safe to have one global object with one handle to the "/dev/urandom" file? I will add the triage label here. Also, another idea: instead of exporting yet another
How does that affect Julia's startup time? If it's negligible, then this is fine; if it's not, we may want to create a special object that opens this file lazily.
I think it's negligible, but I will definitely check and do it lazily if needed. But actually, we anyway create a
At the moment, I believe that all I/O is locked, so it should be, but @JeffBezanson or @vtjnash should be able to say more.
I've updated as follows: the file handle is now a global variable, which is assigned lazily. So It remains to be clarified whether it is thread-safe to read "/dev/urandom" from multiple threads (through the same file handle). Otherwise, I will just put the files in thread-local storage, and if required (for performance), put back a file handle in the
This should be thread-safe on Unix (from 1.3, we use thread-safe libuv functions for I/O). On prior versions, a segfault seems OK (just like No idea about Windows: we use an ancient, deprecated, cryptographically questionable API.
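To make the access pattern concrete, here is a minimal sketch in which several tasks read from one shared handle to /dev/urandom. It assumes Julia 1.3 or later on a Unix system and relies on the statement above that `IOStream` reads are thread-safe there; running it does not prove thread safety. `SHARED_URANDOM` and `concurrent_reads` are hypothetical names, not part of Random.

```julia
# Hypothetical shared handle; Random's actual internals may differ.
const SHARED_URANDOM = open("/dev/urandom")

# Spawn `n` tasks (which may run on different threads) that all read 8 bytes
# from the same handle, and collect the results in spawn order.
function concurrent_reads(n::Integer)
    tasks = [Threads.@spawn read(SHARED_URANDOM, UInt64) for _ in 1:n]
    return fetch.(tasks)
end
```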
Ah right of course.
No idea either! But at least it doesn't share the problem of the Unix version of opening a file descriptor for each creation of an instance! It's definitely an issue if the API is deprecated, although it's probably more relevant to discuss this in #32954.
Maybe it would be good to open an issue about this and ideally fix it and use a new, good API 😁
I believe this is in a good state to be merged. It would be good if someone (maybe @vtjnash?) could review; the change is very simple here. As a TL;DR: having each
I find both options to be overkill, and I would expect the files to be opened anyway in most programs. Opening the files seems to take sub-millisecond time (about 30 microseconds on my machine), so it's not likely to be a problem for startup time.
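A rough way to reproduce this kind of measurement is sketched below. `average_open_close_us` is a hypothetical helper; it times open plus close (not open alone), the result depends heavily on the machine and OS, and it is not the benchmark behind the figure quoted above.

```julia
# Average cost, in microseconds, of opening and closing `path` `n` times.
function average_open_close_us(path::AbstractString = "/dev/urandom"; n::Integer = 1_000)
    close(open(path))  # warm-up call so compilation time is not included
    t = @elapsed for _ in 1:n
        close(open(path))
    end
    return 1e6 * t / n
end

println("open + close: ", round(average_open_close_us(), digits = 1), " µs on average")
```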
I wonder how long until we can switch this to use the new "modern" interface of
stdlib/Random/src/RNGs.jl
    unlimited = deserialize(s)
    return RandomDevice(unlimited=unlimited)
end
getfile(rd::RandomDevice) = @inbounds DEV_RANDOM[1 + rd.unlimited]
This should not be `@inbounds`, as it is not essential. And while equivalent, I think using `?:` will read more clearly.
I'd also suggest going with the lazy approach you mentioned.
getfile(rd::RandomDevice) = @inbounds DEV_RANDOM[1 + rd.unlimited]
function getfile(rd::RandomDevice)
    dev = rd.unlimited ? 2 : 1
    isassigned(DEV_RANDOM, dev) || (DEV_RANDOM[dev] = open(rd.unlimited ? "/dev/urandom" : "/dev/random"))
    return DEV_RANDOM[dev]
end
function __init__()
...
Sys.iswindows() || resize!(empty!(DEV_RANDOM), 2)
nothing
end
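Put together, the lazy approach from this review can be sketched as a standalone module. `LazyDevFiles` and `DEV_FILES` are hypothetical stand-ins for Random's internal `DEV_RANDOM`; slot 1 is /dev/random and slot 2 is /dev/urandom, matching `1 + rd.unlimited`. `__init__` runs each time the module is loaded, so any handle captured during precompilation (which presumably would not be valid in a new process) is discarded; the files themselves are only opened on first use.

```julia
module LazyDevFiles  # hypothetical module mirroring the suggestion above

# Slot 1 holds /dev/random, slot 2 holds /dev/urandom; entries stay #undef until first use.
const DEV_FILES = Vector{IOStream}()

function __init__()
    # Runs at load time: drop any stale entries and create two unassigned slots.
    Sys.iswindows() || resize!(empty!(DEV_FILES), 2)
    nothing
end

function getfile(unlimited::Bool)
    dev = unlimited ? 2 : 1  # same slot as `1 + unlimited`, but more explicit
    if !isassigned(DEV_FILES, dev)
        # Unsynchronized check-then-open: threads racing here may each open the file.
        DEV_FILES[dev] = open(unlimited ? "/dev/urandom" : "/dev/random")
    end
    return DEV_FILES[dev]
end

end # module
```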
I'd also suggest going with the lazy approach you mentioned.
Your suggestion is basically what I had in the second commit, but @chethega mentioned that this looks thread unsafe:
There is a data race on initialization of the RNG, but that's benign: worst case, we open nthreads many file descriptors to /dev/urandom.
#27936 (comment)
I was quite convinced. That's why I then gave up on laziness: I found it overkill to have one file per thread (third commit), and annoying to use locks explicitly.
But I'm not well versed in data races: what is the worst case here? Opening a file twice? Is `DEV_RANDOM[dev]` in your example always pointing to a valid file, even if the location has been written concurrently?
(and thanks for your review!)
Yeah, you'd potentially leak up to `nthreads` copies of the file descriptors instead of 2. We really need to add a `Threads.once` function soon, since it's rather absurd to be making design decisions based on the lack of a simple little helper function.
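No `Threads.once` existed when this was written. The sketch below shows one way such a helper could look, using a `ReentrantLock` so that the initializer runs at most once even with concurrent callers. `Once` is a hypothetical type, not a Base API; it takes the lock on every access, which keeps it simple and correct but is slower than a lock-free fast path would be.

```julia
# Hypothetical `Once` helper (not a Base API): computes `f()` at most once and
# then returns the cached value, even when several tasks/threads call it at once.
mutable struct Once{T}
    lock::ReentrantLock
    value::Union{Nothing,T}
    f::Function
end
Once{T}(f) where {T} = Once{T}(ReentrantLock(), nothing, f)

function Base.getindex(o::Once{T}) where {T}
    lock(o.lock) do
        if o.value === nothing
            o.value = o.f()::T  # only the first caller to acquire the lock runs `f`
        end
        return o.value::T
    end
end

# Usage: one lazily opened, shared handle to /dev/urandom.
const URANDOM = Once{IOStream}(() -> open("/dev/urandom"))
```

With such a helper, the lazy `getfile` would need neither an explicit lock at the call site nor a tolerated race.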
Oh, I had misunderstood @chethega's comment, which I thought was suggesting that we "open nthreads many file descriptors" ourselves, i.e. in an array like the one used for `default_rng()`, to avoid the data race! Sorry for the noise.
So I updated accordingly, with a TODO to add a "thread-once" helper when available.
What's the status of this? |
This is good to go on my side: I reverted to a previous version of this PR, which basically matches what @vtjnash suggested, except that it uses two global refs instead of an array of length 2 ("/dev/random" is likely to be opened only very rarely, so if we are lazy, let's be lazy all the way and open only the necessary file(s)). A final quick review might be appropriate. I didn't give a second thought to the name
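The "two global refs" design can be sketched as follows. `DEV_RANDOM_REF`, `DEV_URANDOM_REF`, and this `getfile(::Bool)` method are hypothetical names chosen for illustration and need not match the merged code. Using `nothing` as the "not yet opened" marker keeps each ref independent, so a program that only ever uses /dev/urandom never opens /dev/random. As in the reviewed code, the check-then-set is unsynchronized, so the worst case discussed above (extra, leaked handles) still applies.

```julia
# Hypothetical stand-ins for the two global refs; `nothing` means "not opened yet".
const DEV_RANDOM_REF = Ref{Union{Nothing,IOStream}}(nothing)
const DEV_URANDOM_REF = Ref{Union{Nothing,IOStream}}(nothing)

function getfile(unlimited::Bool)
    ref, path = unlimited ? (DEV_URANDOM_REF, "/dev/urandom") : (DEV_RANDOM_REF, "/dev/random")
    # Lazy, per file: /dev/random is only opened if someone actually asks for it.
    # Unsynchronized check-then-set: a race can at worst open (and leak) extra handles.
    if ref[] === nothing
        ref[] = open(path)
    end
    return ref[]::IOStream
end
```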
Side note to whoever will merge: squash the commits, but use the PR title as the commit message, as none of the intermediate commit messages is good.
This is a solution to the problem that too many `RandomDevice()` instances couldn't coexist, because each one would open a separate file. EDIT: the initial OP below was about "add a global RANDOM_DEVICE generator Random", which was the initial solution to the same problem.
Creating a `RandomDevice` object is cheap, but it opens a file on Unix systems, so one can't call `rand(RandomDevice())` repeatedly too quickly (e.g. in a loop). It can therefore be convenient to have a readily available generator.
It was surprisingly difficult to implement this global `RandomDevice` object (of course, it would be easier to use a `Ref{RandomDevice}`), so users shouldn't be expected to do so themselves, which I think justifies putting it in `Random`. The current implementation is ugly (and possibly wrong), so suggestions for improvement are welcome.
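For comparison, the `Ref{RandomDevice}` route mentioned above can be sketched as a lazily filled global. `GLOBAL_RANDOM_DEVICE` and `global_random_device` are hypothetical names (the PR did not end up exporting such a global), and the sketch assumes `RandomDevice()` can be called without arguments. Like the other lazy sketches in this thread, its initialization is unsynchronized.

```julia
using Random

# Hypothetical global generator in the spirit of the original proposal;
# `nothing` means "not created yet".
const GLOBAL_RANDOM_DEVICE = Ref{Union{Nothing,RandomDevice}}(nothing)

function global_random_device()
    rd = GLOBAL_RANDOM_DEVICE[]
    if rd === nothing
        # On the versions discussed here, RandomDevice() opens a file on Unix,
        # so create it once and reuse it.
        rd = RandomDevice()
        GLOBAL_RANDOM_DEVICE[] = rd
    end
    return rd
end

# Draws in a loop reuse one generator instead of calling RandomDevice() each time.
draws = [rand(global_random_device(), UInt64) for _ in 1:1_000]
```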