Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove TLS by using lazy_static #25

Merged
merged 2 commits into from
Jun 25, 2019
Merged

Conversation

josephlr
Copy link
Member

@josephlr josephlr commented Jun 10, 2019

Closes #13 and Closes #37
This PR presents a potential alternative to #14

Instead of using an AtomicUsize state machine for each time we need some global state, this PR just uses a lazy_static value to hold the result of any one-time initialization. Then, on first use, the initialization will be run.

Now the only times we have to deal with global state are exactly where we use lazy_static:

  • The common RawFd for file-based randomness
  • On Linux/Solaris, whether to use getrandom() or a read from /dev/urandom
  • On WASM, whether to use Node or Browser crypto

This change then allows for the following simplifications:

  • The use_file module now contains all the logic for file-based randomness, including platforms that use this as their fallback method.
  • Targets that use a fallback to reading from a file now call use_file::getranodm_inner directly. This required some cleanup to lib.rs.
  • linux_android.rs and solaris_illumos.rs have been greatly simplified.

Note 1: wasm32_bindgen still uses thread_local!, as the JsValues are a per-thread resource, and thus must be initialized for each thread.

Note 2: The lazy_static crate also supports a spin_no_std features, so this approach will still work on no_std platforms. For example, if we wanted to check for RDRAND support (see #27), we could again just use lazy_static.

@josephlr josephlr force-pushed the sync branch 2 times, most recently from d89d521 to e4c7274 Compare June 10, 2019 11:49
@newpavlov
Copy link
Member

newpavlov commented Jun 10, 2019

I haven't reviewed this code closely, but I think it's less resource efficient than my PR. Plus if reading from file has failed during initialization, getrandom will always fail, while in the alternative it may recover.

Personally I would prefer to go with my PR (i.e. improve efficiency at the cost of a somewhat more complex code), but I leave this decision to @dhardy.

@josephlr josephlr force-pushed the sync branch 2 times, most recently from 494d0c4 to 49cdb6a Compare June 11, 2019 01:34
@josephlr
Copy link
Member Author

I haven't reviewed this code closely, but I think it's less resource efficient than my PR.

Initially, I thought this PR and #14 were identical efficacy-wise, as your PR uses an AtomicUsize state machine with fetch_or, which is exactly what sync::Once does except it uses compare_and_swap.

After looking a little closer, it seems that while they are equivalent in the uncontested case, in the constested case where a thread has to block they are different. Your PR calls thread::yield_now while sync::Once calls thread::park. Do you known which is more efficient? The question seems to be: which is better?

  • To have the initializing thread wake up the blocked threads
  • To have the kernel wake up the blocked threads at some point in the future

Plus if reading from file has failed during initialization, getrandom will always fail, while in the alternative it may recover.

This is a very good point. Is this a feature we would want to support? If so, I can add it in.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, other than one point mentioned I didn't check further.

For me, maintainable/comprehensible code is more important than absolute performance, hence I prefer this over #14.

src/wasm32_bindgen.rs Outdated Show resolved Hide resolved
@newpavlov
Copy link
Member

newpavlov commented Jun 11, 2019

Note that you use more memory, e.g. for use_file targets instead of just one atomic which stores both descriptor and initialization state (so we need only one read), you have Once<Result<RawFd, Error>>.

Your PR calls thread::yield_now while sync::Once calls thread::park. Do you known which is more efficient? The question seems to be: which is better?

IIUC yield_now is more lightweight, it's essentially just a syscall. The drawback is that it degrades to busy-waiting if there are no other threads scheduled for execution. But considering that initialization is done only once per program execution and it's relatively fast, I don't think it's a problem.

This is a very good point. Is this a feature we would want to support? If so, I can add it in.

How do you plan to do it? Initially I also wanted to use sync::Once, but I've dropped this idea (partially) because of this problem. Right now there is no stable way to re-execute sync::Once.

For me, maintainable/comprehensible code is more important than absolute performance

Well, I think this code will be really rarely touched, so maintainability will not be a big issue. And code is really straightforward for anyone who is familiar with priniciples of synchronization primitives (the only tricky part is atomics ordering). I believe for such foundational and low-level library zero-costness and efficiency are quite important.

@josephlr josephlr changed the title Remove TLS by using sync::Once Remove TLS by using lazy_static Jun 11, 2019
@josephlr josephlr force-pushed the sync branch 11 times, most recently from 1531b48 to b7e5df8 Compare June 12, 2019 00:28
@josephlr
Copy link
Member Author

josephlr commented Jun 12, 2019

This change should be ready for final review/merging.

@dhardy I've made a few changes/simplifications since your review. The details are in the (edited) PR description. But this gist is:

  • We use local lazy_static values instead of rolling our own (which is what I was doing before).
  • The diff is now much smaller (I made sure to reuse function names as much as possible).

@newpavlov on the issue of efficiency, this approach uses ~16 more bytes of static memory than #14, but it is much more CPU efficient, as the busy-waiting problem is no longer a concern.

But considering that initialization is done only once per program execution and it's relatively fast, I don't think it's a problem.

The main issue would be the initial one-byte read of /dev/random on kernels not supporting getrandom(2). In both #14 and this PR, all threads have to wait on the /dev/random read, which can take a long time in low-entropy cases. This PR prevents the non-initialization threads from spinning in that case.

Right now there is no stable way to re-execute sync::Once.

Ya I missed that initially. If we wanted to have re-executable sync::Once, we would either need to use the panicking/poisoning methods, or roll our own synchronization primitive. Neither really seems worth it, as if you fail initialization once, it will probably never succeed anyway.

@newpavlov
Copy link
Member

but it is much more CPU efficient, as the busy-waiting problem is no longer a concern

In both #14 and this PR, all threads have to wait on the /dev/random read, which can take a long time in low-entropy cases.

You are right, but it can be reasonably worked around by replacing thread::yield_now with thread::sleep(Duration::from_nanos(1)) (it was done earlier, but I reverted it).

@dhardy
Copy link
Member

dhardy commented Jun 12, 2019

If the overhead of using an external type to manage synchronisation is just a few bytes, then that sounds like a win to me. @newpavlov can we convince you that this is a sufficiently good approach? If so then this just needs a fresh review from @newpavlov or myself, then we can merge this (and make a new release, unless @josephlr plans on keeping up this barrage of PRs a little longer).

@josephlr
Copy link
Member Author

(and make a new release, unless @josephlr plans on keeping up this barrage of PRs a little longer).

The only PRs I'm working on going forward would be #30 and support for x86/x86_64 platforms with target_os = "none"

@newpavlov
Copy link
Member

newpavlov commented Jun 13, 2019

@newpavlov can we convince you that this is a sufficiently good approach?

I never argued that it's not, just that we can do better (efficiency wise). :) While my preference is unchanged, but as I've said earlier I will leave choice to you, thus I am not against merging this PR. I haven't done a full review and don't plan to do it in several days at least, so it will be probably faster for you to review and merge.

@dhardy
Copy link
Member

dhardy commented Jun 13, 2019

Good. As I said before (roughly), in my opinion "ultimate efficiency" shouldn't be our goal, but rather a compromise between efficiency, maintainability and "obvious correctness" (i.e. minimising usage of unsafe and hard to understand code patterns).

Yeah, I also can't do a full review right now.

@newpavlov
Copy link
Member

Can you also fix #37 as part of this PR?

@josephlr
Copy link
Member Author

Can you also fix #37 as part of this PR?

Fixed, rebased, and description update to close the bug.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@newpavlov happy if we merge?

@newpavlov newpavlov merged commit af4ea8d into rust-random:master Jun 25, 2019
@josephlr josephlr deleted the sync branch June 26, 2019 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check EPERM in addition to ENOSYS Removing per-thread file descriptors
3 participants