
Remove lazy_static dependency #52

Merged: 6 commits merged into rust-random:master on Jul 4, 2019
Conversation

@josephlr (Member) commented Jul 1, 2019

EDIT1: We make the following changes:

  • Replace wasm32-stdweb's use of lazy_static with static mut and std::sync::Once
  • Add a sync_init method to LazyUsize complementing unsync_init.
  • Add a LazyFd abstraction around LazyUsize to work with Option<i32> values, where we either have None or Some(x) with x non-negative.
  • Use the LazyFd in use_file, eliminating our last use of lazy_static.

EDIT2: Note that this does not try to remove all std dependencies from use_file.rs. That is left for a follow-up change.

@Centril commented Jul 2, 2019

cc @RalfJung, @eddyb -- can you please have a look at this in terms of static mut? (rust-lang/rust#62082 wants to use getrandom in libstd)

@Centril commented Jul 2, 2019

also cc @alexcrichton

[Review thread on src/use_file.rs (outdated, resolved)]
@josephlr (Member, Author) commented Jul 2, 2019

Here's my basic idea for adding a one-time synchronization method to LazyUsize.

// Runs the init() function exactly once.
pub fn sync_init(&self, init: impl FnOnce() -> usize, mut wait: impl FnMut()) -> usize {
    loop {
        match self.0.compare_and_swap(Self::UNINIT, Self::ACTIVE, Relaxed) {
            Self::UNINIT => {
                let val = init();
                self.0.store(val, Relaxed);
                return val;
            }
            Self::ACTIVE => wait(),
            val => return val,
        }
    }
}

Where wait would be sleep_ms(1) or something, and:

// The initialization is not completed.
pub const UNINIT: usize = usize::max_value();
// The initialization is currently running.
pub const ACTIVE: usize = usize::max_value() - 1;

Then we can just use the fact that non-negative RawFds will fit in the range [0, usize::max_value() - 2].

@newpavlov (Member):
Looks good! Small nitpicks:

  • I think we can use impl Fn() for wait and init functions.
  • I am not sure about Relaxed ordering. Especially considering that you will use it together with LazyBool/LazyUsize on some targets. @HadrienG2 what do you think?
  • Zero is a valid value for a file descriptor. It does not matter for the provided snippet, but you've used "non-zero RawFd's" in your comment.

@HadrienG2 commented Jul 2, 2019

@newpavlov Relaxed is correct for synchronizing atomic accesses to a single memory location, which I understand is what you are doing here.

If you want to synchronize accesses to multiple memory locations (e.g. non-atomically writing data to a shared UnsafeCell and using a separate shared AtomicBool flag to tell other threads when the write is done), then you need stronger memory orderings like Acquire/Release pairs.

@josephlr (Member, Author) commented Jul 2, 2019

  • I think we can use impl Fn() for wait and init functions.

Isn’t FnMut a weaker trait bound than Fn?

  • I am not sure about Relaxed ordering. Especially considering that you will use it together with LazyBool/LazyUsize on some targets. @HadrienG2 what do you think?

Does this matter if we’re only using a single atomic value?

  • Zero is a valid value for file descriptor. It does not matter for the provided snippet, but you've used "non-zero RawFd's" in your comment.

Typo, now fixed.

@newpavlov (Member):
Isn’t FnMut a weaker trait bound than Fn?

Yes, but our callbacks will not contain any state, so the stronger bound will work as well. It's just my preference; you can leave it as-is.

Does this matter if we’re only using a single atomic value?

On some targets you use it together with LazyBool/LazyUsize, which also use relaxed load/store under the hood. But I guess that since the use of the second atomic sits behind a branch whose execution explicitly depends on a certain non-default value of the first atomic, it should not create any problems.

@dhardy (Member) commented Jul 2, 2019

Good point. I think on Linux we could just use a single AtomicUsize if we allow another special value to imply "use getrandom syscall", but on Solaris I believe we will need either multiple atomic vars or one "gate" atomic plus one or more data statics.

@HadrienG2 commented Jul 2, 2019

Does this matter if we’re only using a single atomic value?

On some targets you use it together with LazyBool/LazyUsize, which also use relaxed load/store under the hood. But I guess that since the use of the second atomic sits behind a branch whose execution explicitly depends on a certain non-default value of the first atomic, it should not create any problems.

I'm not sure if this is what you are talking about, but loads depending on relaxed loads can actually still cause problems:

if a.load(Ordering::Relaxed) == SOMETHING {
    return b.load(Ordering::Relaxed);
}

Because compiler and CPU trickery can effectively transform the program into this:

let tmp = b.load(Ordering::Relaxed);
if a.load(Ordering::Relaxed) == SOMETHING {
    return tmp;
}

(Not to mention that on the producer side, writes can also be reordered)

If that kind of transformation is undesirable, as is the case when using an atomic as a "ready flag" to tell that other shared memory regions were written to, then you need Acquire when loading from a on the consumer side and Release when writing to it on the producer side.

@dhardy (Member) commented Jul 2, 2019

Thanks @HadrienG2 — I believe this example posted earlier is correct however.

@HadrienG2 commented Jul 2, 2019

@dhardy Yup, it uses stronger Acquire/Release memory orderings to avoid the aforementioned reordering effects.

As pointed out by @RalfJung on that example, using Relaxed ordering on the CAS instead of AcqRel ordering would actually be fine as long as init_data() does not read shared data that is synchronized as a side-effect of using an Acquire load on the flag.

@newpavlov (Member):
@HadrienG2
Our code can be represented as:

// `a` is initialized with C1, and `b` with `C2`
let val = match a.load(Relaxed) {
    C1 => {
        let val = init1();
        a.store(val, Relaxed);
        val
    }
    val => val,
};

if val == 1 {
    do_stuff1();
} else {
    loop {
        let val = match b.compare_and_swap(C2, C3, Relaxed) {
            C2 => {
                let val = init2();
                b.store(val, Relaxed);
                val
            }
            C3 => continue, // another thread is initializing; spin
            val => val,
        };
        do_stuff2(val);
        break;
    }
}

At first glance it shouldn't have any problems, since compare_and_swap is a side-effect operation and cannot be executed before the value of the first atomic is checked.

@newpavlov (Member):
BTW compare_and_swap compiles down to a lock instruction, while a relaxed load compiles to a simple mov. Maybe it's worth adding code like this before the loop?

let val = self.0.load(Relaxed);
if val != Self::UNINIT && val != Self::ACTIVE {
    return val;
}
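That fast-path check can be sketched in isolation (hypothetical code; the UNINIT/ACTIVE sentinels and the fast_path helper are illustrative, assuming the constants discussed earlier in the thread):

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

const UNINIT: usize = usize::MAX;
const ACTIVE: usize = usize::MAX - 1;

// A plain Relaxed load compiles to a simple mov on x86, so an already
// initialized value is returned without touching the lock'd CAS at all.
fn fast_path(cell: &AtomicUsize) -> Option<usize> {
    let val = cell.load(Relaxed);
    if val != UNINIT && val != ACTIVE {
        Some(val) // initialized: skip the CAS loop entirely
    } else {
        None // uninitialized or in progress: fall through to the CAS loop
    }
}

fn main() {
    let pending = AtomicUsize::new(UNINIT);
    let done = AtomicUsize::new(7);
    assert_eq!(fast_path(&pending), None);
    assert_eq!(fast_path(&done), Some(7));
}
```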

@dhardy (Member) commented Jul 2, 2019

@newpavlov IIUC x86 is strongly ordered, implying most acquire/release boundaries are unnecessary anyway. You should therefore check the generated code for another target like ARM.

In your example, I think it doesn't actually matter if b is loaded before a is stored anyway — they are independent. (But if not, the read of a must be acquire and the store to a must be release, I believe.)

@HadrienG2 commented Jul 3, 2019

@dhardy As a minor aside, I would advise caution against using "what the hardware does" as a reference for atomic ordering correctness, because the compiler can introduce additional reorderings on top of those that hardware performs, in a manner that can change in future releases even on a given architecture.


@newpavlov Here's my understanding/analysis of the code summary that you posted:

// `a` is initialized with C1
let val = match a.load(Relaxed) {
    C1 => {
        let val = init1();
        a.store(val, Relaxed);
        val
    }
    val => val,
};

First, a number of threads will read C1 from a, concurrently execute init1(), and race to write the output of that function in a. From my understanding, the correctness conditions at this point are:

  • It must be correct for multiple executions of init1() to occur in parallel
  • It must be correct for only one output of these executions to be remembered and the others silently discarded (but still upheld by the threads that produced them).
  • From my understanding of the code's logic, init1() must probably also be prevented from outputting the value C1, and I will assume that this is true in the following.
if val == 1 {
    do_stuff1();
} else {
    /* ... */
}

After this first initialization stage, any thread which reads 1 from a, or produces that value and writes it to a, will then execute do_stuff1() without synchronizing with the init1() call that preceded the initialization of a. This is my understanding of the correctness conditions here:

  • Either do_stuff1() must not depend on any side-effect/write to shared memory from init1(), or the two functions must implement their own synchronization protocol. If neither of these holds, then you must provide the synchronization yourself by using a.load(Ordering::Acquire) and a.store(val, Ordering::Release) in the previous code snippet.
  • It must be correct for do_stuff1() and init1() to run in parallel.
  • If multiple calls to init1() can return different values, then it must be correct for do_stuff1() and the entire loop from the else branch to run in parallel.
if val == 1 {
    /* ... */
} else {
    // `b` is initialized with `C2`
    loop {
        let val = match b.compare_and_swap(C2, C3, Relaxed) {
            C2 => {
                let val = init2();
                b.store(val, Relaxed);
                val
            }
            C3 => continue, // another thread is initializing; spin
            val => val,
        };
        do_stuff2(val);
        break;
    }
}

If a does not end up holding the value 1 but another value (which again I assume to be different from C1), then the threads move to a more classic lazy initialization logic where they race to replace the initial value of b, called C2, with another special value called C3. The winner of the race gets the privilege of running init2() and writing its output to b, while the other threads wait in a spin loop. Assuming that the output of init2() is neither C2 nor C3, every thread involved will then proceed to call do_stuff2() with that output as its parameter, without synchronizing with the side-effects of init2(). Thus, my understanding is that...

  • It must be correct for init1() and either of init2() or do_stuff2() to run in parallel.
  • If init1() is allowed to sometimes return 1 and sometimes return something else, then it must additionally be correct for do_stuff1() and either of init2() or do_stuff2() to run in parallel.
  • Either do_stuff2() must not depend on any side-effect of init2(), or these side effects must be taken care of by the two functions. If that's not the case, then you must also switch to an Acquire/Release synchronization protocol here (Acquire on the CAS, Release on the store).
  • It is almost certainly incorrect for init2() to return either C2 or C3.

Hope this attempt at summarizing the underlying assumptions helps.


On an unrelated note, I agree that you should probably check the value of b before doing a compare and swap. From my personal microbenchmarks, atomic read-modify-write instructions like CAS are ~10x more expensive than loads and stores on x86 in serial code, and here you're additionally running them in a loop on multiple threads which will additionally cause cache invalidation and contention...

@josephlr josephlr force-pushed the static branch 3 times, most recently from 1548e7f to 3615137 Compare July 4, 2019 00:43
@josephlr (Member, Author) commented Jul 4, 2019

Hey everyone, I've updated this PR and the description to incorporate your recommendations, and I split up the commits to make review easier.

@HadrienG2 thanks so much for the rundown. I've tried to address all of your points by explicitly putting your concerns in the comments. BTW 8566c39 contains the changes to LazyUsize and sync_init(), which are the tricky parts of this PR. The other commits are mostly straightforward wrappers.

  • It must be correct for multiple executions of init1() to occur in parallel
  • It must be correct for do_stuff1() and init1() to run in parallel.
  • It must be correct for init1() and either of init2() or do_stuff2() to run in parallel.
  • If multiple calls to init1() can return different values, then it must be correct for do_stuff1() and the entire loop from the else branch to run in parallel.

For our implementation, these parallelism assumptions are correct. Also, in the documentation on unsync_init, I note that the init() function could run in parallel with other callers.

  • If init1() is allowed to sometimes return 1 and sometimes return something else, then it must additionally be correct for do_stuff1() and either of init2() or do_stuff2() to run in parallel.

In the documentation on unsync_init, I note that the init() function should always return the same value, if it succeeds. This is the case for all of our current use cases.

  • It must be correct for only one output of these executions to be remembered and the others silently discarded (but still upheld by the threads that produced them).
  • Either do_stuff1() must not depend on any side-effect/write to shared memory from init1(), or the two functions must implement their own synchronization protocol to synchronize with each other. If both of these sentences are incorrect, then you must provide your own synchronization by using a.load(Ordering::Acquire) and a.store(Ordering::Release) in the previous code snippet.
  • Either do_stuff2() must not depend on any side_effect of init2() or these side effects must be taken care of by the two functions. If that's not the case, then you must also switch to an Acquire/Release synchronization protocol here (Acquire on the CAS, Release on the store).

I note in the description of LazyUsize that we should only depend on the value returned by init() (along with including a code example). This is the case for all of our current use cases. We only depend on the value of init(), and code in init() that runs before we generate the value.

  • From my understanding of the code's logic, init1() must probably also be prevented from outputting the value C1, and I will assume that this is true in the following.
  • It is almost certainly incorrect for init2() to return either C2 or C3.

Agreed on init2() returning C2, which I now mention in the comments for sync_init. However, I think it should be fine for init() to return UNINIT. This will just return UNINIT to the caller, and future calls to sync_init and unsync_init will simply run init() again.

My comments on LazyUsize explain this "retrying" behavior, and the assumptions necessary to make it work. PTAL
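That retry-on-UNINIT behavior falls out naturally, since a "failed" init() stores the same sentinel the cell started with. A hypothetical sketch, assuming an unsync_init along the lines of the reviewed code (names and values illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

const UNINIT: usize = usize::MAX;

struct LazyUsize(AtomicUsize);

impl LazyUsize {
    // If init() returns UNINIT, the stored value equals the sentinel,
    // so nothing is effectively cached and the next call retries.
    fn unsync_init(&self, init: impl FnOnce() -> usize) -> usize {
        let mut val = self.0.load(Relaxed);
        if val == UNINIT {
            val = init();
            self.0.store(val, Relaxed);
        }
        val
    }
}

fn main() {
    let lazy = LazyUsize(AtomicUsize::new(UNINIT));
    // A failing init() returns UNINIT, which is indistinguishable from
    // "never initialized", so the next call runs init() again.
    assert_eq!(lazy.unsync_init(|| UNINIT), UNINIT);
    assert_eq!(lazy.unsync_init(|| 9), 9);
    // Now the value is cached; init() is not run again.
    assert_eq!(lazy.unsync_init(|| unreachable!()), 9);
}
```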

On an unrelated note, I agree that you should probably check the value of b before doing a compare and swap.

Done, and good idea.

@HadrienG2:
LGTM :)

[Review thread on src/use_file.rs (outdated, resolved)]
@newpavlov newpavlov merged commit 0c72017 into rust-random:master Jul 4, 2019
@newpavlov (Member):

Regarding CI tests for Redox we can bring them back after rust-lang/libc#1420 gets fixed.

@josephlr josephlr deleted the static branch July 4, 2019 22:24
// multiple times. If init() returns UNINIT, future calls to unsync_init()
// will always retry. This makes UNINIT ideal for representing failure
// of init(). Multiple callers can run their init() functions in parallel.
// init() should always return the same value, if it succeeds.
pub fn unsync_init(&self, init: impl FnOnce() -> usize) -> usize {
    // Relaxed ordering is fine, as we only have a single atomic variable.


This assumes that a file descriptor consists only of that integer and no other user-space data. I suppose that is accurate on Unixes?

Member Author:

Yup, this is correct: libc::open always returns a non-negative int on success (the link is to the Linux docs, but the function is a POSIX standard).

So a file descriptor will always fit in a non-negative int on Unix.

match self.0.compare_and_swap(Self::UNINIT, Self::ACTIVE, Relaxed) {
    Self::UNINIT => {
        let val = init();
        self.0.store(val, Relaxed);


IMO it would be a good idea to assert! here that init did not return UNINIT or ACTIVE. This is not hot code, so always asserting seems fine.


Oh, seems like the function is actually allowed to return UNINIT to indicate "just try again"? I guess this is implied by the doc comment not ruling out that case, but might be worth calling out explicitly.

ACTIVE can still be asserted.

Member Author:

Oh, seems like the function is actually allowed to return UNINIT to indicate "just try again"? I guess this is implied by the doc comment not ruling out that case, but might be worth calling out explicitly.

This is called out, just in the docs for LazyUsize.

// Both methods support init() "failing". If the init() method returns UNINIT,
// that value will be returned as normal, but will not be cached.

ACTIVE can still be asserted.

Good idea. Thanks @RalfJung for taking a look at this stuff.

Member:

I think it may be better to write something like this:

let mut val = init();
if val == Self::ACTIVE { val = Self::UNINIT }
self.0.store(val, Relaxed);

And describe this behavior in docs.

None => LazyUsize::UNINIT,
},
|| unsafe {
libc::usleep(1000);


Wow, that's a long sleep, isn't it? A comment always seems warranted with such a magic number.

Member Author:

Yeah, this sleep should probably be shorter; any recommendations? My guess is that we would want it to be longer than the normal latency of an open(2) syscall.

@newpavlov (Member) commented Jul 11, 2019

On my laptop libc::open takes ~7µs and on the playground ~8µs, so I think 10µs would be a good value. (It's a very coarse measurement, since it also includes the time to retrieve the current time.)


I'm sorry this is outside my range of experience.^^ 1ms just seemed like a long time. /dev/random doesn't even do "real" I/O, so it shouldn't have huge latency.

Member:

It can be a huge latency if the system entropy pool has not been initialized; IIRC up to several minutes.


Sure, but that's not the common case.

Ideally wait would do something like double the wait time each time around the loop, starting at e.g. 1µs and maxing out at 1s or so. But that seems like overkill.^^
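That doubling-wait idea could be sketched as follows (hypothetical; the Backoff struct and the numbers are illustrative, not something the PR implements):

```rust
use std::thread;
use std::time::Duration;

// Exponential backoff: double the sleep on each call, starting at 1µs
// and capping at 1s as suggested above.
struct Backoff {
    micros: u64,
}

impl Backoff {
    fn new() -> Self {
        Backoff { micros: 1 }
    }

    fn wait(&mut self) {
        thread::sleep(Duration::from_micros(self.micros));
        self.micros = (self.micros * 2).min(1_000_000); // cap at 1s
    }
}

fn main() {
    let mut b = Backoff::new();
    // In real code this would be the `wait` closure passed into the
    // sync_init() spin loop.
    for _ in 0..10 {
        b.wait();
    }
    // After 10 calls the delay has doubled from 1µs to 1024µs.
    assert_eq!(b.micros, 1024);
}
```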

@josephlr (Member, Author):

@newpavlov @RalfJung this is a good discussion, but might be better had on #60 instead of this closed PR.
