Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: use a queue-based Condvar on NetBSD and other platforms #127578

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

joboet
Copy link
Member

@joboet joboet commented Jul 10, 2024

Like #110211, this PR adds a new Condvar implementation based on a lockless queue of Threads. Unfortunately, we cannot entirely eliminate the UNIX condvar code because of the thread priority properties it provides on platforms like macOS. But for NetBSD, Windows 7, SGX, Xous and hopefully some new targets these properties are not relevant.

Tested on macOS (modify the cfg_if in sys/sync/condvar/mod.rs to test locally). As far as I am aware, none of the relevant platforms are tested in CI unfortunately.

@joboet joboet added the A-atomic Area: Atomics, barriers, and sync primitives label Jul 10, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jul 10, 2024

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 10, 2024
@jhpratt
Copy link
Member

jhpratt commented Jul 11, 2024

This is OS level stuff that I'm not familiar with.

r? libs

@rustbot rustbot assigned cuviper and unassigned jhpratt Jul 11, 2024
//! thread handle is removed from the list.
//!
//! The list itself has the same structure as the one used by the queue-based
//! `RwLock` implementation, see its documentation for more information. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to share the code for that queue, rather than having duplicate code with the same structure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it would be a bit complicated, since the queue-based RwLock is used on more platforms. Also, since they aren't accessed concurrently the nodes need atomic pointer here, but for RwLock that's not the case (there can be multiple reader threads that traverse the queue).

@cuviper
Copy link
Member

cuviper commented Jul 14, 2024

Rerolling because I don't think I'll have time for this soon.

r? libs

@rustbot rustbot assigned Mark-Simulacrum and unassigned cuviper Jul 14, 2024
@bors
Copy link
Contributor

bors commented Jul 16, 2024

☔ The latest upstream changes (presumably #127819) made this pull request unmergeable. Please resolve the merge conflicts.

@workingjubilee workingjubilee added O-netbsd Operating system: NetBSD O-SGX Target: SGX O-xous OS: A microkernel OS for privacy in computing O-windows-7 OS: Windows 7 or Windows Server 2008 R2 or etc. O-unix Operating system: Unix-like O-teeos Target: Arm's secure enclave that code isn't supposed to be able to jailbreak labels Jul 17, 2024
//!
//! Not all platforms provide an efficient `Condvar` implementation: the UNIX
//! `pthread_condvar_t` needs memory allocation, while SGX doesn't have
//! synchronization primitives at all. Therefore, we implement our own.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this is the right tradeoff? It looks like the affected platforms are fairly "rare", and I'm not sure it makes sense for std to be maintaining relatively complex data structures for such platforms, especially if they're not even getting lightly tested in our CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that every couple of months someone comes along and adds a completely new platform to std. Understandably, not every target maintainer is well-versed in writing synchronization primitives, leading to bugs like #114581 on SGX. Actually, SGX is an interesting case, because the environment only provides thread parking, so there'll always be the need for some queue like this one. The idea behind this PR is that this implementation will be used as the default on every new target, unless they have a good reason not to, so that we only have to maintain one admittedly complex implementation instead of five.

There are simpler implementations that I could switch to, if desired (the NetBSD pthread_condvar for example uses a stack of waiters and spins on contended access to it). It's just that making the shared implementation really good seemed like a good idea to me (it also might make it easier to convince people that they really shouldn't bring their own).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yeah, if this is intended to be the fallback queue absent platform-specific, built-in queues that make more sense, then that makes more sense to me.

//! only a pointer in size.
//!
//! This implementation is loosely based upon the lockless `Condvar` in
//! [`usync`](https://github.com/kprotty/usync/blob/8937bb77963f6bf9068e56ad46133e933eb79974/src/condvar.rs).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me, since that repository is only MIT licensed -- and std needs to also be Apache licensed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation shares no code with usync and contains significant alterations to the algorithm, so I don't think that this is a problem. Maybe "inspired by" is a better way to put it.

CC @kprotty what do you think?

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to come back with a more thorough review later.

}
}

#[repr(align(16))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment here on why this is 16, and ideally also an assert somewhere that ensures our alignment is sufficient that MASK bits are never set?


impl Node {
unsafe fn notify(node: NonNull<Node>) {
let thread = unsafe { node.as_ref().thread.clone() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to clone the thread here?

Also, can we add safety requirements + justifications on the unsafe functions and blocks added by this PR?

@workingjubilee
Copy link
Member

@he32 It would be nice if you could run the rustc test suite on x86_64-unknown-netbsd, i.e. ./x.py test, and then run it again with this patch and let us know if there are new failures this introduces.

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the testing for this given the limited platform coverage and relatively tricky code. Can we at least run some rudimentary, single-thread tests against it on linux? Reviewing the code manually it looks reasonably OK to me (some additional comments in this PR).

// mutably accessed or destroyed while other threads may
// be accessing it. Guard against unwinds using a panic
// guard that aborts when dropped.
let guard = PanicGuard;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should get moved up to before we register the node, no?

/// # Safety
/// May only be called from the thread to which this handle belongs.
pub(crate) unsafe fn park_timeout(&self, dur: Duration) {
unsafe { self.inner.as_ref().parker().park_timeout(dur) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public park_timeout has a PanicGuard -- should this also have that?

It looks like the ~only advantage to this is that we avoid re-fetching the handle from thread local storage, is that just a performance optimization?

@joboet joboet added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 21, 2024
@alex-semenyuk alex-semenyuk added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: Atomics, barriers, and sync primitives O-netbsd Operating system: NetBSD O-SGX Target: SGX O-teeos Target: Arm's secure enclave that code isn't supposed to be able to jailbreak O-unix Operating system: Unix-like O-windows Operating system: Windows O-windows-7 OS: Windows 7 or Windows Server 2008 R2 or etc. O-xous OS: A microkernel OS for privacy in computing S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants