Alternative Mutex implementation #11509

Closed

bill-myers wants to merge 3 commits

Conversation

bill-myers
Contributor

Pull #11462 implements a Mutex using a lockless algorithm that has to use thread_yield.

This is an alternative approach that removes all yields by introducing an internal lock, while still keeping the principle that in the native case the mutex must reduce to the native mutex.

Furthermore, this implementation should be fair even when native and green tasks are mixed on the same mutex (assuming that the OS mutex is fair), can always detect recursive mutex locking, and allows support for green threads to be disabled by changing a constant to improve performance.

There is no longer a need for the MPSC intrusive queue since we now protect the queue with the internal queue_lock.
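
Roughly, the shape being described is something like this (a sketch using present-day std types for brevity; the field names are the ones used in this discussion, everything else is a placeholder rather than the code in this pull request):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize};
use std::sync::Mutex as OsMutex;
use std::thread::Thread;

struct GreenAwareMutex {
    // The native mutex that lock()/unlock() reduce to when only native
    // threads are involved.
    os_lock: OsMutex<()>,
    // The internal lock; it protects a plain FIFO of blocked tasks, which is
    // why no intrusive MPSC queue is needed. (Written here with the queue
    // living inside the mutex that guards it.)
    queue_lock: OsMutex<VecDeque<Thread>>,
    // Fast-path hint checked before taking queue_lock.
    queue_nonempty: AtomicBool,
    // Id of the current owner, used to detect recursive locking.
    owner: AtomicUsize,
}
```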

It compiles and passes tests, but of course would need review.

Contention on the internal lock with green tasks shouldn't be a problem, since such contention would happen anyway if the tasks were native, and the MPSC queue still involves atomics on a shared cache line, which also generates contention.

The Linux kernel also uses a spinlock to protect its own sleepable mutexes in the contended case (note that with Linux kernel mutexes all threads effectively behave as green tasks, since there is no OS mutex to fall back to, so the requirements are a bit different).

Someone with access to a large server could perhaps benchmark both approaches if desired.

This does not provide the "as soon as unlock() returns you can obliterate the mutex" guarantee, but this is unnecessary for Rust mutexes as discussed in pull #11462.

Much of the justification can be found in the large implementation comment in the module itself.
This removes all usage of `try_send_deferred` and all related functionality.
Primarily, this builds all extra::sync primitives on top of `sync::Mutex`
instead of `unstable::sync::Mutex`.
@alexcrichton
Member

I greatly appreciate you taking the time to write up this implementation, it will be very useful in benchmarking and comparisons, so thank you!

I firmly believe that the guarantee you are so easily batting away is vital to have. I would encourage you to read http://lwn.net/Articles/575460/ (which a quick googling about the issue brought up). I personally have run into this race. I do not believe that "just use an arc" is a solution. In my mind, this is equivalent to saying "just use a gc" if you're having trouble freeing owned pointers.
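
The race described in that article is roughly of the following shape when sketched in Rust (placeholder names and types, not code from either pull request):

```rust
// A placeholder lock with the lock()/unlock() shape being argued about.
struct SomeMutex;
impl SomeMutex {
    fn lock(&self) { /* acquire */ }
    fn unlock(&self) { /* release */ }
}

struct Shared {
    lock: SomeMutex,
    refs: usize, // reference count protected by `lock`, as in the LWN example
    // ... protected data ...
}

// Each task calls this when it is done with `p`; the last one frees it.
unsafe fn put_ref(p: *mut Shared) {
    (*p).lock.lock();
    (*p).refs -= 1;
    let last = (*p).refs == 0;
    (*p).lock.unlock();
    if last {
        // The "obliterate the mutex as soon as unlock() returns" guarantee is
        // what makes this free safe: without it, a previous holder's unlock()
        // may still be touching (*p).lock on its slow path when the memory is
        // released here.
        drop(Box::from_raw(p));
    }
}
```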

Regardless, we need to discuss whether that guarantee is necessary for this mutex implementation or not. I would be uncomfortable merging this if we decided to discard the guarantee for a few reasons:

  • Lock-free code is incredibly hard to get right, and there's very little documentation of what's going on inside this mutex. I attempted to closely document everything that was going on, and you seem to have just deleted all of it without replacement.
  • This implementation looks like it has a number of premature optimizations which only serve to make the code more impenetrable (support_nonblocking_tasks will never be false, unsafe borrows, etc.).
  • There's no mention of how this implementation's performance looks (I'd run green threads with RUST_THREADS=1 for now until we're able to fix scheduling to stop fanning out rust tasks to other cores so much)
  • I'm a little concerned about the size of the mutex, and it seems to me that if you already have a concurrent queue you might as well use it to make the structure smaller. I'm not sure whether this is a premature optimization or how well it plays with this implementation.

Finally, I would recommend discussing these issues on an original pull request before opening up a new one. Our code of conduct states:

Respect that people have differences of opinion and that every design or implementation choice carries a trade-off and numerous costs. There is seldom a right answer.

I personally feel in this case that you have read my opinions and immediately discounted them all as wrong. I would encourage you to work out your differences of opinion with me before brazenly deleting all my code.

@thestinger
Contributor

using a lockless algorithm that has to use thread_yield

So, it's just a low-priority spin lock and doesn't let the scheduler sleep? Or does it sleep in the scheduler until it's unlocked?

@thestinger
Contributor

I firmly believe that the guarantee you are so easily batting away is vital to have. I would encourage you to read http://lwn.net/Articles/575460/ (which a quick googling about the issue brought up). I personally have run into this race. I do not believe that "just use an arc" is a solution. In my mind, this is equivalent to saying "just use a gc" if you're having trouble freeing owned pointers.

I don't really understand why the reference counting should be the responsibility of a mutex. The OS can context-switch after you unlock but before you free it, so what does it really guarantee? The primary users of locking are going to be concurrently accessed data structures that are either static or use atomic reference counting.

@bill-myers
Contributor Author

I firmly believe that the guarantee you are so easily batting away is vital to have.

I read http://lwn.net/Articles/575460/, and that race can happen mostly because C uses unsafe manual memory management (i.e. the kfree call in the code there).

In Rust, safe code cannot free memory arbitrarily and cannot be affected by this issue.

Unsafe code would need to somewhat break the language's rules to deallocate a mutex early (since doing so invalidates the pointer the other task holds), and I can't come up with a scenario where one needs to do that.

For instance, your unsafe code in Once doesn't have this issue because you properly use an external reference count in the natural way.
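
In safe-Rust terms, that "external reference count in the natural way" is the familiar Arc pattern; a minimal sketch with today's std types (not the Once or runtime code being discussed):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The allocation holding the mutex is owned by the reference count, not
    // freed from inside a critical section, so it only goes away after the
    // last clone is dropped, i.e. after each thread's lock()/unlock() calls
    // have fully returned.
    let shared = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                *shared.lock().unwrap() += 1;
                // the lock is released here, before this thread's clone of
                // the reference count is dropped
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*shared.lock().unwrap(), 4);
}
```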

support_nonblocking_tasks will never be false

In the current form, yes, but it's intended to either stay as documentation or be converted to a static mut or compile-time option, in which case a program using only native tasks could set it to false.
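
A hypothetical sketch of the compile-time-option variant, using a cfg feature as the mechanism (the constant name comes from this thread; everything else is an assumption):

```rust
// Whether the mutex has to handle green (non-blocking) tasks at all. A build
// that only ever uses native threads could turn this off so that
// lock()/unlock() go straight to the OS mutex, skipping the queue bookkeeping.
#[cfg(feature = "green-tasks")]
pub const SUPPORT_NONBLOCKING_TASKS: bool = true;
#[cfg(not(feature = "green-tasks"))]
pub const SUPPORT_NONBLOCKING_TASKS: bool = false;
```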

I'm a little concerned about the size of the mutex

Possible fixes:

  1. Have an OS mutex variant that doesn't include a condition variable
  2. Conflate queue_nonempty with either queue or queue_tail
  3. Remove owner and just deadlock on recursive locking

Obviously removing the second lock would be ideal, but I'm not sure if that's possible.

Finally, I would recommend discussing these issues on an original pull request before opening up a new one

Since the proposed alternative implementation is very different from the original one, a new pull request seemed like the only way to avoid polluting the original; also, I don't think GitHub lets you post new code in response to a pull request in a way that makes it easy to examine.

I'm not totally sure that this is the best approach, as it does have the drawbacks you mention; an approach that both doesn't yield and doesn't use an extra lock would be best, but I'm not sure if that's possible.

@alexcrichton
Member

So, it's just a low-priority spin lock and doesn't let the scheduler sleep? Or does it sleep in the scheduler until it's unlocked?

I would recommend reading the pull request for full details; it sounds like you may be conflating ideas specific to kernel locks with user-space locks (I'm unaware of any such spin lock in user space).

In Rust, safe code cannot free memory arbitrarily and cannot be affected by this issue.

You do realize that safe code in rust has no use for a mutex, right? Using a mutex implies that you're using unsafe code to share memory among more than one consumer.

Obviously removing the second lock would be ideal, but I'm not sure if that's possible.

I had an implementation which did this; as I have stated before, I did not choose this option.

@bill-myers
Contributor Author

Oops, this is in fact not globally fair in the initial version, since green tasks will have precedence because we don't always drop the lock.

Uploaded a new version that is now fair and simpler, by dropping that optimization.

@bill-myers
Contributor Author

You do realize that safe code in rust has no use for a mutex, right?

That's not correct: mutexes are useful for general serialization, and sharing memory safely is only one possible use of that.

For example, if you are writing a parallel web crawler and don't want to make more than one simultaneous connection to any given web server (so as not to overload it), you can use a HashMap mapping IP addresses to mutexes, and use each mutex to serialize connections to that specific IP address.
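
A minimal sketch of that crawler pattern (today's std types; `download` is a hypothetical stand-in for the actual fetch):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

// One mutex per server IP. The mutex guards no data at all; it is used purely
// for serialization, so at most one connection is made to a given server at a time.
struct Crawler {
    per_server: Mutex<HashMap<IpAddr, Arc<Mutex<()>>>>,
}

impl Crawler {
    fn fetch(&self, ip: IpAddr, url: &str) {
        // Look up (or lazily create) the per-IP lock; the map lock is dropped
        // at the end of this statement.
        let server_lock = self
            .per_server
            .lock()
            .unwrap()
            .entry(ip)
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone();
        // Hold the per-IP lock for the duration of the request.
        let _guard = server_lock.lock().unwrap();
        download(url);
    }
}

// Hypothetical stand-in for the actual HTTP fetch.
fn download(_url: &str) {}
```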

@bill-myers
Contributor Author

BTW, I might have found a lockless algorithm that works, which would obsolete the code in this pull request.

See #11462 for the details.

@bill-myers
Contributor Author

Obsolete; now proposing pull #11520 instead, which uses a lockless algorithm and also avoids all the disadvantages mentioned in the comments here.

@bill-myers bill-myers closed this Jan 13, 2014