-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize RefCell refcount tracking #51630
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
@bors try |
Optimize RefCell refcount tracking Address the performance concern raised in #51466 (comment) cc @dtolnay @nnethercote @rust-lang/wg-compiler-performance cc @RalfJung @jhjourdan for soundness concerns Can somebody kick off a perf run on this? I'm not sure how that's done, but I understand it has to be started manually. The idea of this change is to switch to representing mutable refcount as values below 0 to eliminate some branching that was required with the old algorithm.
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
src/libcore/cell.rs
Outdated
None | ||
} else { | ||
// Prevent the borrow counter from overflowing into | ||
// a writing borrow. | ||
assert!(b < MIN_WRITING - 1); | ||
assert!(b != isize::max_value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert was new in #51466 -- I missed the chance to comment there, but AFAICT (contrary to https://github.com/rust-lang/rust/pull/51466/files#r194758679), returning None
in this case is exactly what the code did previously (before #51466). So I want to bring it up again: why do this?
If a mem::forget(rc.borrow())
loop overflows the borrow flag into the "writing" range, we'll just erroneously consider the RefCell mutably borrowed and refuse to create any new Refs or RefMuts, even without any extra checks. That's not quite as tidy as this assertion firing, but it's sound and the diagnostic value of this assertion failing isn't very high anyway. And since we're optimizing anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. Given that there are no RefMut
s in existence, there'd be no way to RefMut::map_split
in order to create more RefMut
s.
I suppose my one concern would be about the wrapping semantics. Is it well-defined that, in release mode, overflows wrap around to isize::min_value()
? If not, we'll want explicit wrapping addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a 100% sure I understand correctly, but overflows either result in two's complement wrapping or panics, it's not UB or implementation-defined or anything like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is a crazy trick (in a positive way ;). I agree it could work, but is it really needed? If panicking is the problem, why not turn the assert!
into
if b == isize::max_value() { return None; }
That still keeps this function panic-free but avoids this strange state where the borrow flag is negative but there are Ref
around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if panicking is the problem, everything's complete speculation anyway, extra computation and branching might also be an issue.
I could have sworn that the "benign overflow" trick is pre-existing (before #51466 that is). But on some more speluking, it seems it was just discussed, never actually implemented? We should probably try matching the previous behavior (i.e., returning None) first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably try matching the previous behavior (i.e., returning None) first.
I would prefer that, also for the sake of the sanity of whoever will try to prove this correct. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I missed this comment. If that's the preference, I'll re-evaluate the discussion below about assertions.
☀️ Test successful - status-travis |
@Mark-Simulacrum perf requested, The test failure in #51630 (comment) does not seem to affect dist, so I guess we could disregard this for perf. |
I think it would be nice to have a microbenchmark for |
Perf has been queued. |
If I understand correctly, the only lines changed in this PR that are expected to affect performance are in BorrowRefMut::drop: - self.borrow.set(if borrow == MIN_WRITING {
- UNUSED
- } else {
- borrow - 1
- });
+ self.borrow.set(borrow + 1); That's great but I don't believe it is the main source of slowdown in #51466 (though we'll find out for sure in the perf run). In my experience immutable Refs far outnumber RefMuts so I expect the main performance hit would have come from the panic codepath in BorrowRef::new which didn't exist before. It used to be that function could not panic. Now it can panic which requires extra control flow to handle in the caller. I would be interested in comparing performance against @kruppe's suggestions of not panicking in BorrowRef::new. |
Done. |
src/libcore/cell.rs
Outdated
@@ -1022,12 +1023,15 @@ impl<'b> BorrowRef<'b> { | |||
#[inline] | |||
fn new(borrow: &'b Cell<BorrowFlag>) -> Option<BorrowRef<'b>> { | |||
let b = borrow.get(); | |||
if b >= MIN_WRITING { | |||
if b < UNUSED { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear at all that this is testing for "if nobody is writing". Could we have some #[inline(always)]
function or a macro like is_writing(x: BorrowFlag) -> bool
to make this code easier to understand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/libcore/cell.rs
Outdated
// writing borrow will simply prevent more borrows of any kind from | ||
// being created. The only way to get a writing borrow when one | ||
// already exists is to use RefMut::map_split, but we know that | ||
// there cannot be any RefMuts in existence right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if you call this 2^31
more times (on a 32bit platform) to get the counter back to 0?
Oh, that's not possible because it will hit the b < UNUSED
above. Maybe the comment should be extended to explain that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/libcore/cell.rs
Outdated
@@ -1038,7 +1042,7 @@ impl<'b> Drop for BorrowRef<'b> { | |||
#[inline] | |||
fn drop(&mut self) { | |||
let borrow = self.borrow.get(); | |||
debug_assert!(borrow < MIN_WRITING && borrow != UNUSED); | |||
debug_assert!(borrow > UNUSED); | |||
self.borrow.set(borrow - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow and this can overflow again to a positive value so you can even keep using the RefCell
... crazy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe be worth a comment though. (If you decide to reinstate that, that is.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean if we decide to allow overflow? It sounds like the decision was not to allow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I figured that'll depend on perf results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. If we go back to allowing overflow, I'll add a comment here.
src/libcore/cell.rs
Outdated
@@ -1052,7 +1056,7 @@ impl<'b> Clone for BorrowRef<'b> { | |||
debug_assert!(borrow != UNUSED); | |||
// Prevent the borrow counter from overflowing into | |||
// a writing borrow. | |||
assert!(borrow < MIN_WRITING - 1); | |||
assert!(borrow != isize::max_value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm but what if borrow
is isize::min_value()
because it overflowed above? Then this will count upwards towards 0.
Seems like by first calling borrow()
often enough to get the flag to overflow, and then calling map_split
often enough to get the counter back to 0, I can break this?
Is that one check for making sure BorrowRef::new
doesn't overflow really so expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At minimum the above comment ("Since this Ref exists, we know the borrow flag is not set to WRITING") needs to be updated.
I dont' see how map_split is relevant, do you mean calling borrow()
often enough to overflow to a negative number and then calling Clone
often enough to get the counter back to zero? That's a good point, but I think it can be fixed by doing a < check like in borrow
(either directly here, at the loss of 1/2^31 possible RefMuts, or by incrementing first and then checking it's still positive). All other locations that increment the counter need similar checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean calling borrow() often enough to overflow to a negative number and then calling Clone often enough to get the counter back to zero?
Yes, and map_split
is a way to call clone()
here. (I forgot that Ref::clone()
is a thing and can also be used for the same purpose. ;)
And yes, this could be fixed by something like assert!(borrow > UNUSED && borrow < isize::max_value())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is, I believe, a mistake. Now that we're comfortable with overflowing into the writing territory, we should just remove this assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the check entirely would be wrong. Even if we're fine with overflowing past isize::MAX
, we need to make sure we're not going much further, to 0, because that would allow having a Ref
alongside a RefMut
and thus be unsound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. That's a fair point. I was thinking that it wouldn't be possible to create a new read reference once the refcount was in writing territory, but of course that's only true if you're actually checking first, which happens in BorrowRef::new
, but not in clone
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, I recently saw the discussion about whether to allow overflow at all, and I saw that the conclusion was not to do it, so I'm going to go back to disallowing it.
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Perf is ready: Also comparing with 0f8f490 mentioned in #51466 (comment): http://perf.rust-lang.org/compare.html?start=0f8f4903f73a21d7f408870551c08acd051abeb0&end=145f4f411684d06861b2a9a262061f5bc4ef9c93&stat=instructions%3Au (disregard the improvement from NLL though) |
cac5785
to
080bd8e
Compare
To consolidate the discussion on multiple threads with @rkruppe and @RalfJung: I've now updated the PR to go back to explicitly checking for overflow and underflow. However, in |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@kennytm Can we kick off another perf run? Thanks! |
That seems to have worked, thanks! I'm not going to pretend to understand why :P |
OK, I think it's ready for another try. |
@bors try |
Optimize RefCell refcount tracking Address the performance concern raised in #51466 (comment) cc @dtolnay @nnethercote @rust-lang/wg-compiler-performance cc @RalfJung @jhjourdan for soundness concerns Can somebody kick off a perf run on this? I'm not sure how that's done, but I understand it has to be started manually. The idea of this change is to switch to representing mutable refcount as values below 0 to eliminate some branching that was required with the old algorithm.
@RalfJung @joshlf The more straight-forward option, IME, is using
|
☀️ Test successful - status-travis |
Does something else need to happen for this to be merged? |
I think someone with appropriate permissions needs to give r+ again. @dtolnay? |
@bors r+ |
📌 Commit 851cc39 has been approved by |
⌛ Testing commit 851cc39 with merge 8d45fe350d0760d94b50c30021276b5755895eb9... |
💔 Test failed - status-travis |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors retry |
Optimize RefCell refcount tracking Address the performance concern raised in #51466 (comment) cc @dtolnay @nnethercote @rust-lang/wg-compiler-performance cc @RalfJung @jhjourdan for soundness concerns Can somebody kick off a perf run on this? I'm not sure how that's done, but I understand it has to be started manually. The idea of this change is to switch to representing mutable refcount as values below 0 to eliminate some branching that was required with the old algorithm.
☀️ Test successful - status-appveyor, status-travis |
Address the performance concern raised in #51466 (comment)
cc @dtolnay @nnethercote @rust-lang/wg-compiler-performance
cc @RalfJung @jhjourdan for soundness concerns
Can somebody kick off a perf run on this? I'm not sure how that's done, but I understand it has to be started manually.
The idea of this change is to switch to representing mutable refcount as values below 0 to eliminate some branching that was required with the old algorithm.