New atomic reference counting algorithm #116173

Open
wants to merge 9 commits into
base: master

m-ou-se
Member

@m-ou-se m-ou-se commented Sep 26, 2023

This implements a new 'wait-free' atomic reference counting algorithm based on the paper Wait-Free Weak Reference Counting by @mjp41, @sylvanc and @bensimner.

The paper describes a wait-free implementation of atomic reference counting for five operations:

  1. Release weak (= Weak::drop)
  2. Release strong (= Arc::drop)
  3. Acquire weak (= Weak::clone and Arc::downgrade)
  4. Acquire strong (= Arc::clone)
  5. Acquire strong from weak (= Weak::upgrade)

Weak::upgrade must increase the strong count if it is nonzero. Unfortunately, processors do not have native 'increment if nonzero' instructions. Therefore, it is usually implemented with a CAS loop to increment the strong count only if it is not zero.
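As an illustration of the conventional approach, here is a minimal sketch of 'increment if nonzero' as a CAS loop. It assumes a plain, unshifted counter; the function name `increment_if_nonzero` is made up for this example and is not the standard library's code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::{Acquire, Relaxed}};

/// Try to take a new strong reference.
/// Fails once the count has reached zero.
fn increment_if_nonzero(strong: &AtomicUsize) -> bool {
    let mut cur = strong.load(Relaxed);
    loop {
        if cur == 0 {
            return false; // the value has already been dropped
        }
        // Retry whenever another thread changed the counter in between;
        // under contention this loop has no fixed bound on its iterations.
        match strong.compare_exchange_weak(cur, cur + 1, Acquire, Relaxed) {
            Ok(_) => return true,
            Err(actual) => cur = actual,
        }
    }
}

fn main() {
    let live = AtomicUsize::new(1);
    assert!(increment_if_nonzero(&live));
    assert_eq!(live.load(Relaxed), 2);

    let dead = AtomicUsize::new(0);
    assert!(!increment_if_nonzero(&dead));
    assert_eq!(dead.load(Relaxed), 0);
}
```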

The paper shows a way to avoid this CAS loop to make it wait-free. By reserving the least significant bit in the strong counter (by shifting the counter one bit to the left), we can use that extra bit to indicate the final 'dropped' state in which the strong counter is permanently zero. Then Weak::upgrade can be implemented as a fetch_add(2), which leaves the 'permanently zero' bit untouched.

This however does mean that Arc::drop must now do an additional operation. Not only must it decrement the strong counter, it must also use a CAS operation to set the 'permanently zero' bit if the counter is zero.
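The following is a rough sketch of the strong counter described above, not the code in this PR. The type `Strong` and the constants `DROPPED` and `ONE_STRONG` are names chosen for the example, and it uses `SeqCst` everywhere for simplicity where a real implementation would pick weaker orderings. Overflow handling is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

const DROPPED: usize = 1;    // lowest bit: the count is permanently zero
const ONE_STRONG: usize = 2; // one reference, shifted past the flag bit

struct Strong(AtomicUsize);

impl Strong {
    fn new() -> Self {
        Strong(AtomicUsize::new(ONE_STRONG))
    }

    /// Analogue of Weak::upgrade: one fetch_add, no loop.
    /// Succeeds unless the 'permanently zero' bit was already set.
    fn upgrade(&self) -> bool {
        let prev = self.0.fetch_add(ONE_STRONG, SeqCst);
        (prev & DROPPED) == 0
    }

    /// Analogue of Arc::drop: returns true if the caller must destroy the value.
    fn release(&self) -> bool {
        if self.0.fetch_sub(ONE_STRONG, SeqCst) != ONE_STRONG {
            return false; // other strong references remain
        }
        // The count is zero, but a concurrent upgrade may still revive it.
        // Only the thread whose CAS sets the bit gets to destroy the value.
        self.0.compare_exchange(0, DROPPED, SeqCst, SeqCst).is_ok()
    }
}

fn main() {
    let s = Strong::new();
    assert!(s.upgrade());  // two strong references now
    assert!(!s.release()); // one left
    assert!(s.release());  // last one: the dropped bit gets set
    assert!(!s.upgrade()); // upgrading fails from now on
    assert!(!s.upgrade()); // and keeps failing; the low bit stays set
}
```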

The paper also shows an optimized version of the algorithm in which an additional bit in the strong counter is reserved to indicate whether there have ever been any weak pointers. When this bit is not set, some steps can be skipped.

However, the algorithm from the paper is unfortunately not something we can use directly for Rust's standard Arc and Weak, because we have more operations:

  1. Weak::drop
  2. Arc::drop
  3. Weak::clone
  4. Arc::clone
  5. Weak::upgrade
  6. Arc::downgrade
  7. Arc::get_mut
  8. Arc::try_unwrap
  9. Arc::into_inner

Specifically Arc::get_mut requires locking the weak counter to temporarily block Arc::downgrade from completing. We cannot implement Arc::downgrade as just Weak::clone.
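For context, this small example shows the user-visible behaviour that makes the interaction matter: `Arc::get_mut` only hands out a mutable reference when no other `Arc` or `Weak` points to the allocation. The helper name `get_mut_outcomes` is made up for the example.

```rust
use std::sync::Arc;

/// Reports whether `Arc::get_mut` succeeded before a Weak exists,
/// while a Weak exists, and after that Weak has been dropped.
fn get_mut_outcomes() -> [bool; 3] {
    let mut a = Arc::new(5);
    let before = Arc::get_mut(&mut a).is_some();
    let w = Arc::downgrade(&a);
    let during = Arc::get_mut(&mut a).is_some();
    drop(w);
    let after = Arc::get_mut(&mut a).is_some();
    [before, during, after]
}

fn main() {
    assert_eq!(get_mut_outcomes(), [true, false, true]);
}
```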

Our current Arc::downgrade is implemented as a CAS loop that increments the weak counter if it doesn't hold the special 'locked' (usize::MAX) value, similar to how Weak::upgrade uses a CAS loop (which is what the paper solves).
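A simplified model of that scheme, for illustration only: the struct `Counts` and its method names are made up for this sketch, it tracks only the two counters, and it uses `SeqCst` throughout instead of carefully chosen orderings.

```rust
use std::hint;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

const LOCKED: usize = usize::MAX; // sentinel: weak counter is locked

struct Counts {
    strong: AtomicUsize,
    weak: AtomicUsize,
}

impl Counts {
    fn new() -> Self {
        // In this model the weak count starts at 1: the strong references
        // collectively hold one implicit weak reference.
        Counts { strong: AtomicUsize::new(1), weak: AtomicUsize::new(1) }
    }

    /// Analogue of Arc::downgrade: increment unless the counter is locked.
    fn downgrade(&self) {
        let mut cur = self.weak.load(SeqCst);
        loop {
            if cur == LOCKED {
                hint::spin_loop();
                cur = self.weak.load(SeqCst);
                continue;
            }
            match self.weak.compare_exchange_weak(cur, cur + 1, SeqCst, SeqCst) {
                Ok(_) => return,
                Err(actual) => cur = actual,
            }
        }
    }

    /// Uniqueness check as needed by Arc::get_mut: lock the weak counter,
    /// inspect the strong counter, then unlock again.
    fn is_unique(&self) -> bool {
        if self.weak.compare_exchange(1, LOCKED, SeqCst, SeqCst).is_err() {
            return false; // there are Weak pointers (or someone else holds the lock)
        }
        let unique = self.strong.load(SeqCst) == 1;
        self.weak.store(1, SeqCst); // unlock
        unique
    }
}

fn main() {
    let c = Counts::new();
    assert!(c.is_unique());
    c.downgrade();
    assert!(!c.is_unique());
}
```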

I have extended the algorithm by also reserving the lowest bit in the weak counter to represent the 'weak counter locked' state. That way, Arc::downgrade can be implemented as a fetch_add(2) rather than a CAS loop. It will still have to spin in case of a concurrent call to Arc::get_mut, but in absence of Arc::get_mut, this makes Arc and Weak wait-free.
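A rough sketch of that idea, under several assumptions: the names `Counts`, `WEAK_LOCKED` and `ONE_WEAK` are picked for the example, the weak counter starts at one implicit reference, `SeqCst` is used everywhere, and the paper's extra handling of a weak count of zero is left out entirely. It is a model of the locking bit only, not the code in this PR.

```rust
use std::hint;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::thread;

const WEAK_LOCKED: usize = 1; // lowest bit: weak counter is locked
const ONE_WEAK: usize = 2;    // one weak reference, shifted past the lock bit

struct Counts {
    strong: AtomicUsize,
    weak: AtomicUsize,
}

impl Counts {
    fn new() -> Self {
        Counts { strong: AtomicUsize::new(1), weak: AtomicUsize::new(ONE_WEAK) }
    }

    /// Analogue of Arc::downgrade: a single fetch_add when unlocked.
    fn downgrade(&self) {
        loop {
            let prev = self.weak.fetch_add(ONE_WEAK, SeqCst);
            if (prev & WEAK_LOCKED) == 0 {
                return;
            }
            // Locked: the upper bits are meaningless and get overwritten by
            // the unlocking store, so wait for the unlock and add again.
            while (self.weak.load(SeqCst) & WEAK_LOCKED) != 0 {
                hint::spin_loop();
            }
        }
    }

    /// Uniqueness check as needed by Arc::get_mut.
    fn is_unique(&self) -> bool {
        if self.weak.compare_exchange(ONE_WEAK, WEAK_LOCKED, SeqCst, SeqCst).is_err() {
            return false;
        }
        let unique = self.strong.load(SeqCst) == 1;
        self.weak.store(ONE_WEAK, SeqCst); // unlock, discarding stray increments
        unique
    }

    fn weak_count(&self) -> usize {
        self.weak.load(SeqCst) >> 1
    }
}

fn main() {
    let counts = Counts::new();
    assert!(counts.is_unique());
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    counts.downgrade();
                }
            });
        }
        // Lock and unlock concurrently with the downgrades.
        for _ in 0..1000 {
            counts.is_unique();
        }
    });
    // Every downgrade is counted exactly once, plus the implicit reference.
    assert_eq!(counts.weak_count(), 1 + 4 * 1000);
    assert!(!counts.is_unique());
}
```

Adding 2 never changes the lowest bit, so repeated increments while locked cannot clear the lock.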

The paper shows some promising benchmarking results. I have not benchmarked this change yet.

@m-ou-se m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-atomic Area: Atomics, barriers, and sync primitives labels Sep 26, 2023
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 26, 2023
@m-ou-se
Member Author

m-ou-se commented Sep 26, 2023

I don't think rustc itself is a good benchmark, since it doesn't use upgrade and downgrade much. If anyone wants to help benchmark this, that is very much appreciated. Especially projects that use Weak::upgrade very often should see a significant improvement.

@mjp41
Contributor

mjp41 commented Sep 26, 2023

Awesome. Super excited to see how this performs outside of the microbenchmarks in our paper.

Please feel free to reach out if you want help.

@m-ou-se
Member Author

m-ou-se commented Sep 26, 2023

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 26, 2023
@bors
Contributor

bors commented Sep 26, 2023

⌛ Trying commit 34ed37c with merge 2104764...

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 26, 2023
New atomic reference counting algorithm

This implements a new 'wait-free' atomic reference counting algorithm based on the paper [Wait-Free Weak Reference Counting](https://www.microsoft.com/en-us/research/uploads/prod/2023/04/preprint-644bdf17da97d.pdf) by `@mjp41,` `@sylvanc` and `@bensimner.`

@m-ou-se
Member Author

m-ou-se commented Sep 26, 2023

Once the try build is done, there is a toolchain that can be installed with rustup-toolchain-install-master to try out this new implementation.

The rust-timer bot will also run some benchmarks on rustc itself, but those are unlikely to show anything interesting, because rustc doesn't have any meaningful usage of Weak::upgrade.

@bors
Contributor

bors commented Sep 26, 2023

☀️ Try build successful - checks-actions
Build commit: 2104764 (21047641b533575599dedac7012b881d0d7cce1b)

Contributor

@mjp41 mjp41 left a comment

@m-ou-se this is really cool to see. My comments are mostly minor, but I think downgrade is incorrect. Please let me know if you want more explanation of the +2; it is quite subtle and not covered well in our paper.

library/alloc/src/sync.rs (outdated)
cur = this.inner().weak.load(Relaxed);
continue;
}
let prev = this.inner().weak.fetch_add(ONE_WEAK, Acquire);
Contributor

In the case where this is locked, you will loop and re-execute this line, which will add multiple weak references.

Also, the paper checks if it is weak == 0 and if it observes this adds two. This is really important, and not part of your implementation. Without that sequence, it is possible to reach weak == 0 multiple times, which is very bad.

https://github.com/microsoft/verona-artifacts/blob/ee5e758d7f300bf9281d3fb43adffba8e846da37/WFWeakRC/code/include/rcobjectwfopt.h#L47-L56

Member Author

@m-ou-se m-ou-se Sep 26, 2023

> In the case where this is locked, you will loop and re-execute this line, which will add multiple weak references.

When the low bit of the weak pointer is set, the rest of the bits are meaningless (just like in the strong counter in your paper). So it doesn't matter that fetch_add is repeated, as long as it leaves the least significant bit set.

> Also, the paper checks if it is weak == 0 and if it observes this adds two. This is really important, and not part of your implementation. Without that sequence, it is possible to reach weak == 0 multiple times, which is very bad.

My code does that, but as a separate step: a few lines below here, in the if prev == 0, I add an extra ONE_WEAK, so we raise the weak counter by two in total if it was zero.

Contributor

> > In the case where this is locked, you will loop and re-execute this line, which will add multiple weak references.
>
> When the low bit of the weak pointer is set, the rest of the bits are meaningless (just like in the strong counter in your paper). So it doesn't matter that fetch_add is repeated, as long as it leaves the least significant bit set.

Nice. That's great. I hadn't got that bit. Thanks for explaining.

> > Also, the paper checks if it is weak == 0 and if it observes this adds two. This is really important, and not part of your implementation. Without that sequence, it is possible to reach weak == 0 multiple times, which is very bad.
>
> My code does that, but as a separate step: a few lines below here, in the if prev == 0, I add an extra ONE_WEAK, so we raise the weak counter by two in total if it was zero.

Sorry, I see that you increase by two; doing it in two increments was my concern. I was convinced it had to be done in one go, but now that I'm trying to write out a counterexample, I can't find one. I am now convinced it is okay.

At some point, I'll try to extend @bensimner's proof to this variation.

@m-ou-se
Member Author

m-ou-se commented Sep 26, 2023

Thanks for your detailed review! I will try to address your comments tomorrow.

Edit: found a few minutes to respond right now. ^^

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 26, 2023
m-ou-se and others added 3 commits September 26, 2023 19:40
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
The values are the same, but I used the wrong constants.
So, this doesn't change any behaviour. It was just very confusing to
read.
@rust-timer
Collaborator

Finished benchmarking commit (2104764): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.2%, 1.8%] | 5 |
| Regressions ❌ (secondary) | 0.7% | [0.4%, 1.6%] | 7 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.7% | [0.2%, 1.8%] | 5 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 5.1% | [0.1%, 10.1%] | 2 |
| Regressions ❌ (secondary) | 3.3% | [3.3%, 3.3%] | 1 |
| Improvements ✅ (primary) | -5.7% | [-9.7%, -2.7%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.1% | [-9.7%, 10.1%] | 6 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [1.6%, 1.6%] | 1 |
| Regressions ❌ (secondary) | 1.2% | [1.2%, 1.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.6% | [1.6%, 1.6%] | 1 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.6%] | 47 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.2% | [-0.1%, 0.6%] | 49 |

Bootstrap: 631.784s -> 629.276s (-0.40%)
Artifact size: 317.18 MiB -> 317.32 MiB (0.04%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Sep 26, 2023
@terrarier2111
Contributor

terrarier2111 commented Sep 27, 2023

This will probably not yield many gains compared to the regressions as long as there is any overhead on the NO_WEAK path (when we never create any weak refs), as most users of Arc don't create weaks. As an experiment, I am working on an approach that gets rid of all overhead on the fast path while maintaining the wait-free nature of this impl.

@m-ou-se
Member Author

m-ou-se commented Sep 27, 2023

Which overhead are you referring to? Doesn't this already take a fast path basically everywhere for the 'no weak' case?

@m-ou-se
Member Author

m-ou-se commented Sep 27, 2023

> Finished benchmarking commit (2104764): comparison URL.
>
> Overall result: ❌ regressions - ACTION NEEDED

These benchmark results look okay. The only potentially significant regression is the compilation time of ripgrep, but that extra time is spent in LLVM and not in rustc itself, so it is unrelated to the performance of Arc. It just means that the new Arc code might result in LLVM spending a bit more time optimizing/transforming the code.

@terrarier2111
Contributor

terrarier2111 commented Sep 27, 2023

Maybe this concern is completely irrational, but I would assume that, because weakless Arcs are used so frequently, the additional bit op (which is just 1 additional cycle per strong decrement) could add up to outweigh the benefits of improving the weak impl. But maybe I am just wrong about that.

@Kobzol
Contributor

Kobzol commented Sep 27, 2023

Maybe we could try to benchmark this with the parallel frontend enabled (I'm not sure if it actually uses Arc heavily though). CC @SparrowLii.

@terrarier2111
Contributor

terrarier2111 commented Sep 27, 2023

Nvm, I see now that that op can (probably) just be applied at compile time. I was talking about the shift in the drop part, if the optimizer understands that it can just modify the RHS of the comparison instead of actually applying the shift to the LHS value then there won't be any additional ops needed. (The only additional thing on the fast path with simple new + clone + drop operations will be the mov for the argument to drop_slow, but that is probably negligible, as it will only be executed once the Arc is dropped.)

@SparrowLii
Member

SparrowLii commented Sep 27, 2023

> Maybe we could try to benchmark this with the parallel frontend enabled (I'm not sure if it actually uses Arc heavily though). CC @SparrowLii.

So far, I have not found that Arc has a significant effect on the perf of parallel front end.
#109480 (comment)

@m-ou-se
Member Author

m-ou-se commented Sep 28, 2023

> I was talking about the shift in the drop part, if the optimizer understands that it can just modify the RHS of the comparison instead of actually applying the shift to the LHS value then there won't be any additional ops needed.

Yeah, that compiles down to a single comparison just fine: https://godbolt.org/z/TvMTq5fez

(I wrote it as a shift because I think that shows the intent more clearly.)
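To illustrate the kind of equivalence being discussed, here is a small sketch that assumes a single reserved low bit. The function and constant names are invented for the example, and it says nothing about what the compiler emits for the actual PR code. It only checks that the shifted test and the unshifted test agree.

```rust
const FLAG_BITS: u32 = 1; // assumption: one reserved low bit
const FLAG_MASK: usize = (1 << FLAG_BITS) - 1;

/// 'Was this the last strong reference?', written with a shift,
/// which states the intent directly: the count part equals 1.
fn was_last_by_shift(prev: usize) -> bool {
    (prev >> FLAG_BITS) == 1
}

/// The same test without shifting the left-hand side:
/// clear the flag bits and compare against a pre-shifted constant.
fn was_last_by_mask(prev: usize) -> bool {
    (prev & !FLAG_MASK) == (1 << FLAG_BITS)
}

fn main() {
    for prev in 0..64usize {
        assert_eq!(was_last_by_shift(prev), was_last_by_mask(prev));
    }
    assert!(was_last_by_shift(2)); // count 1, flag clear
    assert!(was_last_by_shift(3)); // count 1, flag set
    assert!(!was_last_by_shift(4)); // count 2
}
```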

@terrarier2111
Contributor

> > I was talking about the shift in the drop part, if the optimizer understands that it can just modify the RHS of the comparison instead of actually applying the shift to the LHS value then there won't be any additional ops needed.
>
> Yeah, that compiles down to a single comparison just fine: https://godbolt.org/z/TvMTq5fez
>
> (I wrote it as a shift because I think that shows the intent more clearly.)

Alright, sorry for the confusion 😅

@bors
Contributor

bors commented Sep 30, 2023

☔ The latest upstream changes (presumably #115546) made this pull request unmergeable. Please resolve the merge conflicts.

@mjp41
Contributor

mjp41 commented Oct 12, 2023

@m-ou-se out of interest, what is the bar for this making it into Rust? Do you need an application that shows a win, or just no one showing a loss?

@m-ou-se
Member Author

m-ou-se commented Oct 19, 2023

@mjp41 I think we should go ahead with merging if we can find some real-world (or at least realistic) code that shows a clear win. If we can only show that there are no performance regressions, without any improvements, I don't think we should merge it. This change makes the Arc code a bit harder to maintain for future maintainers, so it has to be worth it.

@yijunyu sent me a list of all crates on crates.io that call Weak::upgrade:

List

flattiverse_connector-36.1.1
melodium-0.5.1
druid-shell-0.7.0
zenoh-router-0.5.0-beta.7
zenoh-0.6.0-beta.1
basalt-0.18.0
ntex-0.5.27
rspack_style-0.1.16
mediasoup-0.11.0
wayland-scanner-0.30.0-beta.12
gst-plugin-threadshare-0.8.4
bevy_mod_scripting-0.1.2
rspack_style1-0.1.2
exocore-store-0.1.24
crndm-0.1.0
sodium-rust-2.1.1
kompact-0.11.0
smithay-0.3.0
winit-0.27.4
medea-0.2.0
caldera-0.2.0
luminvent_winit-0.26.1
nut-0.1.1
tauri-winit-0.24.1
stardust-xr-fusion-0.17.5
azul-winit-0.24.0
netidx-0.12.1
snocat-0.6.0-alpha.12
async-stream-packed-0.2.2
notation_model-0.5.0
zbox-0.9.2
ciphercore-base-0.1.2
anthill-di-1.2.4
condure-1.7.0
aeron-rs-0.1.3
device_query-1.1.1
webrtc-0.5.1
requiem-http-1.0.1-r1
scrappy-http-0.0.1
actori-http-1.0.1
openaws-vpn-client-0.1.4
ate-1.3.0
wlambda-0.8.1
cart-tmp-winit-0.22.2
anode-0.1.0
tox_core-0.1.1
actix-0.13.0
unbase-0.0.2
drumbeat-0.1.1
fuzzcheck-0.12.1
eternal-0.3.2
makiko-0.2.0
quickjs_runtime-0.8.5
async-coap-0.1.0
all-is-cubes-0.4.0
geobacter-runtime-core-1.0.0
glock-0.1.2
rustpython-vm-0.1.2
rclrust-0.0.2
rucene-0.1.1
lightning-signer-core-0.1.0-5
glib-0.16.0
rafx-framework-0.0.15
netxserver-1.7.3
weak-list2-0.1.0
congee-0.2.16
rust_engineio-0.4.0
woodchipper-1.1.0
tantivy-nightly-0.17.0-202205251639
corundum-0.4.1
fluvio-cluster-0.7.1
tetsy-jsonrpc-http-server-15.1.0
assemble-core-0.1.2
enfipy-jsonrpc-http-server-15.0.0
authority-round-0.1.0
rust-libcore-0.0.3
es_runtime-0.1.4
jsonrpc-http-server-18.0.0
tantivy-0.18.1
summavy-0.19.0
sentinel-core-0.1.2
gst-plugin-togglerecord-0.8.0
con-art-rust-0.2.0
oxigraph-0.3.6
futures-signals-0.3.31
async-slot-0.1.0
xi-core-lib-0.3.0
tetsy-updater-1.12.0
steamworks-0.9.0
frappe-0.4.7
tc-network-0.8.0
librice-0.0.2
wasmer-vm-near-2.4.0
xtor-0.9.10
async-acme-0.3.1
xiod-0.14.1
stack_test_epic_servers-3.0.3
broker-tokio-0.2.16
mugle_servers-5.2.0-alpha.5
fluence-fork-libp2p-0.36.2
examples-0.4.0
vulkan-malloc-0.1.5
tetsy-libp2p-0.34.3
tc-btree-0.7.0
rafx-resources-0.0.2
vapcore-1.12.1
syslog-rs-0.4.1
tet-libp2p-0.34.0
epic_servers-3.0.0
tokio-udt-0.1.0-alpha.8
grin_servers-5.1.2
crazyflie-lib-0.1.1
crs-bind-0.1.5
sgrankin-tacho-0.5.1
zee-0.3.2
agui_core-0.3.0
sc-network-0.9.0
libp2p-0.49.0
futures-executor-preview-0.3.0-alpha.19
nikidb-0.1.1
tacho-0.4.2
jujube-lib-0.1.1
remote-trait-object-0.5.0
ensync-1.0.1
webwire-0.4.0
graphsync-0.1.0
dependent_view-1.0.2
scrappy-client-0.0.1
requiem-wc-1.0.1-r1
tor-guardmgr-0.7.0
actoriwc-1.0.1
chamomile-0.8.0
awc-3.0.1
fluence-fork-libp2p-core-0.27.2
concurrency_traits-0.7.2
wayland-backend-0.1.0-beta.12
abel-0.1.1
mongodb_cwal-0.6.7
btleplug-0.10.1
everscale-network-0.3.12
tetsy-libp2p-core-0.27.1
carboxyl-0.2.1
bflog-0.3.1
incinerator-0.0.1
tet-libp2p-core-0.27.0
futures-executor-0.3.25
async_event_streams-0.1.4
wayland-server-0.30.0-beta.12
hakuban-0.6.1
sqlx-core-guts-0.6.0
local-pool-with-id-0.1.1
ipipe-0.11.7
bruteforus-0.1.0
single_executor-0.4.1
sqlx-core-0.6.2
indicatif-0.17.1
xiod_fakedata-0.4.1
prodash-21.0.0
fumio-reactor-0.1.0
hyper_wasi-0.15.0
dokan-0.2.0+dokan150
xtra-0.5.2
safe_cell_exts-0.3.0
polyhorn-core-0.4.0
uflow-0.6.1
hyper-0.14.20
cyfs-bdt-0.6.5
vex-rt-0.11.1
libutp-rs-1.0.0
observe-0.1.0
libp2p-helper-0.4.0
flash_rust_ws-0.4.1
hreq-h1-0.3.10
fluence-fork-libp2p-mplex-0.27.2
cell_rc-0.2.0
actix-web-4.2.1
dittolive-ditto-2.0.7
private-tx-1.0.0
tetsy-libp2p-mplex-0.27.2
shared_arena-0.8.4
sharedptr-0.3.4
fclones-0.29.1
appro-eq-0.3.1
hirofa_utils-0.5.5
ergo-rest-0.5.0
zenoh-collections-0.6.0-beta.1
drc-0.1.2
fluvio-wasm-timer-0.2.5
flatland-0.3.3
quickwit-metastore-0.3.0
mplex-0.27.0
futures-timer-3.0.2
rxrs-0.2.0-beta3
vrust-0.0.1
minimum-0.1.0
skywalking-0.4.0
wasm-timer-0.2.5
maia-0.1.1
kayrx-0.18.0
gw2lib-3.0.0-alpha-2
tokio_wasi-1.21.2
amethyst_assets-0.15.3
salvo_extra-0.37.1
nannou_wgpu-0.18.0
eventum-0.1.1
ydb-0.4.2
bottle-1.1.0-alpha
sardonyx_assets-0.0.3
vapcore-secretstore-1.0.0
fuel-p2p-0.11.2
tokio-sync-0.1.8
slack-morphism-1.3.2
phabricator-0.0.4
zbus-3.2.0
bleasy-0.2.2
verneuil-0.6.4
rdom-0.2.0
aspartam-0.1.0
tokio-1.21.2
sgx_tstd-1.1.1
delix-0.2.4
jojo-core-0.1.0
e-nguyen-0.1.2
parking_lot_mpsc-0.1.5
newport_gpu-0.2.0
vapcore-clique-0.1.0
madsim-0.2.8
rtrtr-0.2.2
ibc-relayer-cli-1.0.0
supernova-0.5.0
dipstick-0.9.0
healslut-0.1.0
hidamari-0.1.0
noders-0.0.2
elvis-core-0.1.1
libp2p-blake-streams-0.1.1
pipitor-0.3.0-alpha.15
bb8-0.8.0
tf_observer-0.1.2
futures-util-0.3.25
webrtc-ice-0.8.1
twitch-irc-5.0.0
lucet-runtime-internals-0.6.1
embedded-svc-0.22.1
arendur-0.0.5
dvcompute_branch-1.3.7
aptos-executor-0.2.7
netxclient-1.7.3
dvcompute_dist-1.3.7
dvcompute_cons-1.3.5
nash-native-client-0.3.0
git-branchless-lib-0.5.0
netmod-tcp-0.4.0
chronobreak-0.1.0
khronos-egl-4.1.0
tentacle-0.4.1
event-listener-primitives-2.0.1
timely-0.12.0
safe_core-0.43.1
dvcompute-1.3.4
bonsaidb-local-0.4.1
lettre-0.10.1
zenoh-transport-0.6.0-beta.1
metaplex-pulsar-4.1.1
madsim-tokio-postgres-0.2.0
futures-channel-0.3.25
dioxus-liveview-0.1.0
simple_futures-0.1.2
volo-thrift-0.2.0
rustable-0.3.0
zenoh-link-udp-0.6.0-beta.1
tracing-subscriber-0.3.16
crossfire-0.1.7
tokio-core-0.1.18
substrate-wasmtime-0.19.0
dynamic-pooling-1.0.0
ratchet_core-0.2.0
moka-0.9.4
ezk-sip-core-0.1.1
yukikaze-1.0.10
crymap-1.0.0
sn-pulsar-4.1.3
kayrx-timer-0.1.1
jack-0.10.0
cdbc-pg-0.1.22
scrappy-actor-0.0.1
mun_memory-0.2.0
cdbc-mysql-0.1.22
tokio-timer-0.2.13
tokio-postgres-0.7.7
tokio-tasker-1.2.0
sage_broker-0.3.0
requiem-framed-0.3.0-r1
rbdc-pg-0.1.19
linux-aio-tokio-0.3.0
tinychain-0.11.0
scrappy-framed-0.0.1
pulsar-4.1.3
rbdc-mysql-0.1.17
lock_api-0.4.9
ketos-0.12.0
actori-framed-0.3.0
keclc-framed-0.1.0
fumio-pool-0.1.0
actix-framed-0.3.1
threads_pool-0.2.6
dyn-cache-0.12.2
safina-executor-0.3.3
async-rustbus-0.1.2
udbg-0.2.1
requiem-0.9.0
nimiq-utils-0.2.0
ibc-relayer-0.19.0
asim-0.1.0
actori-0.9.0
potatonet-node-0.4.3
kpgres-0.5.0
zara-1.0.7
pkgcraft-0.0.2
fallacy-arc-0.1.1
ump-0.9.0
rcell-2.0.0-pre0
r2d2-0.8.10
nearly_eq-0.2.4
evc-0.1.2
finchers-0.13.5
daab-0.4.0
alto-3.0.4
futures-util-preview-0.3.0-alpha.19
ezsockets-0.3.0
webrtc-connection-0.2.0
ipfs-0.2.1
locutus-core-0.0.2
render_readme-0.7.5
abi_stable-0.10.4
libp2p-gossipsub-0.42.1
ckb-cli-1.1.1
linux-support-0.0.25
tokio-graceful-shutdown-0.11.1
minidsp-daemon-0.1.4
actix-raft-0.4.4
napi-2.10.0
tiny_http_sccache-0.7.0
txrx-0.1.0
gothack-future-parking_lot-0.3.4
tiny_http-0.12.0
async-liveliness-monitor-0.1.0
mdbook-pdf-headless_chrome-0.1.0
android-activity-0.4.0-beta.1
signal-stack-0.1.0
exonum-explorer-service-1.0.0
msr-core-0.3.5
sarekt-0.0.4
nakadion-0.30.0
janus_core-0.1.17
rouille-maint-in-3.0.1
future-parking_lot-0.3.3
casper-node-1.4.8
trillium-server-common-0.3.0
gotham-0.7.1
v-clickhouse-rs-0.2.0-alpha.7
streamunordered-0.5.2
interledger-store-redis-0.4.0
epoxy_streams-0.3.1
tetsy-hash-fetch-1.12.0
termusic-0.7.4
sophon-wasm-0.18.1
nobs-vulkanism-headless-0.1.0
cyfs-task-manager-0.6.0
cosync-0.2.1
watchexec-2.0.2
clickhouse-rs-1.0.0-alpha.1
rouille-ng-3.0.1
phoenix_channels_client-0.1.0
cozal-0.0.2
cosmian-wit-bindgen-rust-0.1.1
tokio-net-0.2.0-alpha.6
shredder-0.2.0
opentelemetry_sdk-0.18.0
anthill-service-system-1.2.3
ipfs-embed-0.24.0
heph-rt-0.4.1
wayland-cursor-0.30.0-beta.12
rouille-3.6.1
headless_chrome-0.9.0
conjure-runtime-4.0.0
tokio-threadpool-0.1.18
datafusion-13.0.0
amethyst_rendy-0.15.3
agner-reg-0.3.11
hey_listen-0.5.0
futures-locks-pre-0.5.1-pre
dwindow-0.3.0
ciruela-0.6.12
web-glitz-0.3.0
vmnet-0.1.1
virtiofsd-1.4.0
cachepot-0.1.0-rc.1
zerogc-context-0.2.0-alpha.7
tc-client-api-2.0.0
stack_test_epic_api-3.0.3
silver-rs-0.2.0-dev
sc-client-api-3.0.0
romio-0.3.0-alpha.10
akashi-0.5.2
yrs-warp-0.2.0
tubez-0.0.1
swindon-0.7.8
paxakos-0.12.0
async-web-server-0.6.2
async-task-ffi-4.1.1
vhdl_lang-0.17.0
vapcore-light-1.12.0
tube-0.0.2
rppal_w_frontend-0.0.4
reredis-0.1.0-alpha.2
mugle_api-5.2.0-alpha.5
kate-0.1.0
imdl-indicatif-0.14.0
eventuals-0.6.7
darksteel-0.1.0
bluest-0.5.3
abel-core-0.1.1
pi_async-0.5.5
netperf-0.2.3
moleculer-0.3.5
cargo-web-0.6.26
terminus-store-0.19.9
poem-1.3.47
output-0.6.2
futures-net-0.6.0
accesskit_windows-0.6.1
vrp-core-1.19.0
tracing-mutex-0.2.1
termwiz-0.18.0
syndicate-0.23.0
static_locks-0.1.0
salak_factory-0.10.0
roslibrust-0.5.1
lightproc-0.3.6-alpha.0
barrage-0.2.3
async-task-4.3.0
vapcore-sync-1.12.0
unmp-link-udp-0.7.0
unit-rs-0.2.0
rpi_embedded-0.1.0
rodio_wav_fix-0.15.0
joycon-rs-0.6.3
routing-0.37.1
madsim-etcd-client-0.2.8
futures-0.3.25
flo-state-1.1.0
crabquery-0.1.9
async-ucx-0.1.1
tokio-reactor-0.1.12
futures-rate-0.1.5
bandsocks-runtime-0.1.0
async-ws-0.3.3
validator-set-0.1.0
panorama-imap-0.0.4
libevent-0.1.0
gfx-backend-gl-0.9.0
cxx-async-0.1.1
sccache-0.3.0
reqwest_wasi-0.11.12
reqwest-wasm-0.11.15
nimiq-rpc-server-0.2.0
nails-0.13.0
jrsonnet-gcmodule-0.3.4
esp-idf-svc-0.42.5
dominator-0.5.31
byte_channel-0.0.1
aria2-ws-0.3.0
salvia-0.1.0
mz_rusoto_core-0.46.0
mwapi-0.4.1
grin_api-5.1.2
epic_api-3.0.0
douglas-0.1.1
vulkano-0.31.1
ticketed_lock-0.3.0
switchyard-0.3.0
rppal-0.13.1
azul-webrender-0.62.2
agner-actors-0.3.11
trade-0.1.0
pomfrit-0.1.9
gstreamer-0.18.8
dirinventory-0.7.0
asset_lru-0.1.3
stakker-0.2.5
rusoto_core-0.48.0
rodio-0.16.0
web_worker-0.3.0
vgtk-0.3.0
spin-0.9.4
spdlog-rs-0.2.4
reqwest-0.11.12
phabricator-mock-0.0.4
meli-0.7.2
gomoku-0.1.0
winit-modular-0.1.1
rsdb-0.12.1
curl-0.4.44
shared_lru-0.1.5
froggy-0.4.2
crayfish-0.0.1
blunt-0.0.8
xxlib-0.3.3
smartcalc-tui-1.0.8
minibus-0.0.2
wasmyon-0.1.1
hjul-0.2.2
stm-core-0.4.0
heng_rs-0.1.0
rw_lease-0.1.0

So every crate whose performance might be improved by this PR should be in this list. It'd be useful to find one or more crates in this list that heavily rely on Weak::upgrade in a hot path.

Another option would be to write some (realistic?) benchmarks that we can include in the standard library's test suite and/or the runtime benchmark suite of rustc-perf.
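As a starting point only, such a benchmark might look like the sketch below. It is a crude microbenchmark using nothing but the standard library; the function name `bench_upgrade` and the thread and iteration counts are arbitrary choices for the example, and a benchmark for the test suite or rustc-perf would need a proper harness and a more realistic workload.

```rust
use std::hint::black_box;
use std::sync::{Arc, Weak};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `threads` workers that each call `Weak::upgrade` (and then drop the
/// resulting Arc) `iters` times on one shared allocation.
/// Returns the number of successful upgrades and the elapsed wall-clock time.
fn bench_upgrade(threads: usize, iters: usize) -> (usize, Duration) {
    let strong = Arc::new(0u64);
    let start = Instant::now();
    let successes = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                let weak: Weak<u64> = Arc::downgrade(&strong);
                s.spawn(move || {
                    let mut ok = 0usize;
                    for _ in 0..iters {
                        if let Some(arc) = weak.upgrade() {
                            // Keep the optimizer from discarding the result.
                            black_box(&arc);
                            ok += 1;
                        }
                    }
                    ok
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    });
    (successes, start.elapsed())
}

fn main() {
    let (successes, elapsed) = bench_upgrade(4, 100_000);
    // `strong` stays alive for the whole run, so every upgrade succeeds.
    assert_eq!(successes, 4 * 100_000);
    println!("{successes} upgrades in {elapsed:?}");
}
```

Running the same program with the stable toolchain and with the try-build toolchain would give a first rough comparison under heavy contention on a single counter.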

@oskgo
Contributor

oskgo commented Aug 18, 2024

Ping from triage:
@m-ou-se Are you working on the benchmarks? If not, maybe opening an issue could help increase visibility?

@alex-semenyuk alex-semenyuk added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 21, 2024
Labels
A-atomic Area: Atomics, barriers, and sync primitives perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.