New atomic reference counting algorithm #116173
base: master
Conversation
I don't think rustc itself is a good benchmark, since it doesn't use `Weak` much.
Awesome. Super excited to see how this performs outside of the microbenchmarks in our paper. Please feel free to reach out if you want help.
@bors try
@rust-timer queue
New atomic reference counting algorithm

This implements a new 'wait-free' atomic reference counting algorithm based on the paper [Wait-Free Weak Reference Counting](https://www.microsoft.com/en-us/research/uploads/prod/2023/04/preprint-644bdf17da97d.pdf) by @mjp41, @sylvanc and @bensimner.

The paper contains a way to implement atomic reference counting *wait free*, for five operations:

1. Release weak (= `Weak::drop`)
2. Release strong (= `Arc::drop`)
3. Acquire weak (= `Weak::clone` and `Arc::downgrade`)
4. Acquire strong (= `Arc::clone`)
5. Acquire strong from weak (= `Weak::upgrade`)

`Weak::upgrade` must increase the strong count if it is nonzero. Unfortunately, processors do not have native 'increment if nonzero' instructions. Therefore, it is usually implemented with a CAS loop that increments the strong count only if it is not zero.

The paper shows a way to avoid this CAS loop to make it wait-free. By reserving the least significant bit in the strong counter (by shifting the counter one bit to the left), we can use that extra bit to indicate the final 'dropped' state in which the strong counter is permanently zero. Then `Weak::upgrade` can be implemented as a `fetch_add(2)`, which leaves the 'permanently zero' bit untouched.

This does mean, however, that `Arc::drop` must now do an additional operation. Not only must it decrement the strong counter, it must also use a CAS operation to set the 'permanently zero' bit if the counter is zero.

The paper also shows an optimized version of the algorithm in which an additional bit in the strong counter is reserved to indicate whether there have ever been any weak pointers. When this bit is not set, some steps can be skipped.

However, the algorithm from the paper is unfortunately not something we can directly use for Rust's standard `Arc` and `Weak`, because we have more operations:

1. `Weak::drop`
2. `Arc::drop`
3. `Weak::clone`
4. `Arc::clone`
5. `Weak::upgrade`
6. `Arc::downgrade`
7. `Arc::get_mut`
8. `Arc::try_unwrap`
9. `Arc::into_inner`

Specifically, `Arc::get_mut` requires locking the weak counter to temporarily block `Arc::downgrade` from completing, so we cannot implement `Arc::downgrade` as just `Weak::clone`. Our `Arc::downgrade` implementation is a CAS loop that increments the weak counter if it doesn't hold the special 'locked' (`usize::MAX`) value, similar to how `Weak::upgrade` uses a CAS loop (which is what the paper solves).

I have extended the algorithm by also reserving the lowest bit in the weak counter to represent the 'weak counter locked' state. That way, `Arc::downgrade` can be implemented as a `fetch_add(2)` rather than a CAS loop. It will still have to spin in case of a concurrent call to `Arc::get_mut`, but in the absence of `Arc::get_mut`, this makes `Arc` and `Weak` wait-free.

The paper shows some promising benchmarking results. I have not benchmarked this change yet.
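To make the counter layout concrete, here is a minimal sketch of the strong-counter side of the idea. It is illustrative only and not the PR's actual code: the `Counts` struct, the constant names and the simplified memory orderings are assumptions of the sketch.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering::{Acquire, Relaxed, Release}};

// Illustrative layout: the strong count is stored shifted left by one, and
// the least significant bit marks the final 'permanently zero' state.
const STRONG_DEAD: usize = 1; // lowest bit: the strong count is zero forever
const ONE_STRONG: usize = 2;  // one strong reference, because of the shift

struct Counts {
    strong: AtomicUsize,
}

impl Counts {
    /// Weak::upgrade, wait-free: a single fetch_add instead of a CAS loop.
    /// Once the dead bit is set, the remaining bits are meaningless, so the
    /// stray increment performed after that point is harmless.
    fn try_upgrade(&self) -> bool {
        let prev = self.strong.fetch_add(ONE_STRONG, Relaxed);
        prev & STRONG_DEAD == 0
    }

    /// Arc::drop: decrement, and if this was the last strong reference, try
    /// to set the dead bit with a single CAS. If that CAS fails, a concurrent
    /// upgrade won the race and the value stays alive.
    /// Returns true if the caller must drop the contained value.
    fn release_strong(&self) -> bool {
        if self.strong.fetch_sub(ONE_STRONG, Release) != ONE_STRONG {
            return false; // other strong references remain
        }
        fence(Acquire);
        self.strong
            .compare_exchange(0, STRONG_DEAD, Relaxed, Relaxed)
            .is_ok()
    }
}
```

The real algorithm (and the PR) also has to handle the weak count and the never-had-weak-pointers optimization; this sketch is only meant to show why `Weak::upgrade` no longer needs a CAS loop.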
Once the try build finishes, the rust-timer bot will also run some benchmarks on rustc itself, but those are unlikely to show anything interesting, because rustc doesn't have any meaningful usage of Weak::upgrade.
☀️ Try build successful - checks-actions
@m-ou-se this is really cool to see. Mostly minor, but I think `downgrade` is incorrect. Please let me know if you want more explanation of the +2; it is quite subtle, and not covered well in our paper.
cur = this.inner().weak.load(Relaxed);
continue;
}
let prev = this.inner().weak.fetch_add(ONE_WEAK, Acquire);
In the case where this is locked, you will loop and re-execute this line, which will add multiple weak references.

Also, the paper checks if `weak == 0` and, if it observes this, adds two. This is really important, and not part of your implementation. Without that sequence, it is possible to reach `weak == 0` multiple times, which is very bad.
> In the case where this is locked, you will loop and re-execute this line, which will add multiple weak references.

When the low bit of the weak pointer is set, the rest of the bits are meaningless (just like in the strong counter in your paper). So it doesn't matter that fetch_add is repeated, as long as it leaves the least significant bit set.

> Also, the paper checks if `weak == 0` and, if it observes this, adds two. This is really important, and not part of your implementation. Without that sequence, it is possible to reach `weak == 0` multiple times, which is very bad.

My code does that, but as a separate step: a few lines below here, in the `if prev == 0`, I add an extra `ONE_WEAK`, so we raise the weak counter by two in total if it was zero.
> In the case where this is locked, you will loop and re-execute this line, which will add multiple weak references.
>
> When the low bit of the weak pointer is set, the rest of the bits are meaningless (just like in the strong counter in your paper). So it doesn't matter that fetch_add is repeated, as long as it leaves the least significant bit set.

Nice. That's great. I hadn't got that bit. Thanks for explaining.

> Also, the paper checks if `weak == 0` and, if it observes this, adds two. This is really important, and not part of your implementation. Without that sequence, it is possible to reach `weak == 0` multiple times, which is very bad.
>
> My code does that, but as a separate step: a few lines below here, in the `if prev == 0`, I add an extra `ONE_WEAK`, so we raise the weak counter by two in total if it was zero.
Sorry, I see you increase by two, but doing it in two increments was my concern. I was convinced it had to be done in one go, but now I'm trying to write out the counterexample and can't find one. I am now convinced it is okay.
At some point, I'll try to extend @bensimner's proof to this variation.
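For readers following along, here is a rough sketch of the downgrade shape discussed above, with the locked low bit and the extra increment when the previous value was zero. The constant values, function name and spin loop are illustrative assumptions, not the PR's exact code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::{Acquire, Relaxed}};

// Illustrative constants: the weak count is shifted left by one and the
// lowest bit means 'weak counter locked by Arc::get_mut'.
const WEAK_LOCKED: usize = 1;
const ONE_WEAK: usize = 2;

// Rough shape of Arc::downgrade's weak-count update. While the locked bit is
// set, the upper bits carry no meaning, so re-executing the fetch_add after
// spinning is harmless, as explained above.
fn acquire_weak(weak: &AtomicUsize) {
    let mut prev = weak.fetch_add(ONE_WEAK, Acquire);
    while prev & WEAK_LOCKED != 0 {
        // A concurrent Arc::get_mut holds the lock; wait for it to release.
        std::hint::spin_loop();
        if weak.load(Relaxed) & WEAK_LOCKED != 0 {
            continue;
        }
        prev = weak.fetch_add(ONE_WEAK, Acquire);
    }
    if prev == 0 {
        // The counter was zero: add a second ONE_WEAK, so it rises by two in
        // total and cannot be observed at zero a second time.
        weak.fetch_add(ONE_WEAK, Relaxed);
    }
}
```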
Thanks for your detailed review! I will try to address your comments. Edit: found a few minutes to respond right now. ^^
Co-authored-by: Matthew Parkinson <mjp41@users.noreply.github.com>
The values are the same, but I used the wrong constants. So, this doesn't change any behaviour. It was just very confusing to read.
Finished benchmarking commit (2104764): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 631.784s -> 629.276s (-0.40%)
This will probably not yield many gains compared to the regressions as long as there is any overhead on the NO_WEAK path (when we never create any weak refs), as most users of Arc don't create weaks. I am working on an approach that gets rid of all overhead on the fast path while maintaining the wait-free nature of this impl (as an experiment).
Which overhead are you referring to? Doesn't this already take a fast path basically everywhere for the 'no weak' case?
These benchmark results look okay. The only potentially significant regression is the compilation time of ripgrep, but that extra time is spent in LLVM and not in rustc itself, so it is unrelated to the performance of Arc. It just means that the new Arc code might result in LLVM spending a bit more time optimizing/transforming the code.
Maybe this concern is completely irrational, but I would assume that, because of the frequent usage of weak-less Arcs, the additional bit op (which is just one additional cycle per strong decrement) could add up and outweigh the benefits from improving the weak impl. But maybe I am just wrong about that.
Maybe we could try to benchmark this with the parallel frontend enabled (I'm not sure if it actually uses Arc heavily though). CC @SparrowLii.
Nvm, I see now that that op can just be applied at compile time (probably). I was talking about the shift in the drop part: if the optimizer understands that it can just modify the RHS of the comparison instead of actually applying the shift to the LHS value, then there won't be any additional ops needed. (The only additional thing on the fast path with simple new + clone + drop operations will be the mov for the argument to drop_slow, but that is probably negligible as it will only be executed once the Arc is dropped.)
So far, I have not found that
Yeah, that compiles down to a single comparison just fine: https://godbolt.org/z/TvMTq5fez (I wrote it as a shift because I think that shows the intent more clearly.)
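For context, the kind of check being discussed reduces to something like the following (a hypothetical standalone function, not the PR's actual drop code):

```rust
// With the count shifted left by one, 'the strong count is zero' is written
// with a shift to keep the intent visible. LLVM folds `value >> 1 == 0` into
// a single unsigned comparison (`value < 2`), so the shift costs nothing at
// run time.
pub fn strong_count_is_zero(value: usize) -> bool {
    value >> 1 == 0
}
```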
Alright, sorry for the confusion 😅
☔ The latest upstream changes (presumably #115546) made this pull request unmergeable. Please resolve the merge conflicts.
@m-ou-se out of interest, what is the bar for this making it into Rust? Do you need an application that shows a win, or just no one showing a loss?
@mjp41 I think we should only continue towards merging this if we can find some real-world (or at least realistic) code that has a clear win. If we only show no performance regressions but no improvements, I don't think we should merge it. This change makes it a bit harder to maintain the Arc code for future maintainers, so it has to be worth it.

@yijunyu sent me a list of all crates on crates.io that call `Weak::upgrade` (e.g. flattiverse_connector-36.1.1). So, all crates whose performance might be increased by this PR should be in this list. It'd be useful to find one or more crates in this list that heavily rely on Weak::upgrade in a hot path.

Another option would be to write some (realistic?) benchmarks that we can include in the standard library's test suite and/or the runtime benchmark suite of rustc-perf.
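As a sketch of what such a benchmark could look like (hypothetical and std-only, not a rigorous harness or the rustc-perf format), something that hammers `Weak::upgrade` from several threads exercises exactly the path this PR changes:

```rust
use std::sync::{Arc, Weak};
use std::time::Instant;

// Hypothetical micro-benchmark sketch: repeatedly upgrade (and drop) a Weak
// from several threads, which is the operation the new algorithm makes
// wait-free.
fn main() {
    let arc = Arc::new(0u64);
    let weak: Weak<u64> = Arc::downgrade(&arc);
    let start = Instant::now();
    std::thread::scope(|s| {
        for _ in 0..4 {
            let weak = weak.clone();
            s.spawn(move || {
                for _ in 0..1_000_000 {
                    // Contended upgrade/drop is the hot path of interest.
                    let strong = weak.upgrade().expect("arc is still alive");
                    drop(strong);
                }
            });
        }
    });
    println!("4 threads x 1M upgrades took {:?}", start.elapsed());
}
```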
Ping from triage: |