
Track sent items using SmallVec #1657

Draft
wants to merge 15 commits into main

Conversation

martinthomson
Member

DO NOT MERGE THIS - IT MAKES THINGS SLOWER

I thought that this was worth sharing though.

I was looking at the profiles that @KershawChang shared and noticed that write_stream_frame was eating a decent amount of CPU because there was an allocation in there. It took me a while to track it down, but it turns out that this is the first place that we add a RecoveryToken to the Vec of those that we hold in each SentPacket.
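For illustration only (this is not code from the PR; `u64` stands in for the token type): a plain `Vec` starts with zero capacity, so the first push per packet is the point at which the heap allocation happens, which is what showed up in the profile.

```rust
// Minimal sketch, not neqo code: the first push into an empty Vec allocates.
fn main() {
    let mut tokens: Vec<u64> = Vec::new();
    assert_eq!(tokens.capacity(), 0); // no heap allocation yet
    tokens.push(1);                   // first push allocates on the heap
    assert!(tokens.capacity() >= 1);
}
```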

The obvious thing to test in this case is SmallVec. SmallVec is designed for places where you mostly add only a few items to a collection. It holds a small inline array into which values are stored; if more values are added than the array can hold, everything is moved to a heap allocation, just like a normal Vec. This means that if you stay within the array, you avoid a heap allocation, which can save a bunch of work. The cost is a little complexity, the extra stack space, and the fact that the values need to move if you ever overflow the backing array. So if your prediction is wrong, it is more expensive.
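As a rough sketch of that behavior (using the `smallvec` crate's standard API; the types and capacity here are illustrative, not the ones in this PR):

```rust
use smallvec::SmallVec;

fn main() {
    // Inline capacity of 4: the first four pushes go into the stack-backed
    // array and perform no heap allocation.
    let mut tokens: SmallVec<[u64; 4]> = SmallVec::new();
    for i in 0..4 {
        tokens.push(i);
    }
    assert!(!tokens.spilled());

    // The fifth push overflows the inline array, so the existing values are
    // moved into a heap allocation, just like a normal Vec growth step.
    tokens.push(4);
    assert!(tokens.spilled());
}
```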

I tried this out and it makes performance worse. With a backing array of size 1, the performance change isn't observable, but larger values progressively degrade performance. Increasing the array size to 8 makes it worse than the value of 4 I started with. My theory is that the increased size of the base object is the problem: this collection is typically kept in a SentPacket, and our acknowledgement code needs extra time to move the larger struct around.
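A rough way to see the size effect (illustrative only; `u64` again stands in for the token type):

```rust
use smallvec::SmallVec;
use std::mem::size_of;

fn main() {
    // A plain Vec is three words no matter how large its heap buffer gets.
    println!("Vec<u64>:           {} bytes", size_of::<Vec<u64>>());
    // A SmallVec stores its inline buffer in the struct itself, so a bigger
    // inline capacity inflates whatever contains it (here, SentPacket), and
    // every move of that struct gets more expensive.
    println!("SmallVec<[u64; 1]>: {} bytes", size_of::<SmallVec<[u64; 1]>>());
    println!("SmallVec<[u64; 4]>: {} bytes", size_of::<SmallVec<[u64; 4]>>());
    println!("SmallVec<[u64; 8]>: {} bytes", size_of::<SmallVec<[u64; 8]>>());
}
```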

The size of the performance regression suggests that there are some ways we could further improve how SentPacket is handled.

Benchmark results
Run multiple transfers with varying seeds
                        time:   [42.395 ms 42.547 ms 42.712 ms]
                        change: [+3.3754% +4.3870% +5.2776%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Run multiple transfers with the same seed
                        time:   [42.513 ms 42.584 ms 42.657 ms]
                        change: [+3.4443% +3.7777% +4.0976%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

@codecov-commenter

codecov-commenter commented Feb 16, 2024

Codecov Report

Attention: Patch coverage is 98.96907% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.39%. Comparing base (a1b9364) to head (f8e2c98).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| neqo-transport/src/send_stream.rs | 92.30% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1657      +/-   ##
==========================================
- Coverage   95.40%   95.39%   -0.01%     
==========================================
  Files         113      113              
  Lines       36721    36717       -4     
==========================================
- Hits        35032    35028       -4     
  Misses       1689     1689              


martinthomson and others added 2 commits February 19, 2024 22:15
Co-authored-by: Lars Eggert <lars@eggert.org>
Signed-off-by: Martin Thomson <mt@lowentropy.net>
@martinthomson
Member Author

@larseggert, I'm looking at the profiles that were generated here and it looks like these are off the main branch. The flamegraph for transfer shows calls into BTreeMap, but I don't see any way that could happen.

@larseggert
Collaborator

I can't get to it this week. Wonder if the checkout action needs another parameter?

@larseggert
Collaborator

Let's see what the new flamegraphs show (also because LTO is now enabled).


Benchmark results

Performance differences relative to 76630a5.

  • drain a timer quickly time: [395.29 ns 402.55 ns 409.33 ns]
    change: [-0.6787% +0.9859% +2.7112%] (p = 0.26 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1+1 entries
    time: [193.97 ns 194.36 ns 194.79 ns]
    change: [-0.5237% -0.1538% +0.2215%] (p = 0.41 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 3+1 entries
    time: [235.93 ns 236.55 ns 237.18 ns]
    change: [-0.8063% -0.4103% -0.0496%] (p = 0.04 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 10+1 entries
    time: [235.78 ns 236.59 ns 237.55 ns]
    change: [-0.4030% +3.6386% +13.911%] (p = 0.42 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1000+1 entries
    time: [217.27 ns 222.01 ns 232.88 ns]
    change: [+1.4742% +5.6444% +13.185%] (p = 0.06 > 0.05)
    No change in performance detected.

  • RxStreamOrderer::inbound_frame()
    time: [117.57 ms 117.65 ms 117.73 ms]
    change: [+0.0927% +0.1982% +0.2949%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [124.35 ms 124.61 ms 124.88 ms]
    thrpt: [32.031 MiB/s 32.099 MiB/s 32.169 MiB/s]
    change:
    time: [+5.4576% +5.7824% +6.1037%] (p = 0.00 < 0.05)
    thrpt: [-5.7526% -5.4664% -5.1751%]
    💔 Performance has regressed.

  • transfer/Run multiple transfers with the same seed
    time: [124.82 ms 125.00 ms 125.18 ms]
    thrpt: [31.953 MiB/s 32.000 MiB/s 32.047 MiB/s]
    change:
    time: [+5.2953% +5.5022% +5.6981%] (p = 0.00 < 0.05)
    thrpt: [-5.3909% -5.2152% -5.0290%]
    💔 Performance has regressed.

Client/server transfer results

Transfer of 134217728 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- | --- | --- | --- |
| msquic | msquic | | | 428.3 ± 58.8 | 359.7 | 538.1 | 1.00 |
| neqo | msquic | reno | on | 1921.6 ± 35.0 | 1850.3 | 1960.9 | 1.00 |
| neqo | msquic | reno | | 1917.7 ± 30.3 | 1857.9 | 1959.7 | 1.00 |
| neqo | msquic | cubic | on | 1898.8 ± 56.6 | 1775.9 | 1971.2 | 1.00 |
| neqo | msquic | cubic | | 1901.9 ± 46.7 | 1811.5 | 1960.3 | 1.00 |
| msquic | neqo | reno | on | 4418.4 ± 170.0 | 4222.4 | 4692.8 | 1.00 |
| msquic | neqo | reno | | 4510.8 ± 232.7 | 4225.0 | 4969.3 | 1.00 |
| msquic | neqo | cubic | on | 4475.1 ± 154.6 | 4259.7 | 4672.4 | 1.00 |
| msquic | neqo | cubic | | 4434.9 ± 168.7 | 4204.7 | 4705.9 | 1.00 |
| neqo | neqo | reno | on | 3664.9 ± 172.2 | 3387.1 | 3957.8 | 1.00 |
| neqo | neqo | reno | | 3805.0 ± 275.7 | 3529.5 | 4374.1 | 1.00 |
| neqo | neqo | cubic | on | 4329.7 ± 141.6 | 4114.5 | 4523.6 | 1.00 |
| neqo | neqo | cubic | | 4372.4 ± 141.9 | 4184.3 | 4553.5 | 1.00 |


@larseggert
Collaborator

It might be worthwhile revisiting this now that @mxinden made some improvements related to SentPacket.

@martinthomson
Member Author

It still looks pretty bad:
RxStreamOrderer::inbound_frame()
                        time:   [51.698 ms 51.889 ms 52.085 ms]
                        change: [+4.3290% +5.1052% +5.9211%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

     Running benches/sent_packets.rs (target/release/deps/sent_packets-a9a340a4dfd0be33)
SentPackets::take_ranges
                        time:   [12.966 µs 13.264 µs 13.516 µs]
                        change: [+226.83% +258.23% +293.90%] (p = 0.00 < 0.05)
                        Performance has regressed.

     Running benches/transfer.rs (target/release/deps/transfer-8555e2732eb99220)
transfer/pacing-false/varying-seeds
                        time:   [14.021 ms 14.244 ms 14.462 ms]
                        change: [+5.1022% +7.4030% +9.5750%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild

transfer/pacing-true/varying-seeds
                        time:   [14.828 ms 15.058 ms 15.281 ms]
                        change: [+4.0012% +6.3270% +8.5372%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

transfer/pacing-false/same-seed
                        time:   [12.132 ms 12.391 ms 12.643 ms]
                        change: [+4.6848% +7.8215% +10.869%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

transfer/pacing-true/same-seed
                        time:   [13.924 ms 14.178 ms 14.432 ms]
                        change: [+4.0431% +6.9063% +9.9521%] (p = 0.00 < 0.05)
                        Performance has regressed.


Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 3e65261.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server
