fix: Mark all packets TX'ed before PTO as lost #2129
base: main
Conversation
Previously, we would only mark one or two packets as lost when a PTO fired. That meant that we potentially didn't retransmit (RTX) all the data that we could have, i.e., data in lost packets that we never marked as lost. This also changes the probing code to suppress redundant keep-alives, i.e., it lets PINGs that we send for other reasons double as keep-alives, which they previously did not.

Broken out of mozilla#1998
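For illustration, here is a minimal, self-contained sketch of the loss-marking difference on PTO; `SentPacket` and both functions are stand-ins for this example, not neqo's actual types or API:

```rust
// Self-contained sketch only; `SentPacket` and these functions are stand-ins,
// not neqo's actual types or API.

#[derive(Clone, Debug)]
struct SentPacket {
    pn: u64, // packet number
}

/// Roughly the old behavior: declare only the first `count` outstanding
/// packets lost when the PTO fires; data in the remainder waits for a later
/// loss declaration before it can be retransmitted.
fn pto_lost_limited(unacked: &[SentPacket], count: usize) -> Vec<SentPacket> {
    unacked.iter().take(count).cloned().collect()
}

/// Roughly the new behavior: declare everything sent before the PTO lost, so
/// all of its data becomes eligible for retransmission immediately.
fn pto_lost_all(unacked: &[SentPacket]) -> Vec<SentPacket> {
    unacked.to_vec()
}

fn main() {
    let unacked: Vec<SentPacket> = (0..5).map(|pn| SentPacket { pn }).collect();
    assert_eq!(pto_lost_limited(&unacked, 2).len(), 2);
    assert_eq!(pto_lost_all(&unacked).len(), 5);
    println!("{:?} vs {:?}", pto_lost_limited(&unacked, 2), pto_lost_all(&unacked));
}
```

The description also mentions letting PINGs sent for other reasons double as keep-alives. A similarly hypothetical sketch of that idea, with `Frame` and `Builder` again being stand-ins rather than neqo's API:

```rust
// Equally hypothetical: only add a keep-alive PING when the packet being built
// is not already ack-eliciting. `Frame` and `Builder` are stand-ins.

enum Frame {
    Ping,
    Stream, // standing in for any other ack-eliciting frame
}

struct Builder {
    frames: Vec<Frame>,
}

impl Builder {
    fn is_ack_eliciting(&self) -> bool {
        // In this toy model, every frame type above elicits an ACK.
        !self.frames.is_empty()
    }

    /// Add a keep-alive PING only if nothing in this packet already elicits an
    /// ACK; an existing STREAM or PING can double as the keep-alive.
    fn maybe_add_keep_alive(&mut self) {
        if !self.is_ack_eliciting() {
            self.frames.push(Frame::Ping);
        }
    }
}

fn main() {
    let mut quiet = Builder { frames: vec![] };
    quiet.maybe_add_keep_alive();
    assert!(quiet.is_ack_eliciting()); // a PING was added

    let mut busy = Builder { frames: vec![Frame::Stream] };
    busy.maybe_add_keep_alive();
    assert_eq!(busy.frames.len(), 1); // no redundant PING
}
```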
QUIC Interop Runner (client vs. server): failed, succeeded, and unsupported interop test lists were generated for neqo-latest as client and as server; see the full results.
Codecov report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff            @@
##              main    #2129    +/-   ##
=========================================
  Coverage    95.38%   95.38%
=========================================
  Files          112      112
  Lines        36593    36589       -4
=========================================
- Hits         34903    34901       -2
+ Misses        1690     1688       -2
```

☔ View full report in Codecov by Sentry.
Benchmark results: performance differences relative to c6d5502.

- coalesce_acked_from_zero 1+1 entries: change within noise threshold. time: [99.837 ns 100.16 ns 100.49 ns], change: [+0.1224% +0.7628% +1.2932%] (p = 0.01 < 0.05)
- coalesce_acked_from_zero 3+1 entries: change within noise threshold. time: [118.62 ns 118.85 ns 119.11 ns], change: [+0.8694% +1.2845% +1.6720%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 10+1 entries: 💔 performance has regressed. time: [118.47 ns 118.99 ns 119.60 ns], change: [+1.0188% +1.5767% +2.1513%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1000+1 entries: change within noise threshold. time: [98.033 ns 98.181 ns 98.351 ns], change: [+0.3972% +1.2430% +2.1775%] (p = 0.00 < 0.05)
- RxStreamOrderer::inbound_frame(): change within noise threshold. time: [111.77 ms 111.83 ms 111.88 ms], change: [+0.2804% +0.3515% +0.4198%] (p = 0.00 < 0.05)
- SentPackets::take_ranges: no change in performance detected. time: [5.5314 µs 5.6195 µs 5.7116 µs], change: [-1.7100% +1.2485% +4.2196%] (p = 0.42 > 0.05)
- transfer/pacing-false/varying-seeds: no change in performance detected. time: [26.592 ms 27.745 ms 28.884 ms], change: [-0.5101% +5.5311% +11.892%] (p = 0.08 > 0.05)
- transfer/pacing-true/varying-seeds: change within noise threshold. time: [35.669 ms 37.496 ms 39.347 ms], change: [+2.2693% +9.1382% +16.971%] (p = 0.01 < 0.05)
- transfer/pacing-false/same-seed: change within noise threshold. time: [26.328 ms 27.237 ms 28.158 ms], change: [+0.4598% +4.7572% +9.3134%] (p = 0.04 < 0.05)
- transfer/pacing-true/same-seed: 💔 performance has regressed. time: [43.374 ms 45.990 ms 48.663 ms], change: [+3.6413% +10.894% +18.646%] (p = 0.01 < 0.05)
- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: no change in performance detected. time: [886.24 ms 895.56 ms 905.00 ms], thrpt: [110.50 MiB/s 111.66 MiB/s 112.84 MiB/s]; change: time: [-2.5595% -1.0431% +0.5240%] (p = 0.18 > 0.05), thrpt: [-0.5213% +1.0540% +2.6267%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: change within noise threshold. time: [315.46 ms 318.70 ms 322.06 ms], thrpt: [31.050 Kelem/s 31.377 Kelem/s 31.700 Kelem/s]; change: time: [-3.2876% -1.9212% -0.4317%] (p = 0.01 < 0.05), thrpt: [+0.4336% +1.9588% +3.3994%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: no change in performance detected. time: [33.620 ms 33.764 ms 33.917 ms], thrpt: [29.484 elem/s 29.618 elem/s 29.744 elem/s]; change: time: [-0.8370% -0.0531% +0.7303%] (p = 0.89 > 0.05), thrpt: [-0.7250% +0.0531% +0.8441%]
- 1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: no change in performance detected. time: [1.6418 s 1.6622 s 1.6832 s], thrpt: [59.411 MiB/s 60.161 MiB/s 60.911 MiB/s]; change: time: [-1.9410% -0.0386% +1.7919%] (p = 0.97 > 0.05), thrpt: [-1.7604% +0.0386% +1.9794%]

Client/server transfer results: transfer of 33554432 bytes over loopback.
@martinthomson I'd appreciate a review, since the code I am touching is pretty complex.
This makes sense to me. Thanks for extracting it into a smaller pull request.
I am in favor of waiting for Martin's review.
Do we not have tests for this? Should we?
```diff
-    .pto_packets(PtoState::pto_packet_count(*pn_space))
-    .cloned(),
-);
+lost.extend(space.pto_packets().cloned());
```
Do we still need pto_packet_count if this is the decision?
The other question I have is whether this is necessary. We're cloning all of the information so that we can process the loss, which means more work on a PTO. Maybe PTO is rare enough that this doesn't matter, but one of the reasons for the limit on number was to avoid the extra work.
> Do we still need pto_packet_count if this is the decision?

We do still need it to limit the number of packets we send on PTO.

> The other question I have is whether this is necessary. We're cloning all of the information so that we can process the loss, which means more work on a PTO. Maybe PTO is rare enough that this doesn't matter, but one of the reasons for the limit on number was to avoid the extra work.

I've been wondering if it would be sufficient to mark n packets per space as lost, instead of all.
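As a rough illustration of that alternative, here is a hypothetical sketch that caps per-space loss marking with the existing probe count; `Space`, `pto_packets`, and `pto_packet_count` are stand-ins rather than neqo's real types:

```rust
// Hypothetical sketch only; none of these types mirror neqo's internals.

#[derive(Clone, Debug)]
struct SentPacket {
    pn: u64, // packet number
}

struct Space {
    unacked: Vec<SentPacket>,
}

impl Space {
    /// Packets that would be declared lost when a PTO fires in this space.
    fn pto_packets(&self) -> impl Iterator<Item = &SentPacket> + '_ {
        self.unacked.iter()
    }
}

/// Stand-in for the per-space probe count used on PTO (one or two packets).
fn pto_packet_count() -> usize {
    2
}

/// Mark at most `pto_packet_count()` packets per space as lost, instead of all
/// of them; the `take` bounds the cloning and loss-processing work on PTO.
fn pto_lost(spaces: &[Space]) -> Vec<SentPacket> {
    let mut lost = Vec::new();
    for space in spaces {
        lost.extend(space.pto_packets().take(pto_packet_count()).cloned());
    }
    lost
}

fn main() {
    let spaces = vec![
        Space { unacked: (0..5).map(|pn| SentPacket { pn }).collect() },
        Space { unacked: (0..3).map(|pn| SentPacket { pn }).collect() },
    ];
    let lost = pto_lost(&spaces);
    // Two per space: four packets marked lost instead of all eight.
    assert_eq!(lost.len(), 4);
    println!("{lost:?}");
}
```

Whether such a cap is worth it comes down to the trade-off raised above: bounding the extra work done on a PTO versus immediately retransmitting everything that is plausibly lost.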
There are tests in #2128, but this PR alone doesn't make them succeed yet.