
fix: Mark all packets TX'ed before PTO as lost #2129

Open · wants to merge 5 commits into main
Conversation

larseggert (Collaborator)

We'd previously mark only one or two packets as lost when a PTO fired. That meant that we potentially didn't retransmit all the data that we could have (i.e., data in lost packets that we didn't mark as lost).
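A minimal sketch of the change, using simplified stand-in types rather than neqo's real `SentPackets` API: on PTO, every in-flight packet is declared lost so that all of its frames become eligible for retransmission, instead of only the first few.

```rust
// Hypothetical sketch, not neqo's actual implementation.
#[derive(Clone, Debug)]
struct SentPacket {
    pn: u64, // packet number
}

struct SentPackets {
    unacked: Vec<SentPacket>, // packets sent but not yet acknowledged
}

impl SentPackets {
    // Old behavior: only up to `count` packets were declared lost on PTO.
    fn pto_packets_limited(&self, count: usize) -> impl Iterator<Item = &SentPacket> {
        self.unacked.iter().take(count)
    }

    // New behavior: every packet sent before the PTO is declared lost,
    // so all of their frames can be retransmitted.
    fn pto_packets(&self) -> impl Iterator<Item = &SentPacket> {
        self.unacked.iter()
    }
}

fn main() {
    let space = SentPackets {
        unacked: (0..5).map(|pn| SentPacket { pn }).collect(),
    };

    // Old: only 2 of the 5 in-flight packets would be marked lost.
    let old: Vec<_> = space.pto_packets_limited(2).cloned().collect();
    // New: all 5 are marked lost.
    let new: Vec<_> = space.pto_packets().cloned().collect();

    println!("old={} new={}", old.len(), new.len());
}
```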

This also changes the probing code to suppress redundant keep-alives: PINGs that we send for other reasons now double as keep-alives, where previously they could have but did not.
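The keep-alive change can be sketched like this (the names here are illustrative, not neqo's real probing API): a PING already queued for another reason serves as the keep-alive, so no extra PING is queued.

```rust
// Hypothetical sketch of the keep-alive suppression, not neqo's real API.
struct Probing {
    ping_queued: bool,    // a PING is already pending for some other reason
    keep_alive_due: bool, // the keep-alive timer has expired
}

impl Probing {
    // Send a dedicated keep-alive PING only when no PING is pending anyway;
    // a pending PING doubles as the keep-alive.
    fn needs_keep_alive_ping(&self) -> bool {
        self.keep_alive_due && !self.ping_queued
    }
}

fn main() {
    // A pending PING suppresses the redundant keep-alive...
    let p = Probing { ping_queued: true, keep_alive_due: true };
    println!("{}", p.needs_keep_alive_ping()); // false

    // ...but a keep-alive with no other PING queued is still sent.
    let q = Probing { ping_queued: false, keep_alive_due: true };
    println!("{}", q.needs_keep_alive_ping()); // true
}
```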

Broken out of #1998


github-actions bot commented Sep 19, 2024

Failed Interop Tests
QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server (all results)

Succeeded Interop Tests
QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server

Unsupported Interop Tests
QUIC Interop Runner, client vs. server: neqo-latest as client · neqo-latest as server


codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.38%. Comparing base (c6d5502) to head (1e1bf7e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2129   +/-   ##
=======================================
  Coverage   95.38%   95.38%           
=======================================
  Files         112      112           
  Lines       36593    36589    -4     
=======================================
- Hits        34903    34901    -2     
+ Misses       1690     1688    -2     



github-actions bot commented Sep 19, 2024

Benchmark results

Performance differences relative to c6d5502.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [99.837 ns 100.16 ns 100.49 ns]
       change: [+0.1224% +0.7628% +1.2932%] (p = 0.01 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
       time:   [118.62 ns 118.85 ns 119.11 ns]
       change: [+0.8694% +1.2845% +1.6720%] (p = 0.00 < 0.05)

Found 17 outliers among 100 measurements (17.00%)
3 (3.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
8 (8.00%) high severe

coalesce_acked_from_zero 10+1 entries: 💔 Performance has regressed.
       time:   [118.47 ns 118.99 ns 119.60 ns]
       change: [+1.0188% +1.5767% +2.1513%] (p = 0.00 < 0.05)

Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
2 (2.00%) low mild
9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.
       time:   [98.033 ns 98.181 ns 98.351 ns]
       change: [+0.3972% +1.2430% +2.1775%] (p = 0.00 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.77 ms 111.83 ms 111.88 ms]
       change: [+0.2804% +0.3515% +0.4198%] (p = 0.00 < 0.05)

Found 20 outliers among 100 measurements (20.00%)
1 (1.00%) low severe
8 (8.00%) low mild
11 (11.00%) high mild

SentPackets::take_ranges: No change in performance detected.
       time:   [5.5314 µs 5.6195 µs 5.7116 µs]
       change: [-1.7100% +1.2485% +4.2196%] (p = 0.42 > 0.05)

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe

transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [26.592 ms 27.745 ms 28.884 ms]
       change: [-0.5101% +5.5311% +11.892%] (p = 0.08 > 0.05)
transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [35.669 ms 37.496 ms 39.347 ms]
       change: [+2.2693% +9.1382% +16.971%] (p = 0.01 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [26.328 ms 27.237 ms 28.158 ms]
       change: [+0.4598% +4.7572% +9.3134%] (p = 0.04 < 0.05)
transfer/pacing-true/same-seed: 💔 Performance has regressed.
       time:   [43.374 ms 45.990 ms 48.663 ms]
       change: [+3.6413% +10.894% +18.646%] (p = 0.01 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [886.24 ms 895.56 ms 905.00 ms]
       thrpt:  [110.50 MiB/s 111.66 MiB/s 112.84 MiB/s]
change:
       time:   [-2.5595% -1.0431% +0.5240%] (p = 0.18 > 0.05)
       thrpt:  [-0.5213% +1.0540% +2.6267%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
       time:   [315.46 ms 318.70 ms 322.06 ms]
       thrpt:  [31.050 Kelem/s 31.377 Kelem/s 31.700 Kelem/s]
change:
       time:   [-3.2876% -1.9212% -0.4317%] (p = 0.01 < 0.05)
       thrpt:  [+0.4336% +1.9588% +3.3994%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [33.620 ms 33.764 ms 33.917 ms]
       thrpt:  [29.484  elem/s 29.618  elem/s 29.744  elem/s]
change:
       time:   [-0.8370% -0.0531% +0.7303%] (p = 0.89 > 0.05)
       thrpt:  [-0.7250% +0.0531% +0.8441%]

Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [1.6418 s 1.6622 s 1.6832 s]
       thrpt:  [59.411 MiB/s 60.161 MiB/s 60.911 MiB/s]
change:
       time:   [-1.9410% -0.0386% +1.7919%] (p = 0.97 > 0.05)
       thrpt:  [-1.7604% +0.0386% +1.9794%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | MTU | Mean [ms] | Min [ms] | Max [ms] |
|---|---|---|---|---|---|---|---|
| gquiche | gquiche | | | 1504 | 584.2 ± 92.0 | 525.5 | 764.4 |
| neqo | gquiche | reno | on | 1504 | 802.2 ± 62.1 | 757.4 | 929.4 |
| neqo | gquiche | reno | | 1504 | 795.5 ± 50.2 | 747.8 | 919.8 |
| neqo | gquiche | cubic | on | 1504 | 816.2 ± 56.4 | 765.2 | 962.8 |
| neqo | gquiche | cubic | | 1504 | 801.6 ± 74.6 | 760.4 | 1006.8 |
| msquic | msquic | | | 1504 | 170.6 ± 98.5 | 98.3 | 364.9 |
| neqo | msquic | reno | on | 1504 | 245.4 ± 56.5 | 210.0 | 406.9 |
| neqo | msquic | reno | | 1504 | 287.6 ± 94.6 | 215.4 | 453.0 |
| neqo | msquic | cubic | on | 1504 | 248.3 ± 67.8 | 212.5 | 415.6 |
| neqo | msquic | cubic | | 1504 | 282.4 ± 91.4 | 210.2 | 437.9 |
| gquiche | neqo | reno | on | 1504 | 689.1 ± 91.9 | 549.4 | 819.6 |
| gquiche | neqo | reno | | 1504 | 713.2 ± 115.0 | 560.9 | 936.0 |
| gquiche | neqo | cubic | on | 1504 | 703.3 ± 141.2 | 551.2 | 1031.5 |
| gquiche | neqo | cubic | | 1504 | 708.1 ± 133.1 | 559.4 | 1024.7 |
| msquic | neqo | reno | on | 1504 | 473.7 ± 12.5 | 455.2 | 488.2 |
| msquic | neqo | reno | | 1504 | 526.4 ± 100.1 | 456.1 | 681.6 |
| msquic | neqo | cubic | on | 1504 | 497.9 ± 37.5 | 476.9 | 598.1 |
| msquic | neqo | cubic | | 1504 | 537.1 ± 74.8 | 472.6 | 652.4 |
| neqo | neqo | reno | on | 1504 | 525.6 ± 52.1 | 447.9 | 629.5 |
| neqo | neqo | reno | | 1504 | 504.0 ± 65.2 | 447.5 | 674.6 |
| neqo | neqo | cubic | on | 1504 | 543.3 ± 52.7 | 490.7 | 677.0 |
| neqo | neqo | cubic | | 1504 | 548.1 ± 37.8 | 483.7 | 597.1 |


Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

larseggert (Collaborator, Author)

@martinthomson I'd appreciate a review, since the code I am touching is pretty complex.

mxinden (Collaborator) left a comment:

This makes sense to me. Thanks for extracting it into a smaller pull request.

I am in favor of waiting for Martin's review.

martinthomson (Member) left a comment:

Do we not have tests for this? Should we?

```rust
    .pto_packets(PtoState::pto_packet_count(*pn_space))
    .cloned(),
);
lost.extend(space.pto_packets().cloned());
```
martinthomson (Member) commented on this diff:

Do we still need pto_packet_count if this is the decision?

The other question I have is whether this is necessary. We're cloning all of the information so that we can process the loss, which means more work on a PTO. Maybe PTO is rare enough that this doesn't matter, but one of the reasons for the limit on number was to avoid the extra work.

larseggert (Collaborator, Author) replied:

> Do we still need pto_packet_count if this is the decision?

We do still need it to limit the number of packets we send on PTO.

> The other question I have is whether this is necessary. We're cloning all of the information so that we can process the loss, which means more work on a PTO. Maybe PTO is rare enough that this doesn't matter, but one of the reasons for the limit on number was to avoid the extra work.

I've been wondering if it would be sufficient to mark n packets per space as lost, instead of all.
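The alternative floated here can be sketched as a bounded variant (purely illustrative; this PR does not implement it): declare only the oldest `n` unacked packets per space lost, capping the per-PTO cloning work that the review comment raises.

```rust
// Hypothetical sketch of the bounded alternative, not code from this PR.
// Marks at most `n` of the oldest unacked packets (by packet number here,
// represented as plain u64s) as lost.
fn mark_lost_bounded(unacked: &[u64], n: usize) -> Vec<u64> {
    unacked.iter().take(n).copied().collect()
}

fn main() {
    let unacked = vec![1u64, 2, 3, 4, 5];
    // Only the first 3 packets would be declared lost.
    println!("{:?}", mark_lost_bounded(&unacked, 3)); // [1, 2, 3]
}
```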

larseggert (Collaborator, Author)

> Do we not have tests for this? Should we?

There are tests in #2128, but this PR alone doesn't make them succeed yet.
