Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Stop the PMTUD search at the interface MTU #2135

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

larseggert
Copy link
Collaborator

@larseggert larseggert commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU, and only start from 1280 when that fails?

WIP

Should we optimistically *start* the search at the interface MTU, and
only start from 1280 when that fails?
Copy link

codecov bot commented Sep 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.40%. Comparing base (2bc5ffc) to head (988459c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2135      +/-   ##
==========================================
+ Coverage   95.39%   95.40%   +0.01%     
==========================================
  Files         113      113              
  Lines       36683    36701      +18     
==========================================
+ Hits        34994    35015      +21     
+ Misses       1689     1686       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@larseggert larseggert marked this pull request as ready for review September 26, 2024 13:08
@mxinden
Copy link
Collaborator

mxinden commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU

Are there other projects using this optimistic approach?

If I understand RFC 8899 correctly the local interface MTU is the end value, not the start value.

The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).

https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2

Copy link

github-actions bot commented Sep 26, 2024

Failed Interop Tests

None ❓

All results

Succeeded Interop Tests

None ❓

Unsupported Interop Tests

None ❓

@larseggert
Copy link
Collaborator Author

All true, but in practice, the local interface is most often the limiting hop.

Copy link

github-actions bot commented Sep 26, 2024

Benchmark results

Performance differences relative to c6d5502.

coalesce_acked_from_zero 1+1 entries: 💔 Performance has regressed.
       time:   [105.57 ns 105.91 ns 106.29 ns]
       change: [+5.8389% +6.4848% +7.0193%] (p = 0.00 < 0.05)

Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: 💔 Performance has regressed.
       time:   [121.44 ns 121.85 ns 122.31 ns]
       change: [+3.1785% +3.5907% +3.9881%] (p = 0.00 < 0.05)

Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low mild
3 (3.00%) high mild
9 (9.00%) high severe

coalesce_acked_from_zero 10+1 entries: 💔 Performance has regressed.
       time:   [121.10 ns 121.55 ns 122.10 ns]
       change: [+3.0928% +3.5527% +3.9751%] (p = 0.00 < 0.05)

Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 1000+1 entries: 💔 Performance has regressed.
       time:   [100.56 ns 100.70 ns 100.87 ns]
       change: [+2.8990% +3.7252% +4.5719%] (p = 0.00 < 0.05)

Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.75 ms 111.81 ms 111.86 ms]
       change: [+0.2679% +0.3351% +0.4061%] (p = 0.00 < 0.05)

Found 18 outliers among 100 measurements (18.00%)
9 (9.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [5.4807 µs 5.6706 µs 5.8770 µs]
       change: [-3.4151% -0.0737% +3.2172%] (p = 0.96 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe

transfer/pacing-false/varying-seeds: 💔 Performance has regressed.
       time:   [77.594 ms 77.799 ms 78.003 ms]
       change: [+183.46% +195.92% +209.17%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

transfer/pacing-true/varying-seeds: 💔 Performance has regressed.
       time:   [78.075 ms 78.250 ms 78.423 ms]
       change: [+117.36% +127.76% +139.21%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: 💔 Performance has regressed.
       time:   [78.252 ms 78.392 ms 78.534 ms]
       change: [+193.17% +201.51% +210.20%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/same-seed: 💔 Performance has regressed.
       time:   [78.645 ms 78.787 ms 78.925 ms]
       change: [+81.903% +89.975% +98.836%] (p = 0.00 < 0.05)

Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low mild

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [907.88 ms 917.99 ms 928.39 ms]
       thrpt:  [107.71 MiB/s 108.93 MiB/s 110.15 MiB/s]
change:
       time:   [-0.1489% +1.4351% +3.0836%] (p = 0.08 > 0.05)
       thrpt:  [-2.9914% -1.4148% +0.1491%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved.
       time:   [300.15 ms 302.04 ms 303.94 ms]
       thrpt:  [32.901 Kelem/s 33.109 Kelem/s 33.316 Kelem/s]
change:
       time:   [-8.1101% -7.0497% -5.9887%] (p = 0.00 < 0.05)
       thrpt:  [+6.3702% +7.5844% +8.8259%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed.
       time:   [34.366 ms 34.561 ms 34.777 ms]
       thrpt:  [28.755  elem/s 28.934  elem/s 29.099  elem/s]
change:
       time:   [+1.3683% +2.3085% +3.2482%] (p = 0.00 < 0.05)
       thrpt:  [-3.1460% -2.2564% -1.3498%]

Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed.
       time:   [1.6966 s 1.7138 s 1.7312 s]
       thrpt:  [57.763 MiB/s 58.351 MiB/s 58.940 MiB/s]
change:
       time:   [+1.3262% +3.0626% +4.8632%] (p = 0.00 < 0.05)
       thrpt:  [-4.6376% -2.9716% -1.3089%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing MTU Mean [ms] Min [ms] Max [ms]
gquiche gquiche 1504 591.1 ± 95.3 513.8 765.5
neqo gquiche reno on 1504 810.7 ± 69.4 761.0 1001.6
neqo gquiche reno 1504 834.9 ± 74.8 787.0 980.3
neqo gquiche cubic on 1504 832.6 ± 96.1 773.1 1074.4
neqo gquiche cubic 1504 805.8 ± 25.4 782.8 851.2
msquic msquic 1504 189.7 ± 102.5 102.1 373.3
neqo msquic reno on 1504 260.6 ± 100.0 202.8 482.4
neqo msquic reno 1504 216.8 ± 14.1 199.0 245.5
neqo msquic cubic on 1504 211.5 ± 10.7 199.9 230.3
neqo msquic cubic 1504 217.4 ± 11.2 200.8 229.7
gquiche neqo reno on 1504 711.5 ± 95.3 587.1 900.3
gquiche neqo reno 1504 699.3 ± 87.0 570.7 810.4
gquiche neqo cubic on 1504 726.0 ± 154.3 584.7 1059.7
gquiche neqo cubic 1504 706.4 ± 110.2 572.3 917.6
msquic neqo reno on 1504 566.0 ± 120.2 491.5 884.6
msquic neqo reno 1504 510.6 ± 38.3 483.2 616.5
msquic neqo cubic on 1504 535.2 ± 94.2 470.5 721.1
msquic neqo cubic 1504 496.4 ± 34.2 469.4 589.0
neqo neqo reno on 1504 587.6 ± 47.9 507.7 686.1
neqo neqo reno 1504 521.7 ± 30.5 483.8 570.1
neqo neqo cubic on 1504 580.3 ± 82.1 503.4 758.1
neqo neqo cubic 1504 564.4 ± 53.3 490.5 703.3

⬇️ Download logs

@larseggert
Copy link
Collaborator Author

This PR exposed a bug in mtu 🫤 mozilla/mtu#26

@mxinden
Copy link
Collaborator

mxinden commented Sep 26, 2024

Should we also optimistically start the search at the interface MTU

Are there other projects using this optimistic approach?

If I understand RFC 8899 correctly the local interface MTU is the end value, not the start value.

The MAX_PLPMTU is the largest size of PLPMTU. This has to be less than or equal to the maximum size of the PL packet that can be sent on the outgoing interface (constrained by the local interface MTU).

https://www.rfc-editor.org/rfc/rfc8899.html#section-5.1.2

All true, but in practice, the local interface is most often the limiting hop.

Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.

We only start probing once the connection is confirmed.

fn set_confirmed(&mut self) -> Res<()> {
self.set_state(State::Confirmed);
if self.conn_params.pmtud_enabled() {
self.paths
.primary()
.ok_or(Error::InternalError)?
.borrow_mut()
.pmtud_mut()
.start();

Say that a client's path MTU is smaller than their local interface MTU. Given that probing only starts once confirmed, i.e. after receiving HANDSHAKE_DONE from the server, initial HTTP requests would not be delayed, but only one subsequent flight of requests would be delayed by up to one PTO. Is that correct?

Thus this optimization, and really all of PMTUD probing, assumes that the potential delay of one subsequent flight of HTTP requests by up to one PTO is worth the trade off of potentially increasing the overall connection throughput.

Is that correct?

@larseggert
Copy link
Collaborator Author

Should we also optimistically start the search at the interface MTU

Let me make sure I understand the implications here correctly. Sorry for any potential mistakes.
We only start probing once the connection is confirmed.

This would need to change. What I think we should do is roughly this:

  • Start sending at the local interface MTU when the connection starts (i.e., for the Initial).
  • If we detect a loss n times, we revert to pobing up from 1280.

n should probably be something like 2, so we don't cause undue delay.

@mxinden
Copy link
Collaborator

mxinden commented Sep 27, 2024

In the case where a client's path MTU is smaller than their local interface MTU, this would add a delay of 2*PTO to every connection establishment, right? If so, isn't that a high cost for the potential increase in throughput? Or is such scenario just very rare?

@larseggert
Copy link
Collaborator Author

larseggert commented Sep 27, 2024

Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

We could also cache a probed MTU towards a destination IP, like the OS does for a TCP MSS it has determined.

@mxinden
Copy link
Collaborator

mxinden commented Sep 27, 2024

Yes. I think this is a rare case, but maybe we add some telemetry first to confirm?

How about we:

  1. Role out PMTUD on Firefox Nightly without the optimization (i.e. starting at the local interface MTU).
  2. Enable the optimization, monitoring whether connection establishment times stay stable.

@larseggert
Copy link
Collaborator Author

I was thinking we just log the log the local interface MTU together with the discovered PMTUD, and check for differences.

Copy link
Collaborator

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good to me. Minor comments.

neqo-transport/Cargo.toml Outdated Show resolved Hide resolved
neqo-transport/src/cc/classic_cc.rs Show resolved Hide resolved
neqo-transport/src/path.rs Outdated Show resolved Hide resolved
neqo-transport/src/path.rs Outdated Show resolved Hide resolved
neqo-transport/src/path.rs Outdated Show resolved Hide resolved
@larseggert larseggert marked this pull request as draft October 8, 2024 05:51
@larseggert larseggert marked this pull request as ready for review October 8, 2024 12:47
@larseggert larseggert marked this pull request as draft October 8, 2024 12:48
@larseggert larseggert marked this pull request as ready for review October 8, 2024 13:10
@larseggert
Copy link
Collaborator Author

larseggert commented Nov 25, 2024

We need to bump the rayon crate in mozilla-central to 1.7, which is what windows-bindgen requires, before we can land this PR and bring in mtu as a dependency.

(See https://github.com/larseggert/neqo/actions/runs/11972576345/job/33379768209#step:8:86 for Firefox build failure.)

@mxinden @valenting @KershawChang any tips on how to do this?

@mxinden
Copy link
Collaborator

mxinden commented Nov 25, 2024

We will need to update the following crates:

➜  cargo tree --invert=rayon
rayon v1.6.1
├── style v0.0.1
│   ├── geckoservo v0.0.1
│   │   └── gkrust-shared v0.1.0
│   │       ├── gkrust v0.1.0
│   │       └── gkrust-gtest v0.1.0
│   └── stylo_tests v0.0.1
│       [dev-dependencies]
│       └── gkrust v0.1.0
├── webrender v0.62.0
│   └── webrender_bindings v0.1.0
│       └── gkrust-shared v0.1.0
├── webrender_bindings v0.1.0
└── wr_glyph_rasterizer v0.1.0
    └── webrender v0.62.0

Note that, given rayon 1.7 is a minor release only, i.e. non-breaking, we might get away with a subset.

For the record, windows-bindgen introduced rayon v1.7 dependency in windows-bindgen v0.49.0. If it would have been a more recent version, it might have been worth exploring supporting both the most recent and the one without rayon, similar to what we did in https://github.com/quinn-rs/quinn/actions/runs/11553182968/job/32153809202?pr=2021.

@mxinden
Copy link
Collaborator

mxinden commented Nov 25, 2024

➜  mozilla-central git:(bookmarks/central) cargo update -p rayon  --precise 1.7.0
    Updating crates.io index
    Updating rayon v1.6.1 -> v1.7.0

Neat. It seems to apply without issues.

@larseggert I can create a mozilla-central Phabricator patch if you want. In case there aren't any hidden issues, the only real work will be auditing rayon v1.6.1 -> v1.7.0 code changes.

@larseggert
Copy link
Collaborator Author

@larseggert I can create a mozilla-central Phabricator patch if you want. In case there aren't any hidden issues, the only real work will be auditing rayon v1.6.1 -> v1.7.0 code changes.

That would be great, thanks. (I'm surprised mozilla-central doesn't have some sort of auto-upgrade of dependencies...)

@mxinden
Copy link
Collaborator

mxinden commented Nov 25, 2024

For the record, updating mozilla-central to a recent rayon version is happening through https://bugzilla.mozilla.org/show_bug.cgi?id=1933199.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Nov 28, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127
@mxinden
Copy link
Collaborator

mxinden commented Nov 29, 2024

For the record, updating mozilla-central to a recent rayon version is happening through https://bugzilla.mozilla.org/show_bug.cgi?id=1933199.

rayon v1.10.0 landed in mozilla-central last night.

@larseggert
Copy link
Collaborator Author

Thanks, saw it. Trying to make mtu importable.

@larseggert
Copy link
Collaborator Author

Next blocker:

Missing audit for windows-bindgen:0.58.0 (requires ['safe-to-deploy']).

i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Nov 30, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Dec 1, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127

UltraBlame original commit: a80b258672c95bf02014f72b7fde8609b6f507cc
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Dec 1, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127

UltraBlame original commit: a80b258672c95bf02014f72b7fde8609b6f507cc
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Dec 1, 2024
…in-reviewers,valentin

[mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate
to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends
on `rayon` `1.7`.

On the other hand mozilla-central depends on [`rayon`
`v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157).

Given that mozilla-central allows at most one version of each crate, let's
update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version.

See mozilla/neqo#2135 (comment) for details.

Differential Revision: https://phabricator.services.mozilla.com/D230127

UltraBlame original commit: a80b258672c95bf02014f72b7fde8609b6f507cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants