feat: Stop the PMTUD search at the interface MTU #2135
WIP Should we optimistically *start* the search at the interface MTU, and only start from 1280 when that fails?
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2135 +/- ##
==========================================
+ Coverage 95.39% 95.40% +0.01%
==========================================
Files 113 113
Lines 36683 36701 +18
==========================================
+ Hits 34994 35015 +21
+ Misses 1689 1686 -3
Are there other projects using this optimistic approach? If I understand RFC 8899 correctly, the local interface MTU is the end value of the search, not the start value.
Failed Interop Tests: None ❓
All results
Succeeded Interop Tests: None ❓
Unsupported Interop Tests: None ❓
All true, but in practice, the local interface is most often the limiting hop.
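To make the two search orders concrete, here is a minimal Rust sketch of the optimistic variant: probe the interface MTU first, and only fall back to an upward search from 1280 when that probe fails. Everything named here (`search_pmtu`, `BASE_MTU`, the `candidates` table and the `probe` callback) is hypothetical and is not neqo's API; the fallback is a plain linear walk chosen for brevity, not necessarily what the PR implements.

```rust
/// IPv6 minimum link MTU, used as the conservative starting point.
pub const BASE_MTU: usize = 1280;

/// Returns the largest packet size confirmed by `probe`.
///
/// `candidates` is assumed to be sorted in ascending order, and `probe`
/// is a hypothetical callback that reports whether a packet of the
/// given size was acknowledged.
pub fn search_pmtu(
    interface_mtu: usize,
    candidates: &[usize],
    mut probe: impl FnMut(usize) -> bool,
) -> usize {
    // The interface MTU is the end value of the search (RFC 8899),
    // so nothing above it is ever probed.
    let ceiling = interface_mtu.max(BASE_MTU);

    // Optimistic step: if the local interface is the limiting hop,
    // a single probe settles the search.
    if ceiling > BASE_MTU && probe(ceiling) {
        return ceiling;
    }

    // Fallback: walk upward from the base size and stop at the first
    // candidate that is not acknowledged.
    let mut confirmed = BASE_MTU;
    for &size in candidates.iter().filter(|&&s| s > BASE_MTU && s < ceiling) {
        if !probe(size) {
            break;
        }
        confirmed = size;
    }
    confirmed
}
```

When the interface is the limiting hop, this needs one probe instead of one per table entry; when it is not, it costs one extra lost probe before the normal search begins.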
Benchmark results
Performance differences relative to c6d5502.
coalesce_acked_from_zero 1+1 entries: 💔 Performance has regressed. time: [105.57 ns 105.91 ns 106.29 ns] change: [+5.8389% +6.4848% +7.0193%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 3+1 entries: 💔 Performance has regressed. time: [121.44 ns 121.85 ns 122.31 ns] change: [+3.1785% +3.5907% +3.9881%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 10+1 entries: 💔 Performance has regressed. time: [121.10 ns 121.55 ns 122.10 ns] change: [+3.0928% +3.5527% +3.9751%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 1000+1 entries: 💔 Performance has regressed. time: [100.56 ns 100.70 ns 100.87 ns] change: [+2.8990% +3.7252% +4.5719%] (p = 0.00 < 0.05)
RxStreamOrderer::inbound_frame(): Change within noise threshold. time: [111.75 ms 111.81 ms 111.86 ms] change: [+0.2679% +0.3351% +0.4061%] (p = 0.00 < 0.05)
SentPackets::take_ranges: No change in performance detected. time: [5.4807 µs 5.6706 µs 5.8770 µs] change: [-3.4151% -0.0737% +3.2172%] (p = 0.96 > 0.05)
transfer/pacing-false/varying-seeds: 💔 Performance has regressed. time: [77.594 ms 77.799 ms 78.003 ms] change: [+183.46% +195.92% +209.17%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds: 💔 Performance has regressed. time: [78.075 ms 78.250 ms 78.423 ms] change: [+117.36% +127.76% +139.21%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: 💔 Performance has regressed. time: [78.252 ms 78.392 ms 78.534 ms] change: [+193.17% +201.51% +210.20%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: 💔 Performance has regressed. time: [78.645 ms 78.787 ms 78.925 ms] change: [+81.903% +89.975% +98.836%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected. time: [907.88 ms 917.99 ms 928.39 ms] thrpt: [107.71 MiB/s 108.93 MiB/s 110.15 MiB/s] change: time: [-0.1489% +1.4351% +3.0836%] (p = 0.08 > 0.05) thrpt: [-2.9914% -1.4148% +0.1491%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved. time: [300.15 ms 302.04 ms 303.94 ms] thrpt: [32.901 Kelem/s 33.109 Kelem/s 33.316 Kelem/s] change: time: [-8.1101% -7.0497% -5.9887%] (p = 0.00 < 0.05) thrpt: [+6.3702% +7.5844% +8.8259%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed. time: [34.366 ms 34.561 ms 34.777 ms] thrpt: [28.755 elem/s 28.934 elem/s 29.099 elem/s] change: time: [+1.3683% +2.3085% +3.2482%] (p = 0.00 < 0.05) thrpt: [-3.1460% -2.2564% -1.3498%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed. time: [1.6966 s 1.7138 s 1.7312 s] thrpt: [57.763 MiB/s 58.351 MiB/s 58.940 MiB/s] change: time: [+1.3262% +3.0626% +4.8632%] (p = 0.00 < 0.05) thrpt: [-4.6376% -2.9716% -1.3089%]
Client/server transfer results
Transfer of 33554432 bytes over loopback.
This PR exposed a bug in |
Let me make sure I understand the implications here correctly. Sorry for any potential mistakes. We only start probing once the connection is confirmed:
neqo/neqo-transport/src/connection/mod.rs, lines 2794 to 2802 in 55e3a93
Say that a client's path MTU is smaller than their local interface MTU. Given that probing only starts once confirmed, i.e. after receiving Thus this optimization, and really all of PMTUD probing, assumes that the potential delay of one subsequent flight of HTTP requests by up to one PTO is worth the trade-off of potentially increasing the overall connection throughput. Is that correct?
This would need to change. What I think we should do is roughly this:
n should probably be something like 2, so we don't cause undue delay. |
In the case where a client's path MTU is smaller than their local interface MTU, this would add a delay of 2*PTO to every connection establishment, right? If so, isn't that a high cost for the potential increase in throughput? Or is such a scenario just very rare?
Yes. I think this is a rare case, but maybe we add some telemetry first to confirm? We could also cache a probed MTU towards a destination IP, like the OS does for a TCP MSS it has determined. |
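The caching idea could look roughly like the following Rust sketch: remember the last probed path MTU per destination IP and reuse it as the starting point for later connections, with an expiry so that stale values are probed again. `PmtuCache`, its methods and the TTL are all hypothetical names for illustration; nothing like this exists in neqo as far as this thread shows.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical per-destination cache of discovered path MTUs.
pub struct PmtuCache {
    /// How long a cached value is trusted before it must be re-probed.
    ttl: Duration,
    /// Destination IP -> (path MTU, time it was recorded).
    entries: HashMap<IpAddr, (usize, Instant)>,
}

impl PmtuCache {
    pub fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Stores the path MTU discovered towards `dst`, replacing any older value.
    pub fn record(&mut self, dst: IpAddr, pmtu: usize, now: Instant) {
        self.entries.insert(dst, (pmtu, now));
    }

    /// Returns the cached path MTU for `dst` if it has not expired.
    pub fn lookup(&self, dst: IpAddr, now: Instant) -> Option<usize> {
        self.entries.get(&dst).and_then(|&(pmtu, stored)| {
            (now.saturating_duration_since(stored) <= self.ttl).then_some(pmtu)
        })
    }
}
```

A connection to a known destination could then start its search at the cached value and skip the optimistic interface-MTU probe when the cache says the path is smaller.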
How about we:
I was thinking we just log the local interface MTU together with the discovered PMTU, and check for differences.
Generally looking good to me. Minor comments.
We need to bump the (See https://github.com/larseggert/neqo/actions/runs/11972576345/job/33379768209#step:8:86 for Firefox build failure.) @mxinden @valenting @KershawChang any tips on how to do this? |
We will need to update the following crates:
Note that, given For the record, |
Neat. It seems to apply without issues. @larseggert I can create a mozilla-central Phabricator patch if you want. In case there aren't any hidden issues, the only real work will be auditing |
That would be great, thanks. (I'm surprised mozilla-central doesn't have some sort of auto-upgrade of dependencies...) |
For the record, updating mozilla-central to a recent |
…in-reviewers,valentin [mozilla/neqo#2135](mozilla/neqo#2135) adds `mtu` crate to `neqo-*`. `mtu` crate depends on `windows-bindgen`. `windows-bindgen` depends on `rayon` `1.7`. On the other hand mozilla-central depends on [`rayon` `v1.6.1`](https://searchfox.org/mozilla-central/rev/7987501f2c2ed1914e5c682bd328ace9c4a7c6cd/Cargo.lock#5149-5157). Given that mozilla-central allows at most one version of each crate, let's update mozilla-central to `rayon` `1.10.0`, i.e. the most recent version. See mozilla/neqo#2135 (comment) for details. Differential Revision: https://phabricator.services.mozilla.com/D230127
Thanks, saw it. Trying to make |
Should we also optimistically start the search at the interface MTU, and only start from 1280 when that fails?