Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: don't allocate in UDP recv path #2076

Closed
wants to merge 1 commit into from

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Aug 27, 2024

Previously neqo-udp would have one long-lived receive buffer, but after reading into the buffer from the socket, it would allocate a new Vec for each UDP segment.

This commit does not allocate each UDP segment in a new Vec, but instead passes the single re-used receive buffer to
neqo_transport::Connection::process_input directly.

Part of #1693.


Draft for now. Want to see benchmark results before investing further.

Previously `neqo-udp` would have one long-lived receive buffer, but after
reading into the buffer from the socket, it would allocate a new `Vec` for each
UDP segment.

This commit does not allocate each UDP segment in a new `Vec`, but instead
passes the single re-used receive buffer to
`neqo_transport::Connection::process_input` directly.

Part of mozilla#1693.
Copy link

Benchmark results

Performance differences relative to 910a7cd.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [98.743 ns 99.096 ns 99.452 ns]
       change: [-1.4056% -0.8996% -0.3504%] (p = 0.00 < 0.05)

Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: 💚 Performance has improved.
       time:   [116.86 ns 117.16 ns 117.50 ns]
       change: [-2.6326% -1.6901% -1.0195%] (p = 0.00 < 0.05)

Found 19 outliers among 100 measurements (19.00%)
3 (3.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
11 (11.00%) high severe

coalesce_acked_from_zero 10+1 entries: Change within noise threshold.
       time:   [116.40 ns 116.80 ns 117.30 ns]
       change: [-2.1473% -1.4459% -0.7655%] (p = 0.00 < 0.05)

Found 20 outliers among 100 measurements (20.00%)
8 (8.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.168 ns 101.41 ns 110.93 ns]
       change: [-2.7203% +0.8236% +6.4501%] (p = 0.82 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [111.52 ms 111.66 ms 111.88 ms]
       change: [-0.2530% -0.1178% +0.0977%] (p = 0.20 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [25.985 ms 26.869 ms 27.751 ms]
       change: [-7.4673% -2.9356% +2.1097%] (p = 0.24 > 0.05)
transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [34.891 ms 36.585 ms 38.286 ms]
       change: [-4.9325% +1.5534% +8.5890%] (p = 0.64 > 0.05)
transfer/pacing-false/same-seed: No change in performance detected.
       time:   [31.107 ms 31.822 ms 32.519 ms]
       change: [-4.9157% -1.8751% +1.1009%] (p = 0.24 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [39.949 ms 42.962 ms 45.965 ms]
       change: [-12.157% -4.0154% +5.2027%] (p = 0.39 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: 💚 Performance has improved.
       time:   [111.05 ms 111.36 ms 111.65 ms]
       thrpt:  [895.64 MiB/s 898.02 MiB/s 900.48 MiB/s]
change:
       time:   [-2.9276% -2.4889% -2.0409%] (p = 0.00 < 0.05)
       thrpt:  [+2.0835% +2.5525% +3.0159%]

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [310.54 ms 314.53 ms 318.58 ms]
       thrpt:  [31.390 Kelem/s 31.793 Kelem/s 32.202 Kelem/s]
change:
       time:   [-1.5604% +0.2356% +1.9649%] (p = 0.79 > 0.05)
       thrpt:  [-1.9270% -0.2350% +1.5852%]
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [40.418 ms 41.136 ms 41.853 ms]
       thrpt:  [23.893  elem/s 24.310  elem/s 24.741  elem/s]
change:
       time:   [-2.2850% +0.0149% +2.5328%] (p = 1.00 > 0.05)
       thrpt:  [-2.4703% -0.0149% +2.3385%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 157.0 ± 109.5 81.6 484.1 1.00
neqo msquic reno on 207.1 ± 11.3 194.7 234.7 1.00
neqo msquic reno 204.4 ± 11.2 193.3 223.2 1.00
neqo msquic cubic on 209.7 ± 11.4 199.0 230.8 1.00
neqo msquic cubic 222.4 ± 14.2 201.2 245.9 1.00
msquic neqo reno on 87.7 ± 21.0 74.0 174.7 1.00
msquic neqo reno 85.8 ± 21.8 73.7 166.7 1.00
msquic neqo cubic on 83.8 ± 13.7 73.6 127.1 1.00
msquic neqo cubic 84.9 ± 22.0 74.3 182.1 1.00
neqo neqo reno on 169.0 ± 83.0 122.3 403.1 1.00
neqo neqo reno 147.4 ± 59.5 118.6 387.2 1.00
neqo neqo cubic on 148.6 ± 25.0 120.9 220.7 1.00
neqo neqo cubic 176.2 ± 75.2 116.5 394.2 1.00

⬇️ Download logs

Copy link

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to change the code so that input datagrams are taken with a reference to the underlying buffer instead?

You might need to split Datagram in two parts to do that, one with an owned buffer and one with a borrowed one (maybe; for output). Ideally though we never have to own the buffer.

This change seems like it would be harder to undo that doing this right from the outset.

@mxinden
Copy link
Collaborator Author

mxinden commented Sep 8, 2024

Thank you for taking a look @martinthomson.

Is it possible to change the code so that input datagrams are taken with a reference to the underlying buffer instead?

Yes. I created #2093 implementing your suggestion above. It is a Draft for now, but I would argue goes beyond a proof-of-concept.

Closing here in favor of #2093.

@mxinden mxinden closed this Sep 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants