perf(bin/server): increase msg size and don't allocate msg per resp #1772

Merged on Mar 27, 2024 (2 commits)

mxinden (Collaborator) commented Mar 25, 2024

Previously `neqo-server` would respond to a request by repeatedly sending a static 440-byte message (the Major-General's Song). This pull request increases the batch size from 440 bytes to 4096 bytes, which also matches the `neqo-client` receive buffer size:

https://github.com/mozilla/neqo/blob/76630a5ebb6c6b94de6a40cf3f439b9a846f6ab7/neqo-bin/src/bin/client/http3.rs#L165

let mut data = vec![0; 4096];

Previously `ResponseData::repeat` would convert the provided `buf: &[u8]` to a `Vec<u8>`, i.e. copy `buf` into a fresh allocation for every response. Instead, keep a reference to the original `buf`, thus removing the allocation.
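
A minimal sketch of the borrow-instead-of-copy idea, not neqo's actual code: the field names, the `next_chunk` and `consume` helpers, and the `total` parameter are hypothetical. The only point is that `repeat` stores the `&[u8]` it is given, so constructing a response does not allocate.

```rust
/// Hypothetical stand-in for the server's response state (not neqo's real type).
struct ResponseData<'a> {
    /// Borrowed message; the previous approach would have stored `buf.to_vec()`.
    data: &'a [u8],
    /// Position in `data` where the next chunk starts.
    offset: usize,
    /// Bytes still owed to the peer.
    remaining: usize,
}

impl<'a> ResponseData<'a> {
    /// Repeat `buf` until `total` bytes have been produced, without copying `buf`.
    fn repeat(buf: &'a [u8], total: usize) -> Self {
        assert!(!buf.is_empty(), "cannot repeat an empty buffer");
        Self { data: buf, offset: 0, remaining: total }
    }

    /// Next contiguous chunk, bounded by the end of `data` and by `remaining`.
    fn next_chunk(&self) -> &'a [u8] {
        let end = (self.offset + self.remaining).min(self.data.len());
        &self.data[self.offset..end]
    }

    /// Record that `written` bytes of the current chunk were sent.
    fn consume(&mut self, written: usize) {
        assert!(written <= self.next_chunk().len());
        self.remaining -= written;
        // Wrap around so the message starts over once its end is reached.
        self.offset = (self.offset + written) % self.data.len();
    }

    /// True once the full response has been produced.
    fn done(&self) -> bool {
        self.remaining == 0
    }
}
```

A caller would loop on `next_chunk`, pass the slice to the stream's send function, and report the number of accepted bytes through `consume` until `done` returns true.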


Preliminary benchmark comparison:

main:

1-conn/1-100mb-resp (aka. Download)/client
                        time:   [768.10 ms 1.0894 s 1.5517 s]
                        thrpt:  [64.443 MiB/s 91.794 MiB/s 130.19 MiB/s]

this pull request:

1-conn/1-100mb-resp (aka. Download)/client
                        time:   [674.30 ms 837.63 ms 1.0564 s]
                        thrpt:  [94.661 MiB/s 119.38 MiB/s 148.30 MiB/s]
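
The throughput figures follow from the timings if the response is assumed to be 100 MiB (an assumption based on the benchmark name and the MiB/s unit). A small sketch that reproduces the median estimates and checks whether the two throughput intervals overlap; the function names are illustrative only:

```rust
/// Throughput in MiB/s when `size_mib` MiB are transferred in `secs` seconds.
fn throughput_mib_s(size_mib: f64, secs: f64) -> f64 {
    size_mib / secs
}

/// True when the closed intervals `a` and `b` share at least one point.
fn intervals_overlap(a: (f64, f64), b: (f64, f64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

/// Returns (speedup of the median estimate, whether the throughput intervals overlap).
/// The numbers are copied from the two benchmark runs; 100 MiB is an assumed size.
fn compare() -> (f64, bool) {
    let main = throughput_mib_s(100.0, 1.0894); // median time on main
    let pr = throughput_mib_s(100.0, 0.83763); // median time with this pull request
    let overlap = intervals_overlap((64.443, 130.19), (94.661, 148.30));
    (pr / main, overlap)
}
```

Under that assumption the median estimate improves by roughly 30%, while the two intervals still overlap, which fits the comparison being described as preliminary.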

Also visible in qvis, where the client reads MANY small (440-byte) chunks from the server:

image
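
To put "many small chunks" into numbers, a rough sketch that counts how many chunks are needed for one response, assuming a 100 MiB response (taken from the benchmark name) and that every chunk is filled completely; `chunks_needed` is an illustrative helper, not part of neqo:

```rust
/// Number of chunks of at most `chunk` bytes needed to cover `total` bytes.
fn chunks_needed(total: usize, chunk: usize) -> usize {
    assert!(chunk > 0, "chunk size must be non-zero");
    // Ceiling division: a partially filled last chunk still counts.
    (total + chunk - 1) / chunk
}
```

With these assumptions, `chunks_needed(100 * 1024 * 1024, 440)` is 238313, while `chunks_needed(100 * 1024 * 1024, 4096)` is 25600, i.e. roughly nine times fewer chunks per response.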

I suggest merging #1758 first. We can then validate this pull request using the CI benchmark server.

I will take a look whether a long lived receive buffer on the client side has any impact next.


larseggert (Collaborator) left a comment

Fix clippy, but otherwise LGTM.

auto-merge was automatically disabled March 26, 2024 13:58

Head branch was pushed to by a user without write access


Benchmark results

Performance differences relative to 76630a5.

  • drain a timer quickly
    time: [391.24 ns 399.16 ns 406.56 ns]
    change: [-1.6765% +0.0373% +1.7660%] (p = 0.97 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1+1 entries
    time: [194.34 ns 194.80 ns 195.30 ns]
    change: [-0.2246% +0.3216% +1.0490%] (p = 0.39 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 3+1 entries
    time: [234.86 ns 235.47 ns 236.11 ns]
    change: [-1.3377% -0.5925% +0.5348%] (p = 0.24 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 10+1 entries
    time: [234.13 ns 234.80 ns 235.60 ns]
    change: [-1.4329% -0.8859% -0.3309%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • coalesce_acked_from_zero 1000+1 entries
    time: [213.97 ns 214.12 ns 214.28 ns]
    change: [-0.1185% +0.7805% +1.7211%] (p = 0.08 > 0.05)
    No change in performance detected.

  • RxStreamOrderer::inbound_frame()
    time: [117.61 ms 117.70 ms 117.79 ms]
    change: [+0.1268% +0.2385% +0.3501%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [117.35 ms 117.59 ms 117.83 ms]
    thrpt: [33.947 MiB/s 34.016 MiB/s 34.085 MiB/s]
    change:
    time: [-0.4683% -0.1775% +0.1387%] (p = 0.25 > 0.05)
    thrpt: [-0.1385% +0.1778% +0.4705%]
    No change in performance detected.

  • transfer/Run multiple transfers with the same seed
    time: [118.27 ms 118.42 ms 118.57 ms]
    thrpt: [33.736 MiB/s 33.778 MiB/s 33.820 MiB/s]
    change:
    time: [-0.2297% -0.0505% +0.1355%] (p = 0.59 > 0.05)
    thrpt: [-0.1353% +0.0505% +0.2303%]
    No change in performance detected.
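
The verdict lines above can be read with a simplified decision rule. The sketch below is an approximation, not Criterion's source: the 0.05 significance level appears in the report itself, whereas the 1% noise threshold is an assumption (believed to be Criterion's default), and the `Verdict` type and `classify` function are illustrative names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,
    WithinNoise,
    Improved,
    Regressed,
}

/// `p` is the reported p-value; `lo` and `hi` bound the relative change in time,
/// expressed as fractions (for example -0.014329 for -1.4329%).
fn classify(p: f64, lo: f64, hi: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // shown in the report
    const NOISE: f64 = 0.01; // assumed noise threshold of 1%

    if p >= SIGNIFICANCE {
        Verdict::NoChange
    } else if hi < -NOISE {
        // The whole interval lies below -1%: time went down.
        Verdict::Improved
    } else if lo > NOISE {
        // The whole interval lies above +1%: time went up.
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}
```

Applied to the entries above, `classify(0.97, -0.016765, 0.017660)` yields `NoChange` and `classify(0.00, -0.014329, -0.003309)` yields `WithinNoise`, matching the reported labels.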

Client/server transfer results

Transfer of 134217728 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|:---|:---|:---|---:|---:|---:|---:|
| msquic | msquic | | | 816.5 ± 345.1 | 372.3 | 1326.9 | 1.00 |
| neqo | msquic | reno | on | 1944.3 ± 30.1 | 1905.8 | 1997.6 | 1.00 |
| neqo | msquic | reno | | 2110.8 ± 206.9 | 1909.8 | 2471.1 | 1.00 |
| neqo | msquic | cubic | on | 2010.2 ± 273.3 | 1753.1 | 2720.3 | 1.00 |
| neqo | msquic | cubic | | 1973.4 ± 198.9 | 1812.2 | 2509.8 | 1.00 |
| msquic | neqo | reno | on | 4490.0 ± 306.8 | 4171.6 | 5159.4 | 1.00 |
| msquic | neqo | reno | | 4444.6 ± 192.6 | 4164.6 | 4844.3 | 1.00 |
| msquic | neqo | cubic | on | 4571.1 ± 266.9 | 4272.5 | 5179.5 | 1.00 |
| msquic | neqo | cubic | | 4431.7 ± 242.7 | 4142.3 | 4947.5 | 1.00 |
| neqo | neqo | reno | on | 3586.7 ± 237.4 | 3284.1 | 3950.7 | 1.00 |
| neqo | neqo | reno | | 3630.4 ± 255.0 | 3102.2 | 4007.3 | 1.00 |
| neqo | neqo | cubic | on | 4428.5 ± 341.1 | 3779.6 | 4992.5 | 1.00 |
| neqo | neqo | cubic | | 4365.0 ± 293.2 | 4071.9 | 4994.8 | 1.00 |


@larseggert larseggert added this pull request to the merge queue Mar 27, 2024
Merged via the queue into mozilla:main with commit 6a51a35 Mar 27, 2024
14 checks passed
mxinden added a commit to mxinden/neqo that referenced this pull request May 4, 2024
There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, ... Since then, the two implementations diverged significantly.
Especially (1) saw a lot of improvements in recent months:

- mozilla#1564
- mozilla#1569
- mozilla#1578
- mozilla#1581
- mozilla#1604
- mozilla#1612
- mozilla#1676
- mozilla#1692
- mozilla#1707
- mozilla#1708
- mozilla#1727
- mozilla#1753
- mozilla#1756
- mozilla#1766
- mozilla#1772
- mozilla#1786
- mozilla#1787
- mozilla#1788
- mozilla#1794
- mozilla#1806
- mozilla#1808
- mozilla#1848
- mozilla#1866

At this point, bugs in (2) are hard to fix, see e.g.
mozilla#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).
KershawChang pushed a commit to KershawChang/neqo that referenced this pull request May 7, 2024
github-merge-queue bot pushed a commit that referenced this pull request May 8, 2024
* refactor(bin): introduce server/http3.rs and server/http09.rs

The QUIC Interop Runner requires an http3 and http09 implementation for both
client and server. The client code is already structured into an http3 and an
http09 implementation since #1727.

This commit does the same for the server side, i.e. splits the http3 and http09
implementation into separate Rust modules.

* refactor: merge mozilla-central http3 server into neqo-bin

There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, ... Since then, the two implementations diverged significantly.
Especially (1) saw a lot of improvements in recent months:

- #1564
- #1569
- #1578
- #1581
- #1604
- #1612
- #1676
- #1692
- #1707
- #1708
- #1727
- #1753
- #1756
- #1766
- #1772
- #1786
- #1787
- #1788
- #1794
- #1806
- #1808
- #1848
- #1866

At this point, bugs in (2) are hard to fix, see e.g.
#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).

* Move firefox.rs to mozilla-central

* Reduce HttpServer trait functions

* Extract constructor

* Remove unused deps

* Remove clap color feature

Nice to have. Adds multiple dependencies. Hard to justify for mozilla-central.