perf(bin): add criterion benchmarks #1758

mxinden · 2024-03-18T19:27:12Z

Wraps the neqo-client and neqo-server code, starts the server and runs various benchmarks through the client.

Benchmarks:

single-request-1gb
single-request-1mb
requests-per-second
handshakes-per-second

It would be great to have end-to-end benchmarks (in CI) to land performance optimizations like #1741 with confidence.

This pull request is a shot at the above using criterion.

Benefits:

Little boilerplate, written in Rust.
Easy to extend with various benchmarks (e.g. multiple connections, handshakes per second, ...)
Leverage features of criterion (though arguably the wrong tool for the job)
Hook easily into existing bench.yml workflow including its baseline comparison

Obviously there are a million other ways to do this.

Using hyperfine like it is done in ci: Compare performance to msquic #1750, thus agnostic to the binary being tested
Building on top of upload_test.sh simulating an actual network (latency, packet drop, ...), i.e. not just loopback.
Build our own platform, like
- https://microsoft.github.io/msquic/
- https://observablehq.com/@libp2p-workspace/performance-dashboard
...

What are people's thoughts? Are there long-term plans for neqo? Are there prominent examples worth taking inspiration from? Does Firefox / Mozilla already have some continuous benchmarking setup worth hooking into?

larseggert · 2024-03-18T23:50:30Z

Better performance (regression) testing is definitely something we are interested in having.

One part of this is the building-up of a suite of benchmarks that we run in CI. I envision a process where if we identify a performance issue, we first quantify it through some (micro)benchmarks that from then on become part of the suite. If, based on the benchmark results, we decide that a performance issue needs to be fixed, we do that.

A second part is macro-level benchmarks, e.g., end-to-end throughput and other tests. I haven't spend much time on how we should best do this. At the moment, I feel that some simple end-to-end tests, like you propose here or we have in bench.yml are already sufficient to identify performance issues that we can address using the process in the previous paragraph.

mxinden · 2024-03-20T14:30:50Z

Making some progress. Actual numbers are still preliminary.

Benchmarking single-request-1gb/client: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 76.6s.
single-request-1gb/client
                        time:   [8.3150 s 9.7354 s 11.247 s]
                        thrpt:  [91.047 MiB/s 105.18 MiB/s 123.15 MiB/s]
                 change:
                        time:   [-29.289% -14.840% +0.7625%] (p = 0.10 > 0.05)
                        thrpt:  [-0.7568% +17.426% +41.421%]
                        No change in performance detected.

Benchmarking single-request-1mb/client: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.8s or enable flat sampling.
single-request-1mb/client
                        time:   [159.50 ms 162.72 ms 167.38 ms]
                        thrpt:  [5.9743 MiB/s 6.1454 MiB/s 6.2697 MiB/s]
                 change:
                        time:   [+30.929% +53.168% +81.193%] (p = 0.00 < 0.05)
                        thrpt:  [-44.810% -34.712% -23.623%]
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low severe

requests-per-second(10_000)/client
                        time:   [217.67 ms 242.75 ms 269.26 ms]
                        thrpt:  [37.140 Kelem/s 41.194 Kelem/s 45.942 Kelem/s]
                 change:
                        time:   [-13.591% -0.6218% +13.808%] (p = 0.92 > 0.05)
                        thrpt:  [-12.133% +0.6256% +15.729%]
                        No change in performance detected.

Benchmarking handshakes-per-second(100)/client: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 25.0s.
handshakes-per-second(100)/client
                        time:   [1.8661 s 1.9559 s 2.0763 s]
                        thrpt:  [48.162  elem/s 51.127  elem/s 53.587  elem/s]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Wraps the `neqo-client` and `neqo-server` code, starts the server and runs various benchmarks through the client. Benchmarks: - single-request-1gb - single-request-1mb - requests-per-second - handshakes-per-second

github-actions · 2024-03-23T16:44:40Z

Benchmark results

Performance differences relative to efc4813.

drain a timer quickly time: [390.60 ns 397.57 ns 404.04 ns]
change: [-1.9385% -0.4140% +1.1917%] (p = 0.61 > 0.05)
No change in performance detected.
coalesce_acked_from_zero 1+1 entries
time: [197.26 ns 197.75 ns 198.27 ns]
change: [-0.3258% +0.0111% +0.3359%] (p = 0.95 > 0.05)
No change in performance detected.
coalesce_acked_from_zero 3+1 entries
time: [236.87 ns 237.51 ns 238.20 ns]
change: [-0.0436% +0.3555% +0.7441%] (p = 0.07 > 0.05)
No change in performance detected.
coalesce_acked_from_zero 10+1 entries
time: [235.38 ns 236.34 ns 237.55 ns]
change: [-1.2305% -0.6099% -0.0753%] (p = 0.03 < 0.05)
Change within noise threshold.
coalesce_acked_from_zero 1000+1 entries
time: [213.98 ns 214.23 ns 214.51 ns]
change: [-1.6919% -0.9648% -0.1586%] (p = 0.01 < 0.05)
Change within noise threshold.
RxStreamOrderer::inbound_frame()
time: [119.54 ms 119.62 ms 119.71 ms]
change: [-0.9084% -0.8153% -0.7256%] (p = 0.00 < 0.05)
Change within noise threshold.
transfer/Run multiple transfers with varying seeds
time: [118.04 ms 118.29 ms 118.53 ms]
thrpt: [33.747 MiB/s 33.816 MiB/s 33.887 MiB/s]
change:
time: [-1.2476% -0.9168% -0.5991%] (p = 0.00 < 0.05)
thrpt: [+0.6027% +0.9253% +1.2634%]
Change within noise threshold.
transfer/Run multiple transfers with the same seed
time: [118.99 ms 119.16 ms 119.34 ms]
thrpt: [33.517 MiB/s 33.567 MiB/s 33.617 MiB/s]
change:
time: [-0.6255% -0.4197% -0.2152%] (p = 0.00 < 0.05)
thrpt: [+0.2157% +0.4215% +0.6294%]
Change within noise threshold.
1-conn/1-100mb-resp (aka. Download)/client
time: [1.1177 s 1.1346 s 1.1565 s]
thrpt: [86.466 MiB/s 88.139 MiB/s 89.469 MiB/s]
1-conn/10_000-1b-seq-resp (aka. RPS)/client
time: [431.40 ms 433.45 ms 435.49 ms]
thrpt: [22.963 Kelem/s 23.071 Kelem/s 23.180 Kelem/s]
100-seq-conn/1-1b-resp (aka. HPS)/client
time: [3.3576 s 3.3600 s 3.3626 s]
thrpt: [29.739 elem/s 29.762 elem/s 29.784 elem/s]

Client/server transfer results

Transfer of 134217728 bytes over loopback.

Client	Server	CC	Pacing	Mean [ms]	Min [ms]	Max [ms]	Relative
msquic	msquic			799.1 ± 259.1	491.5	1201.7	1.00
neqo	msquic	reno	on	2045.5 ± 170.7	1914.0	2463.2	1.00
neqo	msquic	reno		2085.7 ± 258.0	1900.0	2462.9	1.00
neqo	msquic	cubic	on	2056.1 ± 251.4	1762.2	2490.3	1.00
neqo	msquic	cubic		2080.4 ± 257.9	1884.8	2453.0	1.00
msquic	neqo	reno	on	4525.2 ± 248.3	4263.5	5123.2	1.00
msquic	neqo	reno		4363.4 ± 229.5	4093.9	4874.4	1.00
msquic	neqo	cubic	on	4562.0 ± 292.7	4222.7	5148.2	1.00
msquic	neqo	cubic		4401.6 ± 225.6	4164.1	4834.0	1.00
neqo	neqo	reno	on	3940.2 ± 349.8	3521.2	4634.8	1.00
neqo	neqo	reno		4162.0 ± 247.8	3718.0	4598.2	1.00
neqo	neqo	cubic	on	4409.7 ± 219.7	4060.8	4789.8	1.00
neqo	neqo	cubic		4182.1 ± 179.3	3916.7	4492.2	1.00

⬇️ Download logs

It just takes too long on the bench machine

neqo-bin/benches/main.rs

mxinden

This pull request is ready for a first review.

neqo-bin/benches/main.rs

larseggert · 2024-03-25T15:41:51Z

Is there a way to group the benchmarks, so we can pin those where it makes sense to a core, and let others such as this one be scheduled on multiple cores?

This reverts commit a0ef46a.

codecov · 2024-03-27T08:55:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.05%. Comparing base (47dfb3b) to head (6eaea5c).
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1758      +/-   ##
==========================================
- Coverage   93.07%   93.05%   -0.02%     
==========================================
  Files         117      117              
  Lines       36449    36374      -75     
==========================================
- Hits        33926    33849      -77     
- Misses       2523     2525       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

.github/workflows/bench.yml

mxinden · 2024-03-27T14:27:13Z

@larseggert mind taking another look?

larseggert

Could you break out the renaming of ClientError and ServerError to Error into its own PR? Are there any other chores that should not be part of this one?

mxinden · 2024-03-27T15:12:38Z

Could you break out the renaming of ClientError and ServerError to Error into its own PR?

This pull request moves src/bin/server/main.rs to src/server/mod.rs in order for the benchmarks to be able to call fn server(). Thus it moves the ServerError from the main module into the server module. ServerError being in the server module makes clippy complain that the name is duplicated. Thus I renamed ServerError to Error and ClientError to Error.

Long story short, these renames are not unrelated. Do you still want me to separate them out into a separate pull request @larseggert? If so, shall I include the file moving in the separate pull request as well?

Are there any other chores that should not be part of this one?

In general, very much in favor of atomic tightly scoped pull requests. That said, in my eyes all changes are related to the pull request.

neqo-bin/src/udp.rs

mxinden · 2024-03-27T15:26:00Z

Latest benchmark run results are now within the same ballpark as when run on my laptop.

1-conn/1-100mb-resp (aka. Download)/client
time: [1.1177 s 1.1346 s 1.1565 s]
thrpt: [86.466 MiB/s 88.139 MiB/s 89.469 MiB/s]

1-conn/10_000-1b-seq-resp (aka. RPS)/client
time: [431.40 ms 433.45 ms 435.49 ms]
thrpt: [22.963 Kelem/s 23.071 Kelem/s 23.180 Kelem/s]

100-seq-conn/1-1b-resp (aka. HPS)/client
time: [3.3576 s 3.3600 s 3.3626 s]
thrpt: [29.739 elem/s 29.762 elem/s 29.784 elem/s]

#1758 (comment)

mxinden force-pushed the bin-bench branch from e85a76c to 0a24ccc Compare March 20, 2024 14:24

mxinden changed the title ~~feat(bin): add single conn 1GB download benchmark~~ perf(bin): add criterion benchmarks Mar 20, 2024

mxinden force-pushed the bin-bench branch 5 times, most recently from 29bed4c to b532192 Compare March 21, 2024 10:45

perf(bin): add criterion benchmarks

56a6fd8

Wraps the `neqo-client` and `neqo-server` code, starts the server and runs various benchmarks through the client. Benchmarks: - single-request-1gb - single-request-1mb - requests-per-second - handshakes-per-second

mxinden force-pushed the bin-bench branch from b532192 to 56a6fd8 Compare March 21, 2024 10:56

mxinden added 5 commits March 23, 2024 12:06

Merge branch 'main' of https://github.com/mozilla/neqo into bin-bench

0a736cf

Rename benchmark instances

2be3b1d

Turn off logging

1d59921

Remove 100mb sample size restriction

af06375

Use v6

0fe6078

mxinden added 3 commits March 24, 2024 09:32

Remove 1gb

d450df0

It just takes too long on the bench machine

Use /dev/null

d4fcce1

rework imports

0c359aa

mxinden commented Mar 24, 2024

View reviewed changes

neqo-bin/benches/main.rs Show resolved Hide resolved

Fix import

44cb987

mxinden marked this pull request as ready for review March 24, 2024 10:30

mxinden requested review from KershawChang, martinthomson and larseggert as code owners March 24, 2024 10:30

mxinden commented Mar 24, 2024

View reviewed changes

neqo-bin/benches/main.rs Show resolved Hide resolved

mxinden mentioned this pull request Mar 24, 2024

Explore multi-packet I/O (aka. fast UDP I/O) #1693

Open

7 tasks

Merge branch 'main' into bin-bench

0276ff1

mxinden mentioned this pull request Mar 25, 2024

perf(bin/server): increase msg size and don't allocate msg per resp #1772

Merged

mxinden added 8 commits March 26, 2024 14:27

Test without CPU pinning

a0ef46a

Merge branch 'main' of https://github.com/mozilla/neqo into bin-bench

ed1af32

Revert "Test without CPU pinning"

e988287

This reverts commit a0ef46a.

Pin all but neqo-bin to CPU 0

6648d15

Quote package

4981f03

Add rational for neqo-bin handling

cbf3a5e

Rework tuples

3541d17

Merge branch 'main' of https://github.com/mozilla/neqo into bin-bench

e3141f8

larseggert reviewed Mar 27, 2024

View reviewed changes

.github/workflows/bench.yml Outdated Show resolved Hide resolved

mxinden added 3 commits March 27, 2024 13:02

Pin first

8678188

Just duplicate the two calls

6eaea5c

Add --workspace flag

c637266

Remove taskset from neqo-bin

27ae7cb

larseggert requested changes Mar 27, 2024

View reviewed changes

mxinden commented Mar 27, 2024

View reviewed changes

neqo-bin/src/udp.rs Show resolved Hide resolved

larseggert approved these changes Mar 27, 2024

View reviewed changes

larseggert added this pull request to the merge queue Mar 27, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 27, 2024

larseggert added this pull request to the merge queue Mar 27, 2024

Merged via the queue into mozilla:main with commit 36fae62 Mar 27, 2024
14 checks passed

mxinden mentioned this pull request Apr 7, 2024

fix(bin): benchmark HPS without zero RTT #1795

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(bin): add criterion benchmarks #1758

perf(bin): add criterion benchmarks #1758

mxinden commented Mar 18, 2024 •

edited

Loading

larseggert commented Mar 18, 2024

mxinden commented Mar 20, 2024

github-actions bot commented Mar 23, 2024 •

edited

Loading

mxinden left a comment

larseggert commented Mar 25, 2024

codecov bot commented Mar 27, 2024 •

edited

Loading

mxinden commented Mar 27, 2024

larseggert left a comment

mxinden commented Mar 27, 2024

mxinden commented Mar 27, 2024

perf(bin): add criterion benchmarks #1758

perf(bin): add criterion benchmarks #1758

Conversation

mxinden commented Mar 18, 2024 • edited Loading

larseggert commented Mar 18, 2024

mxinden commented Mar 20, 2024

github-actions bot commented Mar 23, 2024 • edited Loading

Benchmark results

Client/server transfer results

mxinden left a comment

Choose a reason for hiding this comment

larseggert commented Mar 25, 2024

codecov bot commented Mar 27, 2024 • edited Loading

Codecov Report

mxinden commented Mar 27, 2024

larseggert left a comment

Choose a reason for hiding this comment

mxinden commented Mar 27, 2024

mxinden commented Mar 27, 2024

mxinden commented Mar 18, 2024 •

edited

Loading

github-actions bot commented Mar 23, 2024 •

edited

Loading

codecov bot commented Mar 27, 2024 •

edited

Loading