perf: throughput test (TCP, QUIC, libp2p, but not iperf) never exits slow start #261

Closed
Tracked by #63
marten-seemann opened this issue Aug 16, 2023 · 6 comments · Fixed by #276

Comments

@marten-seemann
Contributor

Moving a conversation from Slack here: https://filecoinproject.slack.com/archives/C03K82MU486/p1692012437887379

I spent some time playing around with the perf setup. The fact that iPerf is so much faster than everything else really bothered me. iPerf is a sanity check for our setup, and if the value measured for iPerf differs significantly from the HTTPS value we measure, something is wrong in the test setup.

It turns out that what we’re seeing here is not a performance measurement at all, but purely Reno / Cubic slow start. Reno starts with a congestion window of 10 packets, and increases the cwnd by 1 packet for every ACK received, effectively leading to a doubling of the cwnd every RTT. Slow start is only exited once the first packet loss (or ECN marking) occurs.
I created a spreadsheet with a simple back-of-the-envelope calculation: https://docs.google.com/spreadsheets/d/1LYVOh3kkefhD5t-rR0JEgPJ-YqKpqaXmUCYu_Z-t0xg/edit?usp=sharing

To send 100 MB (the amount of data transferred in our tests) in slow start, it takes between 12 and 13 RTTs. At an RTT of 61ms, this corresponds to an average “bandwidth” of somewhere around 1 Gbit/s, which pretty much matches our measurement result.
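
For reference, the same back-of-the-envelope calculation as Go-style pseudo-code (the 1460-byte packet size, the 10-packet initial window, and a clean doubling of the cwnd every RTT are simplifying assumptions):

const (
    mss         = 1460.0 // assumed payload bytes per packet
    initialCwnd = 10.0   // packets
    transfer    = 100e6  // 100 MB
    rtt         = 0.061  // 61 ms
)

// After n RTTs of slow start, roughly initialCwnd * (2^n - 1) packets have been sent.
packets := transfer / mss                             // ≈ 68,500 packets
rtts := math.Ceil(math.Log2(packets/initialCwnd + 1)) // ≈ 13 RTTs
duration := rtts * rtt                                // ≈ 0.79 s
fmt.Printf("%.2f Gbit/s\n", transfer*8/duration/1e9)  // ≈ 1 Gbit/s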


What do we actually want to show in this benchmark?

This might be controversial, but I'd argue that we don't want to measure:

  • Specifics of the congestion controller in use (unless it's completely and horribly broken)
  • How well the underlying stack makes use of hardware offloads (QUIC crypto offload for example)

What we actually want to show is that using libp2p doesn't impose a (significant) performance penalty when compared to vanilla HTTP(S), on typical machines.

This suggests that using a machine that has a 25 Gbit/s link speed might not be the most useful setup. Achieving even 5 Gbit/s throughput on a single connection (usually handled by a single CPU core) is challenging when using QUIC, unless various hardware offloads are taken advantage of.

Using a high bandwidth also means that flow control windows need to be large enough to accommodate a large BDP (at 25 Gbit/s and 61 ms RTT, roughly 190 MB per connection), which requires a memory commitment that a p2p node might not be willing to make due to DoS concerns.

Measuring Throughput

We currently calculate throughput as data transferred / time taken. This is correct in the t → ∞ limit, but less useful when slow start takes a dozen round trips and transfers O(500 MB) of data. To average out the slow start period, we'd need to transfer at least an order of magnitude more data. This comes with two problems:

  • Some implementations are currently very slow; for these implementations, transferring 10 GB of data would take a very long time.
  • We're hardcoding assumptions about slow start and about the bandwidth of the connection into the amount of data we're requesting.

A better solution would be to have the client request an infinite amount of data, and calculate the current bandwidth of the transfer every couple of seconds (i.e. only taking into account the amount of data transferred in the last interval). It would then be immediately obvious when these values have converged and we can stop the measurement. As a simplification, we can also just run the test for a fixed time (30s?).
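
A rough sketch of what that could look like on the client's download side, again in Go-style pseudo-code (measureDownload, the 16 kB buffer, the 1-second interval and the 30-second duration are illustrative placeholders, not part of any existing implementation; imports omitted):

func measureDownload(stream io.Reader) {
    buf := make([]byte, 16*1024)
    var bytesRead int
    start := time.Now()
    intervalStart := start

    for time.Since(start) < 30*time.Second { // fixed test duration
        n, err := stream.Read(buf)
        if err != nil {
            return
        }
        bytesRead += n
        if elapsed := time.Since(intervalStart); elapsed > time.Second {
            // report the bandwidth of the last interval only, not of the whole transfer
            fmt.Printf("%.2f Gbit/s\n", float64(bytesRead)*8/elapsed.Seconds()/1e9)
            intervalStart = time.Now()
            bytesRead = 0
        }
    }
}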

@marten-seemann marten-seemann changed the title throughput test (TCP, QUIC, libp2p, but not iperf) never exits slow start perf: throughput test (TCP, QUIC, libp2p, but not iperf) never exits slow start Aug 16, 2023
@dhuseby

dhuseby commented Aug 23, 2023

@marten-seemann, @sukunrt, @mxinden, and @thomaseizinger I'm assuming the goal here is to test the libp2p implementations the same way that iPerf does: we run for 60 seconds and sample every second, so that we exit slow start and the throughput converges. Is that correct?

I imagine that would require instrumenting the libp2p implementations to expose measurements of the total bytes sent/received at the transport level, as close to the socket interface as we can get. Is that correct? If so, which implementations have that? I think Rust's is aggregated across all transports, not tracked per transport. Does Go have the ability to measure on a per-transport basis? I talked with @sukunrt about lending a hand on the Go implementation after his autonat work wraps up.

I assume the JS implementation needs similar instrumentation added as well. @achingbrain what's the story with js-libp2p and being able to measure perf on a per-transport basis?

@thomaseizinger
Contributor

I imagine that would require instrumenting the libp2p implementations to expose measurements of the total bytes sent/received at the transport level, as close to the socket interface as we can get. Is that correct?

If you want to measure as close to the socket as possible, you can use common Linux tools to measure how many bytes go across a certain port, but I don't think that is what we need.

I think what is interesting for our users is the throughput they can achieve for the application data, i.e. what is being written to a stream. For that, we don't need deep instrumentation of the various libp2p stacks; we just need to measure how fast we can write bytes to a stream. @mxinden is currently refactoring the rust implementation to do just that :)

@marten-seemann
Contributor Author

@marten-seemann, @sukunrt, @mxinden, and @thomaseizinger I'm assuming the goal here is to test the libp2p implementations the same way that iPerf does: we run for 60 seconds and sample every second, so that we exit slow start and the throughput converges. Is that correct?

Correct.

I imagine that would require instrumenting the libp2p implementations to expose measurements of the total bytes sent/received at the transport level, as close to the socket interface as we can get. Is that correct?

I agree with @thomaseizinger: it should be possible (and it's preferable!) to measure this at the application layer.

For that, we don't need deep instrumentation of the various libp2p stacks; we just need to measure how fast we can write bytes to a stream. @mxinden is currently refactoring the rust implementation to do just that :)

This should be trivial to implement on top of any stream implementation. I'm surprised this would require any refactoring. Here's a super easy way to do this (in Go-style pseudo-code):

buffer := make([]byte, 4*1024) // a buffer of a few kB
var bytesSent int
t := time.Now()

for {
    stream.Write(buffer)
    bytesSent += len(buffer)
    if elapsed := time.Since(t); elapsed > time.Second {
        // print the bandwidth of the last interval: bytesSent / elapsed
        fmt.Println(float64(bytesSent) / elapsed.Seconds())
        t = time.Now()
        bytesSent = 0
    }
}

We definitely won't need to make any changes to go-libp2p, and the actual implementation will probably look very similar to this pseudo-code (modulo some error handling).

@mxinden
Member

mxinden commented Aug 24, 2023

Edit: Marten beat me to it. My original response is basically the same as Marten's above:

I'm assuming the goal here is to test the libp2p implementations the same way that iPerf does: we run for 60 seconds and sample every second, so that we exit slow start and the throughput converges. Is that correct?

Correct. Though instead of testing at the connection level (as iperf does), we will test at the stream level.

I imagine that would require instrumenting the libp2p implementations to expose measurements of the total bytes sent/received at the transport level, as close to the socket interface as we can get.

No. We will be measuring the throughput at the perf protocol implementation level. In other words, we will measure how many bytes the perf protocol implementation sends and receives on a stream.

If so, which implementations have that?

No changes to the core of an implementation are needed.

I assume the JS implementation needs similar instrumentation added as well. @achingbrain what's the story with js-libp2p and being able to measure perf on a per-transport basis?

Again, this is not needed for this issue.

A proof of concept is implemented in libp2p/rust-libp2p#4382. Once this is hooked up into https://github.com/libp2p/test-plans/tree/master/perf we can discuss the next steps for other implementations.

@marten-seemann
Contributor Author

It might make sense to implement this on the receiver, not on the sender side, though. There are more buffers that can interfere with the measurement on the sender side; the receive path usually has very shallow buffers if the application is actually reading from the socket.

mxinden added a commit that referenced this issue Aug 24, 2023
Our current throughput tests open a connection, open a stream,
up- or download 100 MB, and close the connection. 100 MB is not enough on the
given path (60 ms, ~5 Gbit/s) to exit the congestion controller's slow start. See
#261 for details.

Instead of downloading 100 MB multiple times, each time on a new connection, establish
a single connection and continuously measure the throughput for a fixed
duration (60s).
mxinden added a commit to mxinden/perf that referenced this issue Aug 28, 2023
@mxinden
Member

mxinden commented Sep 25, 2023

It might make sense to implement this on the receiver, not on the sender side, though. There are more buffers that can interfere with the measurement on the sender side; the receive path usually has very shallow buffers if the application is actually reading from the socket.

@marten-seemann #276 always measures on the client side. That said, it measures both the upload and the download bandwidth, so it covers the approach you propose above. One can see a higher spread in the upload measurements. I assume this is due to skewed measurements, where data is assumed to have been sent but is actually still sitting in send buffers.

[plots: measured upload and download bandwidth]

https://observablehq.com/d/682dcea9fe2505c4?branch=27d07a6f47c2bc1a9c9d9a9f6626b536248284f5

marten-seemann pushed a commit to quic-go/perf that referenced this issue Oct 19, 2023
* feat(perf): support iperf-style intermittent results

Required for libp2p/test-plans#261.

* Print on read

* go fmt

* Apply code review comments

* go fmt
mxinden added a commit that referenced this issue Oct 25, 2023
Our current throughput tests open a connection, open a stream,
up- or download 100 MB, and close the connection. 100 MB is not enough on the
given path (60 ms, ~5 Gbit/s) to exit the congestion controller's slow start. See
#261 for details.

Instead of downloading 100 MB multiple times, each time on a new connection, establish
a single connection and continuously measure the throughput for a fixed
duration (20s).