Introduce perf spec #478

Merged (7 commits), May 31, 2023
121 additions & 0 deletions: `perf/perf.md`
# Perf <!-- omit in toc -->

| Lifecycle Stage | Maturity | Status | Latest Revision |
| --------------- | ------------- | ------ | --------------- |
| 1A | Working Draft | Active | r0, 2022-11-16 |

Authors: [@marcopolo]

Interest Group: [@marcopolo], [@mxinden], [@marten-seemann]

[@marcopolo]: https://github.com/marcopolo
[@mxinden]: https://github.com/mxinden
[@marten-seemann]: https://github.com/marten-seemann

# Table of Contents <!-- omit in toc -->
- [Context](#context)
- [Protocol](#protocol)
- [Benchmarks](#benchmarks)
- [Single connection throughput](#single-connection-throughput)
- [Handshakes per second](#handshakes-per-second)
- [Security Considerations](#security-considerations)
- [Prior Art](#prior-art)

# Context
> **Contributor:** Suggested change: `# Context` → `## Context`
>
> Nit: Multiple L1 headings in this document.

> **Author:** Why is that a problem? This is a top-level section. If we made this `##`, then it would be a child of the ToC, which is wrong.

> **Author:** Ah, I see the argument now. This could all be a child of the `# Perf` part. I think that's a good argument, but the counter-argument is that you would then always lose one nesting level in this style, since you'll always have that title heading.
>
> Unless we have some specific templating system we are trying to make happy, I don't think it's necessary to only have a single `#` level. This whole doc is about perf, so it should be okay to have multiple `#` levels.


The `perf` protocol represents a standard benchmarking protocol that we can use
to talk about performance within and across libp2p implementations. This lets us
analyze performance, guide improvements, and protect against regressions.

# Protocol

The `/perf/1.0.0` protocol (from here on referred to as simply `perf`) is a
client-driven set of benchmarks. To not reinvent the wheel, this perf protocol
is almost identical to Nick Banks's [_QUIC Performance_
Internet-Draft](https://datatracker.ietf.org/doc/html/draft-banks-quic-performance#section-2.3),
but adapted to libp2p.

> **Contributor:** I thought we wanted to stop the use of numbers in the protocols in their first version?
>
> **Author:** Is this documented somewhere? I don't particularly care one way or the other, but we already have a couple of implementations out, and this would be a bit annoying of a change with little to gain, so I would opt for not making the change.


The protocol is as follows:

Client:

1. Open a libp2p stream to the server.
2. Tell the server how many bytes we want the server to send us, as a single
   big-endian uint64 number. Zero is a valid number, as is the max uint64 value.
3. Write some amount of data to the stream.
4. Close the write side of our stream.
5. Read from the read side of the stream. This should be the same number of
   bytes as we told the server in step 2.
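The length prefix in step 2 is a plain 8-byte big-endian uint64. A minimal sketch of the encoding (helper names are hypothetical, using Python's `struct` module for illustration):

```python
import struct

def encode_response_size(n: int) -> bytes:
    """Encode the requested response size as an 8-byte big-endian uint64."""
    return struct.pack(">Q", n)

def decode_response_size(buf: bytes) -> int:
    """Decode the 8-byte big-endian uint64 length prefix."""
    return struct.unpack(">Q", buf)[0]
```

Both ends of the range are legal: `encode_response_size(0)` and `encode_response_size(2**64 - 1)` are valid requests.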

Server, on handling a new `perf` stream:

1. Read the big-endian uint64 number. This is how many bytes we'll send back.
2. Read from the stream until we get an `EOF` (the client's write side was closed).
3. Send the number of bytes defined in step 1 back to the client.
4. Close the stream.

> **Contributor:** Would there be a benefit of sending things back at the same time, at least as an option, rather than waiting until the end?
>
> **Author:** Would opening two streams in parallel be enough to satisfy this use case?
>
> **Reviewer:** Maybe full-duplex throughput within a single stream is interesting as well? This could be an optional flag in the client's request. The referenced draft does ask the question whether it should be possible to send the traffic from the server before FIN:
>
> > Note - Should the server wait for FIN before replying?
>
> **Author:** I think we can get most of the value with using two streams. The full-duplex is interesting, but I'd hold off on doing that until we have a compelling use case, since it adds a bit of complexity to support that option.
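The client and server steps above can be sketched end-to-end. This is only an illustration over a plain TCP socket rather than a real libp2p stream (function names are hypothetical; in an actual implementation, closing the write side of the libp2p stream takes the place of the TCP `shutdown` here):

```python
import socket
import struct
import threading

def perf_server(conn: socket.socket) -> None:
    """Handle one perf exchange: read size prefix, drain upload, send reply."""
    # Step 1: read the 8-byte big-endian size prefix.
    # (A robust implementation would loop until all 8 bytes are read.)
    want = struct.unpack(">Q", conn.recv(8))[0]
    # Step 2: read the upload until EOF (client closed its write side).
    while conn.recv(65536):
        pass
    # Step 3: send `want` bytes back to the client.
    buf = b"\x00" * 65536
    sent = 0
    while sent < want:
        sent += conn.send(buf[: min(len(buf), want - sent)])
    # Step 4: close the stream.
    conn.close()

def perf_client(addr, upload: bytes, download: int) -> int:
    """Run one perf exchange; returns the number of bytes downloaded."""
    s = socket.create_connection(addr)
    s.sendall(struct.pack(">Q", download))  # step 2: requested response size
    s.sendall(upload)                       # step 3: upload payload
    s.shutdown(socket.SHUT_WR)              # step 4: close our write side
    got = 0
    while chunk := s.recv(65536):           # step 5: read the server's reply
        got += len(chunk)
    s.close()
    return got
```

Note the ordering: the server fully drains the upload before replying, matching the draft's open question about waiting for FIN discussed in the thread above.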

# Benchmarks

The above protocol is flexible enough to run the following benchmarks and more.
The exact specifics of the benchmark (e.g. how much data to download or for how
long) are left up to the benchmark implementation. Consider these rough
guidelines for how to run one of these benchmarks.

Other benchmarks can be run with the same protocol. The following benchmarks
have immediate usefulness, and more can be added as we find them useful. Consult
the [_QUIC Performance_
Internet-Draft](https://datatracker.ietf.org/doc/html/draft-banks-quic-performance#section-2.3)
for some other benchmarks (called _scenarios_ in that document).

## Single connection throughput

For an upload test, the client sets the server response size to 0 bytes, writes
some amount of data, and closes the stream.

For a download test, the client sets the server response size to N bytes, and
closes the write side of the stream.

The measurements are gathered and reported by the client by dividing the number
of bytes transferred by the total time it took from stream open to stream
close.

A timer based variant is also possible where we see how much data a client can
upload or download within a specific time. For upload it's the same as before
and the client closes the stream after the timer ends. For download, the client
should request a response size of max uint64, then close the stream after the
timer ends.
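The byte-count and timer-based variants both reduce to the same computation: bytes transferred over elapsed wall-clock time. A sketch, with a hypothetical helper reporting megabits per second:

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Throughput in megabits per second: bits transferred over elapsed time,
    measured from stream open to stream close."""
    return (bytes_transferred * 8) / seconds / 1e6
```

For example, moving 1,000,000 bytes in one second is 8 Mbps regardless of whether the byte count or the timer was the fixed quantity.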

## Handshakes per second

This benchmark measures connection setup efficiency. A transport that takes many
RTTs will perform worse here than one that takes fewer.

To run this benchmark:

1. Set up N clients.
2. Each client opens K connections per second to a single server.
3. Once a connection is established, the client closes it and establishes
   another one.

Handshakes per second are calculated by taking the total number of connections
successfully established and dividing it by the time period of the test.
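With hypothetical numbers, the calculation looks like this (summing over the N clients from the steps above):

```python
def handshakes_per_second(per_client_counts: list[int], duration_s: float) -> float:
    """Total connections successfully established across all clients,
    divided by the time period of the test."""
    return sum(per_client_counts) / duration_s

# Hypothetical run: 4 clients each complete 250 handshakes over 10 seconds.
```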

# Security Considerations

Since this protocol lets clients ask servers to do significant work, it
SHOULD NOT be enabled by default in any implementation. Users are advised not to
enable this on publicly reachable nodes.

Authenticating by Peer ID could mitigate this concern by only allowing trusted
clients to use the protocol. Support for this is left to the implementation.

# Prior Art

As mentioned above, this document is inspired by Nick Banks's [QUIC Performance Internet-Draft](https://datatracker.ietf.org/doc/html/draft-banks-quic-performance).

[iperf](https://iperf.fr)

[@mxinden's libp2p perf](https://github.com/mxinden/libp2p-perf)

[@marten-seemann's libp2p perf test](https://github.com/marten-seemann/libp2p-perf-test/)

[@vyzo's libp2p perf test](https://github.com/vyzo/libp2p-perf-test/)