
Connections initially have a bad RTT, take too long to settle down. #2176

Closed
flub opened this issue Apr 11, 2024 · 9 comments
Assignees
Labels
bug Something isn't working c-iroh-net
Milestone

Comments

@flub
Contributor

flub commented Apr 11, 2024

On high-latency connections we have observed a pretty bad initial round-trip time (RTT), as estimated by Quinn. This appears not to happen (as much) on connections with low latency. It also appears to mostly affect the connecting side, likely the side with the longest distance to the relay server.

This is likely affected by, and made worse by, #2169. Going into mixed sending does not affect the RTT much by itself; it probably only gets worse when switching fully back to the relayed connection.

@flub flub added bug Something isn't working c-iroh-net labels Apr 11, 2024
@flub flub self-assigned this Apr 11, 2024
@flub flub added this to iroh Apr 11, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in iroh Apr 11, 2024
@flub flub moved this from 📋 Backlog to 🏗 In progress in iroh Apr 11, 2024
@link2xt
Contributor

link2xt commented Apr 11, 2024

Can the RTT estimator be trusted?
The initial RTT is set here to 333 ms:
https://github.com/quinn-rs/quinn/blob/88f48b0179f358cea0b76fc550e629007bfc957d/quinn-proto/src/config.rs#L329
EDIT: this initial RTT does not matter; it never appears in the logs. It is probably only used for the initial RTO calculation.

The RTT is then calculated by RttEstimator.

Maybe extract min_rtt as well, at least; it could be that this RTT estimator, built for the purposes of RTO calculation, is too conservative and the actual RTT is smaller than this.

At first glance it looks like simple exponential averaging, but I have not looked into where the raw ack_delay and rtt measurements come from:
https://github.com/quinn-rs/quinn/blob/88f48b0179f358cea0b76fc550e629007bfc957d/quinn-proto/src/connection/paths.rs#L177
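For reference, the averaging in question can be sketched like this. This is a simplified illustration of the RFC 9002 smoothed-RTT update, not quinn's actual code; the type and method names are made up for the sketch:

```rust
// Simplified sketch of an RFC 9002-style RTT estimator (illustrative names,
// not quinn's API). Times are in milliseconds for readability.
#[derive(Debug)]
struct RttSketch {
    srtt: Option<f64>, // smoothed RTT; None until the first sample
    rttvar: f64,       // RTT variance
    min_rtt: f64,      // minimum observed RTT
}

impl RttSketch {
    fn new() -> Self {
        Self { srtt: None, rttvar: 0.0, min_rtt: f64::INFINITY }
    }

    /// Feed one raw sample plus the peer-reported ack_delay.
    fn update(&mut self, latest_rtt: f64, ack_delay: f64) {
        self.min_rtt = self.min_rtt.min(latest_rtt);
        // ack_delay is only subtracted if doing so doesn't push the
        // sample below min_rtt (RFC 9002, section 5.3).
        let adjusted = if latest_rtt - ack_delay >= self.min_rtt {
            latest_rtt - ack_delay
        } else {
            latest_rtt
        };
        match self.srtt {
            None => {
                // The first sample seeds the estimator directly -- this is
                // why one bad initial measurement dominates for a long time.
                self.srtt = Some(adjusted);
                self.rttvar = adjusted / 2.0;
            }
            Some(srtt) => {
                // Exponential averaging: 7/8 old estimate + 1/8 new sample.
                self.rttvar = 0.75 * self.rttvar + 0.25 * (srtt - adjusted).abs();
                self.srtt = Some(0.875 * srtt + 0.125 * adjusted);
            }
        }
    }
}

fn main() {
    let mut est = RttSketch::new();
    est.update(399.0, 0.0); // an inflated first sample, ack_delay reported as 0
    est.update(1.0, 0.0);
    println!("{:?}", est);
}
```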

@link2xt
Contributor

link2xt commented Apr 11, 2024

ack_delay is QUIC-specific (not present in TCP): I think the sender reports the time the ACK was delayed so that the other side can subtract it from its RTT measurement.

But generally, if you see a 600 ms delay it means a raw rtt > 333 ms was injected here, which is already bad by itself:
https://github.com/quinn-rs/quinn/blob/948025bd489be6a4aabcfc9c83629368e685771e/quinn-proto/src/connection/mod.rs#L1261
(this URL is for the 0.10.4 tag)

@link2xt
Contributor

link2xt commented Apr 12, 2024

I have checked out the flub/messing-around-with-rtt branch, cloned https://github.com/quinn-rs/quinn/ at branch 0.10.x, and added this to the workspace Cargo.toml:

[patch.crates-io]
quinn-proto = { path = "../quinn/quinn-proto" }

Then patched quinn with this:

diff --git a/quinn-proto/src/connection/mod.rs b/quinn-proto/src/connection/mod.rs
index 7e93d9d1..7d7693f4 100644
--- a/quinn-proto/src/connection/mod.rs
+++ b/quinn-proto/src/connection/mod.rs
@@ -1258,6 +1258,7 @@ impl Connection {
                 )
             };
             let rtt = instant_saturating_sub(now, self.spaces[space].largest_acked_packet_sent);
+            println!("RAW RTT: {:?}", rtt);
             self.path.rtt.update(ack_delay, rtt);
             if self.path.first_packet_after_rtt_sample.is_none() {
                 self.path.first_packet_after_rtt_sample =

On the accept side, cargo run -- doctor accept prints values that look fine:

RAW RTT: 6.876413ms
RAW RTT: 6.876413ms
RAW RTT: 5.261827ms
RAW RTT: 911.356µs
Accepted connection from XXXX
RAW RTT: 1.047686ms
RAW RTT: 1.268663ms
RAW RTT: 1.456326ms
RAW RTT: 1.402479ms
RAW RTT: 1.673532ms
RAW RTT: 1.566466ms
RAW RTT: 1.339482ms
RAW RTT: 1.332219ms
RAW RTT: 1.338086ms
RAW RTT: 1.33592ms
RAW RTT: 1.421546ms
RAW RTT: 1.333126ms

On the connect side, the raw values look like this:

RAW RTT: 399.083612ms
RAW RTT: 2.552762ms
RAW RTT: 593.161µs
RAW RTT: 1.324676ms
RAW RTT: 1.347164ms
RAW RTT: 956.893µs
RAW RTT: 974.353µs
RAW RTT: 978.264µs
RAW RTT: 1.118924ms
RAW RTT: 990.835µs
RAW RTT: 1.042937ms
RAW RTT: 1.199939ms
RAW RTT: 1.154334ms
RAW RTT: 1.013324ms
RAW RTT: 1.044403ms
RAW RTT: 1.142461ms
RAW RTT: 1.071642ms
RAW RTT: 1.121788ms
RAW RTT: 1.061934ms
RAW RTT: 1.001311ms
RAW RTT: 1.043705ms

Note the first raw value of 399 ms.
So I suspect a bug in quinn-proto: it is feeding some incorrect measurement as the first value on the connect side.
Or it could be that it actually took the accepting side that long to ack the packet; maybe it did some crypto work there?

By the way, ack_delay is exactly 0, at least at first, so it does not matter.
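To see why a single inflated first sample matters so much: the first sample seeds the smoothed RTT directly, and each later sample only contributes with weight 1/8, so the 399 ms value lingers for dozens of acks. A quick simulation (not quinn code; the 5 ms settling threshold is an arbitrary illustration):

```rust
// How many samples does it take for the smoothed RTT to forget an inflated
// first measurement? The first sample seeds srtt directly (RFC 9002), then
// each later sample is mixed in with weight 1/8.
fn samples_until_settled(first_ms: f64, steady_ms: f64, threshold_ms: f64) -> u32 {
    let mut srtt = first_ms; // first sample seeds srtt directly
    let mut n = 0;
    while srtt >= threshold_ms {
        // exponential averaging: srtt = 7/8 * srtt + 1/8 * sample
        srtt = 0.875 * srtt + 0.125 * steady_ms;
        n += 1;
    }
    n
}

fn main() {
    // With a 399 ms first sample and a true RTT around 1 ms, it takes
    // dozens of ack-carrying round trips before srtt drops below 5 ms.
    println!("{}", samples_until_settled(399.0, 1.0, 5.0));
}
```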

@link2xt
Contributor

link2xt commented Apr 12, 2024

Maybe ack_delay being zero is a problem.
The section https://www.rfc-editor.org/rfc/rfc9000.html#name-measuring-and-reporting-hos is especially interesting:

When the measured acknowledgment delay is larger than its max_ack_delay, an endpoint SHOULD report the measured delay. This information is especially useful during the handshake when delays might be large;

The delay indeed seems to be large, but the encoded ack delay is exactly zero (zero microseconds!). The accepting side should measure the time it took to ack the first frame and encode it into the ack delay so the connecting side can subtract it from its RTT measurement.

Here ack_delay is exactly zero, printed out as 0ns:
https://github.com/quinn-rs/quinn/blob/db2df614fcab3e9c17b3e5f325eb197920489779/quinn-proto/src/connection/mod.rs#L2996

@link2xt
Contributor

link2xt commented Apr 12, 2024

Ok, apparently on 0.10.x ack delay is simply not implemented:
https://github.com/quinn-rs/quinn/blob/db2df614fcab3e9c17b3e5f325eb197920489779/quinn-proto/src/connection/spaces.rs#L421-L426

It was only implemented in quinn-rs/quinn#1553, which is not part of the 0.10.x branch.

400 ms of TLS crypto also does not sound good; here is a blog post from 2009 claiming the whole TLS handshake took 220 ms: https://www.moserware.com/2009/06/first-few-milliseconds-of-https.html

@dignifiedquire dignifiedquire added this to the v0.15.0 milestone Apr 15, 2024
@flub
Contributor Author

flub commented Apr 19, 2024

Note the first raw value of 399 ms. So I suspect it is a bug in quinn-proto, it is feeding some incorrect measurement as the first value on the connect side. Or could be that it actually took that long to ack the packet for the accepting side, maybe it did some crypto work there?

ack_delay is by the way exactly 0, at least at first, so it does not matter.

From what I understand it is normal for the initial RTTs in QUIC to be rather large. There is something about the crypto not yet being set up and the ACKs having to wait before they can be sent. IIRC there's even something about ignoring ack_delay for some of the very early RTT estimates during the handshake. So I don't think this is of particular concern.

@flub
Contributor Author

flub commented Apr 19, 2024

I have hacked something together which allows me to reset the congestion controller and RTT estimator state every time the magic socket switches path (on the flub/messing-around-with-rtt branch, which includes a forked quinn). The results are promising: it settles down much faster, about 4 s in the worst case and often much faster (down from about 20 s in the worst case). How fast it settles still depends on the distance between the two nodes: the further apart, the longer it takes to reach steady state.

I'll now try to figure out how to do this in a nice and reasonable way that supports more than one connection. In any case, I think we'll end up with a forked Quinn.
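The reset idea can be sketched as follows. The types and method names here are hypothetical, purely to illustrate the mechanism; they are not iroh's or quinn's actual API:

```rust
// Hypothetical sketch: when the magic socket switches path, discard the old
// estimator/congestion state instead of letting the exponential average
// slowly forget it. `PathState` and `on_path_switch` are illustrative names.
#[derive(Debug)]
struct PathState {
    srtt_ms: Option<f64>, // smoothed RTT; None until the first sample on this path
    cwnd_bytes: u64,      // congestion window
}

impl PathState {
    fn fresh(initial_cwnd: u64) -> Self {
        Self { srtt_ms: None, cwnd_bytes: initial_cwnd }
    }

    fn on_path_switch(&mut self, initial_cwnd: u64) {
        // Resetting means the first sample measured on the new path seeds
        // srtt directly, so the estimate settles in a handful of round
        // trips instead of tens of them.
        *self = Self::fresh(initial_cwnd);
    }
}

fn main() {
    let mut path = PathState::fresh(12_000);
    path.srtt_ms = Some(399.0); // stale estimate from the old (relayed) path
    path.on_path_switch(12_000);
    println!("{:?}", path);
}
```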

@link2xt
Contributor

link2xt commented Apr 20, 2024

From what I understand it is normal that initially the RTTs are rather large in QUIC. There is something about the crypto not yet being setup and the ACKs having to wait before they can be sent. IIRC there's even something about ignoring ack_delay for some of the very early RTT estimates during the handshake. So I don't think this is of particular concern.

There is this quote from RFC 9002:
"The peer might report acknowledgment delays that are larger than the peer's max_ack_delay during the handshake (Section 13.2.1 of [QUIC-TRANSPORT]). To account for this, the endpoint SHOULD ignore max_ack_delay until the handshake is confirmed, as defined in Section 4.1.2 of [QUIC-TLS]. When they occur, these large acknowledgment delays are likely to be non-repeating and limited to the handshake. The endpoint can therefore use them without limiting them to the max_ack_delay, avoiding unnecessary inflation of the RTT estimate."

This means that before the handshake is confirmed, the peer may report a large ack_delay, and it should be subtracted from the RTT measurement as-is, without capping it to max_ack_delay. There is also this pseudocode:

if (handshake confirmed):
  ack_delay = min(ack_delay, max_ack_delay)

But the currently used quinn 0.10 always reports exactly 0 in ack_delay, and this results in an inflated RTT. Properly fixing this initial RTT estimate requires either upgrading to 0.11, which includes quinn-rs/quinn#1553, or backporting the ack_delay implementation.
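The rule from the pseudocode above, written out as a standalone function (illustrative only, not quinn code):

```rust
// RFC 9002 rule: only cap the peer-reported ack_delay to max_ack_delay once
// the handshake is confirmed; before that, large delays are used as-is so
// they can be subtracted in full from the raw RTT sample.
fn effective_ack_delay(ack_delay_ms: f64, max_ack_delay_ms: f64, handshake_confirmed: bool) -> f64 {
    if handshake_confirmed {
        ack_delay_ms.min(max_ack_delay_ms)
    } else {
        // Large handshake-time delays are likely non-repeating, so using
        // them uncapped avoids unnecessary inflation of the RTT estimate.
        ack_delay_ms
    }
}

fn main() {
    // A 300 ms handshake-time delay is subtracted in full...
    println!("{}", effective_ack_delay(300.0, 25.0, false));
    // ...but after confirmation it is capped to max_ack_delay.
    println!("{}", effective_ack_delay(300.0, 25.0, true));
}
```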

@flub
Contributor Author

flub commented May 10, 2024

Closed by #2234

@flub flub closed this as completed May 10, 2024
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh May 10, 2024