Add alternative - unfair but lower latency - send stream scheduling strategy #2002
Conversation
Force-pushed from b3af7a4 to d09bc62.
This change seems well motivated, thanks! I wonder if we could have the best of both worlds, and avoid yet another esoteric configuration knob, with a more heuristic approach. It sounds like your main problem is that, when sending many individually sub-MTU streams, streams that get split across packets are subject to a much higher maximum latency than you might otherwise see. What if we special-cased those? E.g. don't advance the round-robin state if:
Either should let us retain fairness for larger-than-MTU streams, while reducing the maximum latency for sub-MTU streams, without requiring users to understand any of these concerns.
For the general case, this sounds like a good idea, and I'm happy to implement it if you want. For our case specifically though, we have a proposal to 3x the max transaction size. We haven't really been able to evaluate its feasibility so far because latency was already pretty bad, but once we deploy this change, I think we could reasonably enable 3x, therefore spanning multiple MTUs. In that case, I'd still want all datagrams of a tx to follow each other sequentially.
Yeah, this is a valid point of course. How about I implement the heuristic to disable RR for sub-MTU streams, and then keep an option to always disable RR, but instead of exposing it as a config flag, make it an opt-in feature flag?
I can also always keep this out of tree, of course! I'd rather avoid having to maintain a vendor fork just for this, though.
Heuristics sound good to me, but given the limited complexity of the additional configuration, I feel like we could also accept the configuration upstream if it's well-motivated even in the presence of the heuristics. On the other hand, maybe we should avoid adding heuristics if they don't address the only actual use case we've seen?
This is compelling. Heuristics are a greater maintenance burden, and if they're not actually addressing the motivating case, why pay that cost? I'm happy to move ahead with a global config flag, as originally proposed. I do suspect there's a more flexible middle ground here somewhere (maybe the setting should be per-stream?) but we don't necessarily have to work that out here and now when there's a clear win on the table already.
Force-pushed from 55d6cff to af41f52.
Add methods to PendingStreams to avoid accessing PendingStreams::streams directly.
This adds TransportConfig::send_fairness(bool). When set to false, streams are still scheduled based on priority, but once a chunk of a stream has been written out, we'll try to complete the stream instead of trying to round-robin balance it among the streams with the same priority. This reduces fragmentation, protocol overhead and stream receive latency when sending many small streams. It also sends same-priority streams in the order they are opened. This - assuming little to no network packet reordering - allows receivers to advertise a large stream window but keep a smaller, sliding receive window.
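For reference, a minimal sketch of how this option can be applied. The helper function and its name are illustrative, not part of the PR; only `TransportConfig::send_fairness` comes from this change:

```rust
use std::sync::Arc;

use quinn::{ClientConfig, TransportConfig};

// Illustrative helper (not part of this PR): opt a client out of fair
// send scheduling. Streams are still ordered by priority, but once a
// stream starts being written it is driven to completion before the
// next same-priority stream is serviced.
fn with_unfair_scheduling(mut config: ClientConfig) -> ClientConfig {
    let mut transport = TransportConfig::default();
    transport.send_fairness(false);
    config.transport_config(Arc::new(transport));
    config
}
```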
Force-pushed from af41f52 to adc4fa1.
Thanks!
Beautiful, thanks!
@lijunwangs this change may be of interest to you as well -- I recall you had similar concerns.
Downstream, anza-xyz#3283 builds on this change:

Before quinn-rs/quinn#2002 we could get streams fragmented and out of order (stream concurrency). Now streams always arrive in order, so there's no reason anymore to spawn multiple tasks to read them.

Before, we could have:

[s1][s2][s3][s2 fin][s3 fin][s1 fin]

so spawning multiple tasks led to overall faster ingestion, since completing s1 didn't have to wait for all the other streams to arrive. Now we always have:

[s1 fin][s2 fin][s3 fin]

so there's no reason to spawn a task per stream: each task would be created, read all its stream's chunks, and exit before the next stream arrives. This change removes the per-stream task and instead uses the connection task to read all the streams, removing the CPU cost of creating tasks and the corresponding memory allocations.
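A rough sketch of the resulting receive path (not the actual anza-xyz code; the size limit and handler are placeholders):

```rust
// Sketch: with in-order stream completion there is no win from a task
// per stream, so a single loop on the connection task drains each
// stream to its fin before the next stream's data arrives.
async fn ingest(conn: quinn::Connection) -> Result<(), quinn::ConnectionError> {
    loop {
        let mut stream = conn.accept_uni().await?;
        // Placeholder size limit; bound this by the real max transaction size.
        match stream.read_to_end(64 * 1024).await {
            Ok(tx) => handle_transaction(tx), // hypothetical handler
            Err(e) => eprintln!("stream read failed: {e}"),
        }
    }
}

// Hypothetical stand-in for the application's ingestion logic.
fn handle_transaction(_bytes: Vec<u8>) {}
```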
Solana leader slots are 400ms. When sending and ingesting transactions, minimizing latency is therefore key. Quinn currently tries to implement fairness when writing out send streams, which is a good default, but not great for our use case.
We want to pack as many transactions into a datagram as possible, without any fragmentation except possibly at the very end, when a transaction doesn't fit in the remaining space. In that case, we want the rest of the transaction (through its fin) to come immediately after, to minimize latency. The current round-robin algorithm doesn't allow this, and in fact leads to very high latency if the stream receive window is large.
This PR tries to address this problem. It introduces a TransportConfig::send_fairness(bool) config. When set to false, streams are still scheduled based on priority, but once a chunk of a stream has been written out, we'll try to complete the stream instead of trying to round-robin balance it among the streams with the same priority. This gets rid of fragmentation, and effectively allows API clients to precisely control the order in which streams are written out.
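As a sketch of what that control looks like from the sending side (hypothetical client code, assuming a recent quinn where SendStream::finish is synchronous, and using anyhow for brevity):

```rust
// Hypothetical sender sketch: with send_fairness(false), same-priority
// streams are written out in the order they were opened, so each
// transaction's bytes and its fin go on the wire back to back.
async fn send_transactions(
    conn: &quinn::Connection,
    txs: Vec<Vec<u8>>,
) -> anyhow::Result<()> {
    for tx in txs {
        let mut stream = conn.open_uni().await?;
        stream.write_all(&tx).await?;
        stream.finish()?; // queue the fin immediately behind the data
    }
    Ok(())
}
```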
Here's a server log without the patch:

Take a look at stream 7462. It starts at 2024-10-07T13:51:43.960 (pn=1257), and it's completed at 2024-10-07T13:51:44.352 (pn=7175) together with a bunch of other segmented transactions (note this is on localhost, so over the internet it would be even worse).

Here's a log with the PR instead:
As you can see, the fin=false transactions get completed immediately in the next packet.