udp: limit number of reads per event loop #16180
Conversation
@@ -211,6 +211,10 @@ uint32_t ActiveQuicListener::destination(const Network::UdpRecvData& data) const
  return connection_id_snippet % concurrency_;
}

size_t ActiveQuicListener::numReadsExpectedPerEventLoop() const {
  return quic_dispatcher_.NumSessions();
}
I think this will result in too few packets being read when the number of active QUIC connections is small.
We can change the lower bound in readPacketsFromSocket() to something larger than 1. What's a reasonable number for that? Google internal code actually also uses 1 as the lower bound.
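For context, the lower bound being discussed would amount to a clamp along these lines (a minimal sketch; the function and constant names are illustrative, not the actual Envoy code, though MAX_NUM_READS_PER_EVENT_LOOP appears later in this review):

```cpp
#include <algorithm>
#include <cstddef>

// Minimal sketch of clamping the number of reads per event loop.
size_t numReadsThisEventLoop(size_t num_quic_sessions) {
  constexpr size_t kMinReadsPerEventLoop = 1;   // the lower bound under discussion
  constexpr size_t kMaxReadsPerEventLoop = 100; // cf. MAX_NUM_READS_PER_EVENT_LOOP
  return std::clamp(num_quic_sessions, kMinReadsPerEventLoop, kMaxReadsPerEventLoop);
}
```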
I would say that for parity with TCP you probably need this to be 32 to 100 per active QUIC connection in the small-number-of-connections case. It's hard to say without load tests.
Please note that this number is multiplied by 16 when using recvmmsg. If there is one connection, the event loop duration is 500us, and the lower bound is 100 reads, the expected max bandwidth is 100 * 16 * 1400 bytes / 0.0005s ≈ 4.5 GB/s ≈ 36 Gbps. Would a single connection ever have such a send rate?
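A back-of-the-envelope check of that rate; all inputs are the assumed values from the comment (500us loop, 1400-byte packets), not measurements:

```cpp
// Assumed inputs from the comment above.
constexpr double kReadsPerLoop = 100.0;    // lower bound under discussion
constexpr double kPacketsPerRead = 16.0;   // recvmmsg batch size
constexpr double kBytesPerPacket = 1400.0; // near-MTU QUIC packet
constexpr double kLoopSeconds = 0.0005;    // 500us event loop
// ≈ 4.48e9 bytes/s ≈ 36 Gbps
constexpr double kBytesPerSecond =
    kReadsPerLoop * kPacketsPerRead * kBytesPerPacket / kLoopSeconds;
```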
It would be good to make the packet read rate the same whether or not recvmmsg is in use.
What are your thoughts on expressing these limits in number of packets and adjusting the number of calls when recvmmsg is used?
Regarding the math above, my concern is about proxy behavior when the event loop duration grows above 5ms, which would result in a decrease in the number of UDP packets that can be processed per second. Also, what are the more common QUIC packets that you expect to receive? I think they are ACKs and window updates, which use smaller UDP packets; the actual receive bps would be relatively small unless the proxy is somehow mostly receiving large POSTs from the clients.
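Illustrative arithmetic for that concern, using the same assumed per-loop budget of 100 reads of 16 packets each: the packets-per-second ceiling drops by an order of magnitude as the loop slows from 500us to 5ms.

```cpp
// Assumed per-loop budget: 100 reads * 16 packets per recvmmsg call.
constexpr double kPacketsPerLoop = 100.0 * 16.0;
constexpr double kPpsAt500us = kPacketsPerLoop / 0.0005; // 3.2M packets/s
constexpr double kPpsAt5ms = kPacketsPerLoop / 0.005;    // 320k packets/s
```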
ACKs are the most common packets if we are doing large downloads.
I think that at least for downstreams, downloads are more common than uploads.
Thoughts about making these limits configurable and accepting the same number of packets per wakeup for both recvmmsg and recvmsg?
source/common/network/utility.h
Outdated
};

static const uint64_t DEFAULT_UDP_MAX_DATAGRAM_SIZE = 1500;
static const uint64_t NUM_DATAGRAMS_PER_GRO_RECEIVE = 16;
static const uint64_t NUM_DATAGRAMS_PER_MMSG_RECEIVE = 16;
static const uint64_t MAX_NUM_READS_PER_EVENT_LOOP = 100;
Consider making this limit configurable. In a way, 100 seems too small: QUIC will be at a disadvantage against TCP, since the whole QUIC subsystem will be allowed to read the equivalent of 1 to 3 HTTP2 connections per wakeup. When there's moderate load on the proxy, QUIC traffic will grind to a halt while HTTP2 continues performing fine.
What's a reasonable number for this?
I don't know. I would suggest running load tests that compare H2 and QUIC throughput when there are competing requests at the proxy. I would consider config parameters that specify the number of packets or MB of QUIC data to process per worker wakeup.
Another alternative to consider is having the limit grow in cases where the UDP code notices that it wasn't able to fully drain the socket, or attempting to drain as many packets as there are in the UDP receive buffer at the start of the UDP receive loop.
Also consider the consequences of these limits:
Say that you have a UDP receive packet limit in the kernel of 10,000 packets but you only allow draining 100 per wakeup. As the duration of each wakeup increases, you start getting older and older packets in each subsequent wakeup, until every packet you read ends up being so old that the client has given up by the time you start processing it. This can be a problem especially when handling DNS; a burst of attack DNS requests could lead to real requests backing up and failing for an extended period of time as you slowly drain old requests from the queue.
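A minimal sketch of the "grow the limit when the socket wasn't fully drained" idea mentioned above; none of these names exist in Envoy, and the growth/shrink factors are arbitrary assumptions:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical adaptive limit: grow when the socket wasn't drained, shrink
// back toward the floor when it was. Not part of this PR.
class AdaptiveReadLimit {
public:
  size_t current() const { return limit_; }

  void onReadLoopDone(bool socket_fully_drained) {
    if (!socket_fully_drained) {
      limit_ = std::min(limit_ * 2, kMaxPacketsPerEventLoop);
    } else {
      limit_ = std::max(limit_ / 2, kMinPacketsPerEventLoop);
    }
  }

private:
  static constexpr size_t kMinPacketsPerEventLoop = 16;
  static constexpr size_t kMaxPacketsPerEventLoop = 6000;
  size_t limit_{kMinPacketsPerEventLoop};
};
```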
So if I understand this, right now UDP floods can starve TCP, and if we flip this, UDP floods would starve UDP. I agree that this should scale with session load, but as QUIC is alpha I'd prefer it be affected by starvation, and have a TODO for scaling. This would need a relnote and a config or runtime override for non-QUIC UDP. I think basic scaling is a fixed number of packets (was it 16?) per QUIC session if you want to address it in place.
I'm saying that with this change, TCP will starve UDP.
A question, please: do we have a read limit for TCP connections? RawBufferSocket::doRead() doesn't seem to have one, but I'm not 100% sure.
Yeah, shouldDrainReadBuffer is how it's limited - it stops when you hit the configured buffer limit.
> So if I understand this, right now UDP floods can starve TCP, and if we flip this, UDP floods would starve UDP. I agree that this should scale with session load, but as QUIC is alpha I'd prefer it be affected by starvation, and have a TODO for scaling. This would need a relnote and a config or runtime override for non-QUIC UDP.

For non-QUIC, do we still want to drain the socket on each read event?

> I think basic scaling is a fixed number of packets (was it 16?) per QUIC session if you want to address it in place.

If we are using recvmmsg, we already get a multiplier of 16 from the existing implementation. That's why numReadsExpectedPerEventLoop() just returns the session count.
A possible question is how many hot fds we can get back from epoll_wait per wakeup. I expect the answer to be somewhere in the 16 to 256 range. For fairness between TCP and UDP on the upper side of that range, with a TCP read size of 32 KB per connection, I think we should allow reading about 6k UDP packets per wakeup. I guess 100 attempts returning 16 packets each is only a factor of 4 away from that, so the threshold of 100 recvmmsg calls, each returning up to 16 packets, is actually pretty good. I thought that the limits above resulted in 1 packet per active QUIC connection, up to a limit of 100 packets.
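The rough arithmetic behind the ~6k figure, using the assumed numbers from the comment (not measurements):

```cpp
#include <cstddef>

// 256 hot fds * 32 KiB per TCP read / 1400-byte UDP packets ≈ 5991 packets.
constexpr size_t kHotFdsPerWakeup = 256;                 // upper end of the guess above
constexpr size_t kTcpReadBytesPerConnection = 32 * 1024; // 32 KiB per TCP read
constexpr size_t kUdpPayloadBytes = 1400;                // near-MTU QUIC packet
constexpr size_t kUdpPacketsForParity =
    kHotFdsPerWakeup * kTcpReadBytesPerConnection / kUdpPayloadBytes;
```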
I changed the logic to limit the number of packets read per loop and set the upper limit to 6000.
@@ -180,8 +180,8 @@ class UdpProxyFilter : public Network::UdpListenerReadFilter,
  }
  size_t numPacketsExpectedPerEventLoop() const final {
    // Use 32k to read the same amount as a TCP connection.
    return 32u;
  }
tests?
I realize I shouldn't make the decision for the UDP proxy about how many packets to read. I will leave it as max size_t with a TODO for @mattklein123.
CI detected a compile failure.
Still seeing compile errors. /wait
Please look at the CI spell check error.
/wait
LGTM, though needs a main merge to land
Yeah, this PR should fix these two.
There is another data race in FakeUpstream that needs to be addressed in the test helper function.
It didn't fully resolve the slowness: when I retried after removing the TSAN timeout factor and the extra upstream timeout, the tests still failed, though at a much lower rate.
@lizan this is ready for API review
/lgtm api
Commit Message: To prevent a long event loop when too many UDP packets are in the queue, limit how many packets are read in each event loop. If reading hasn't finished, artificially raise a READ event to continue in the next event loop.
Additional Description:
Add a numPacketsExpectedPerEventLoop() callback to UdpListenerCallback so that the QUIC listener can tell how many packets it wants to read in each loop. The actual number of packets read is still bounded by MAX_NUM_PACKETS_PER_EVENT_LOOP (6000).
The QUIC listener returns numPacketsExpectedPerEventLoop() based on the number of connections it has at the moment and the configured envoy::config::listener::QuicProtocolOptions.packets_to_read_to_connection_count_ratio.
Made InjectableSingleton really thread safe.
Risk Level: medium. Other than the QUIC listener, the other UdpListenerCallbacks return max size_t for numPacketsExpectedPerEventLoop(), which will cause those callbacks to read up to 6000 packets per READ event.
Testing: added udp listener unit tests.
Fixes #16335 #16278
Part of #16198 #16493
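A hypothetical sketch of the relationship the commit message describes; these free functions are illustrative only, not the actual Envoy implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// QUIC listener: expected packets per event loop scale with the number of
// connections and the configured packets_to_read_to_connection_count_ratio.
size_t numPacketsExpectedPerEventLoop(size_t num_connections,
                                      uint32_t packets_to_read_to_connection_count_ratio) {
  return num_connections * packets_to_read_to_connection_count_ratio;
}

// The UDP listener still caps whatever the callback asks for.
size_t packetsToReadThisLoop(size_t expected_by_callback) {
  constexpr size_t kMaxNumPacketsPerEventLoop = 6000; // MAX_NUM_PACKETS_PER_EVENT_LOOP
  return std::min(expected_by_callback, kMaxNumPacketsPerEventLoop);
}
```

Under this reading, a listener with N connections asks for N times the configured ratio, and the 6000 cap keeps a single wakeup bounded regardless of connection count; callbacks that return max size_t simply hit the cap every time.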