
udp: limit number of reads per event loop #16180

Merged: 34 commits, May 24, 2021

Conversation

danzh2010 (Contributor) commented Apr 26, 2021

Commit Message: To prevent a long event loop when too many UDP packets are in the queue, limit how many packets are read in each event loop. If reading hasn't finished, artificially raise a READ event to continue in the next event loop.
Additional Description:
Add a numPacketsExpectedPerEventLoop() callback to UdpListenerCallback, so that the QUIC listener can tell how many packets it wants to read in each loop. The actual number of packets read is still bounded by MAX_NUM_PACKETS_PER_EVENT_LOOP (6000).
The QUIC listener computes numPacketsExpectedPerEventLoop() from the number of connections it has at the moment and the configured envoy::config::listener::QuicProtocolOptions.packets_to_read_to_connection_count_ratio.
Made InjectableSingleton truly thread-safe.

Risk Level: medium. Other than the QUIC listener, UdpListenerCallbacks return max size_t for numPacketsExpectedPerEventLoop(), which causes those callbacks to read up to 6000 packets per READ event.
Testing: added UDP listener unit tests.

Fixes #16335 #16278
Part of #16198 #16493

Signed-off-by: Dan Zhang <danzh@google.com>
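
To make the mechanism above concrete, here is a minimal sketch of the described budget computation: the listener's per-loop expectation is its session count times the configured ratio, and the read path clamps that to the hard cap. The types and the default value are simplified stand-ins, not Envoy's real classes:

#include <algorithm>
#include <cstddef>

constexpr size_t kMaxNumPacketsPerEventLoop = 6000; // MAX_NUM_PACKETS_PER_EVENT_LOOP

struct QuicListenerSketch {
  size_t num_sessions = 0;                               // live QUIC connections
  size_t packets_to_read_to_connection_count_ratio = 32; // illustrative default

  size_t numPacketsExpectedPerEventLoop() const {
    return num_sessions * packets_to_read_to_connection_count_ratio;
  }
};

// The read path bounds whatever the callback asks for:
size_t packetBudget(const QuicListenerSketch& listener) {
  return std::min(listener.numPacketsExpectedPerEventLoop(),
                  kMaxNumPacketsPerEventLoop);
}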
include/envoy/network/listener.h (outdated review thread, resolved)
@@ -211,6 +211,10 @@ uint32_t ActiveQuicListener::destination(const Network::UdpRecvData& data) const
return connection_id_snippet % concurrency_;
}

size_t ActiveQuicListener::numReadsExpectedPerEventLoop() const {
  return quic_dispatcher_.NumSessions();
}
Contributor:
I think this will result in too few packets being read when the number of active QUIC connections is small.

danzh2010 (Contributor Author):
We can change the lower bound in readPacketsFromSocket() to something larger than 1. What's a reasonable number for that? Google-internal code actually uses 1 as the lower bound, too.

antoniovicente (Contributor) commented Apr 26, 2021:
I would say that for parity with TCP you probably need this to be 32 to 100 per active QUIC connection in the small-connection-count case. It's hard to say without load tests.

danzh2010 (Contributor Author) commented Apr 28, 2021:
Please note that this number is multiplied by 16 when using recvmmsg. If there is one connection, the event loop duration is 500us, and the lower bound is 100 reads, the expected max bandwidth is 100 * 16 * 1400 bytes / 0.0005s ≈ 4.48 GB/s ≈ 36 Gbps. Would a single connection have such a send rate?
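
(Spelling out that arithmetic: 100 reads x 16 datagrams x 1,400 bytes = 2.24 MB per event loop; 2.24 MB / 0.0005 s = 4.48 GB/s, i.e. 4.17 GiB/s; times 8 bits/byte gives roughly 36 Gbps.)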

Contributor:
It would be good to make the packet read rate the same whether recvmmsg is in use or not.

What are your thoughts on expressing these limits as a number of packets, and adjusting the number of calls when recvmmsg is used?

Regarding the math above, my concern is about proxy behavior when the event loop duration grows above 5ms, which would result in a decrease in the number of UDP packets that can be processed per second. Also, what are the more common QUIC packets that you expect to receive? I think they are ACKs and window updates, which use smaller UDP packets; actual receive bps would be relatively small unless the proxy is somehow mostly receiving large POSTs from clients.

danzh2010 (Contributor Author):
ACKs are the most common packets if we are doing large downloads.

Contributor:
I think that, at least for downstreams, downloads are more common than uploads.

Thoughts about making these limits configurable, and accepting the same number of packets per wakeup for both recvmmsg and recvmsg?

};

static const uint64_t DEFAULT_UDP_MAX_DATAGRAM_SIZE = 1500;
static const uint64_t NUM_DATAGRAMS_PER_GRO_RECEIVE = 16;
static const uint64_t NUM_DATAGRAMS_PER_MMSG_RECEIVE = 16;
static const uint64_t MAX_NUM_READS_PER_EVENT_LOOP = 100;
Contributor:
Consider making this limit configurable. 100 seems too small in a way: QUIC will be at a disadvantage against TCP, since the whole QUIC subsystem will be allowed to read the equivalent of 1 to 3 HTTP/2 connections per wakeup. When there's moderate load on the proxy, QUIC traffic will grind to a halt while HTTP/2 continues performing fine.

danzh2010 (Contributor Author):
What's a reasonable number for this?

Contributor:

I don't know. I would suggest running load tests that compare H2 and QUIC throughput when there are competing requests at the proxy. I would consider config parameters that specify the number of packets or MB of QUIC data to process per worker wakeup.

Another alternative to consider is having the limit grow when the UDP code notices that it wasn't able to fully drain the socket. Or attempt to drain as many packets as there are in the UDP receive buffer at the start of the UDP receive loop.

Also consider the consequences of these limits:
Say you have a kernel UDP receive queue limit of 10,000 packets but only allow draining 100 per wakeup. As the duration of each wakeup increases, you start getting older and older packets in each subsequent wakeup, until every packet you read is so old that the client has given up by the time you start processing it. This can be a problem especially when handling DNS; a burst of attack DNS requests could lead to real requests backing up and failing for an extended period as you slowly drain old requests from the queue.
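
(To put numbers on that scenario: with a 10,000-packet kernel queue and 100 packets drained per wakeup, a full queue takes 100 wakeups to clear; at 5 ms per wakeup that is half a second of queueing delay even with no new arrivals, and under a sustained burst the queue never drains, so every datagram read is at least that stale.)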

Contributor:
So if I understand this, right now UDP floods can starve TCP, and if we flip this, UDP floods would starve UDP. I agree that this should scale with session load, but as QUIC is alpha I'd prefer it be affected by starvation, and have a TODO for scaling. This would need a relnote and a config or runtime override for non-QUIC UDP. I think basic scaling is a fixed number of packets (was it 16?) per QUIC session if you want to address it in place.

Contributor:
I'm saying that with this change, TCP will starve UDP.

danzh2010 (Contributor Author):
A question, please: do we have a read limit for TCP connections? RawBufferSocket::doRead() doesn't seem to have one, but I'm not 100% sure.

Contributor:
Yeah, shouldDrainReadBuffer() is how it's limited: reading stops when you hit the configured buffer limit.
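
For comparison with the UDP path, a generic sketch of that TCP-side pattern (illustrative template, not Envoy's actual RawBufferSocket code): keep reading until the kernel has no more data or the buffer crosses its configured limit, and in the latter case mark the socket readable so the next wakeup resumes.

#include <cstddef>
#include <sys/types.h> // ssize_t

// Socket and Buffer are illustrative stand-ins for Envoy's IoHandle and
// Buffer::Instance; buffer_limit plays the role of shouldDrainReadBuffer().
template <typename Socket, typename Buffer>
void readUntilBufferLimit(Socket& sock, Buffer& buf, size_t buffer_limit) {
  while (true) {
    const ssize_t n = sock.read(buf); // appends to buf; <= 0 when drained
    if (n <= 0) {
      break; // Nothing left (or error): wait for the next epoll event.
    }
    if (buf.size() >= buffer_limit) {
      sock.markReadable(); // Resume on the next event-loop iteration.
      break;
    }
  }
}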

danzh2010 (Contributor Author) commented Apr 28, 2021:

> So if I understand this, right now UDP floods can starve TCP, and if we flip this, UDP floods would starve UDP. I agree that this should scale with session load, but as QUIC is alpha I'd prefer it be affected by starvation, and have a TODO for scaling. This would need a relnote and a config or runtime override for non-QUIC UDP.

For non-QUIC, do we still want to drain the socket on each read event?

> I think basic scaling is a fixed number of packets (was it 16?) per QUIC session if you want to address it in place.

If we are using recvmmsg, we get a 16x multiplier from the existing implementation. That's why numReadsExpectedPerEventLoop() just returns the session count.

Contributor:
A possible question is how many hot fds we can get back from epoll_wait per wakeup. I expect the answer to be somewhere in the 16 to 256 range. For fairness between TCP and UDP at the upper end of that range, with a TCP read size of 32kb per connection, I think we should allow reading about 6k UDP packets per wakeup. 100 attempts returning 16 packets each is only a factor of 4 away from that, so a threshold of 100 recvmmsg calls, each returning up to 16 packets, is actually pretty good. I had thought the limits above resulted in 1 packet per active QUIC connection, up to a limit of 100 packets.
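
(Spelling out that parity estimate: 256 hot TCP fds x 32 KB per read ≈ 8.4 MB per wakeup, and 6,000 UDP packets x 1,400 bytes ≈ 8.4 MB, which is where the ~6k figure comes from; 100 recvmmsg calls x 16 packets = 1,600 packets, the factor of 4 mentioned above.)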

danzh2010 (Contributor Author):
I changed the logic to limit the number of packets read per loop, and set the upper limit to 6000.
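
Putting this thread together, here is a self-contained sketch of the bounded drain-and-rearm pattern the PR ends up with (Linux-only, plain POSIX recvmmsg rather than Envoy's wrappers; the function name and structure are illustrative, and the budget is counted in packets rather than read calls):

#include <sys/socket.h> // recvmmsg (Linux; glibc exposes it under _GNU_SOURCE, g++'s default)
#include <cstring>      // std::memset

constexpr int kDatagramsPerCall = 16;      // NUM_DATAGRAMS_PER_MMSG_RECEIVE
constexpr int kMaxPacketsPerWakeup = 6000; // MAX_NUM_PACKETS_PER_EVENT_LOOP
constexpr int kDatagramSize = 1500;        // DEFAULT_UDP_MAX_DATAGRAM_SIZE

// Returns true if the per-wakeup packet budget was exhausted while the
// socket may still hold data, i.e. the caller should artificially raise
// another READ event instead of continuing to loop here.
bool drainWithBudget(int fd, int packet_budget = kMaxPacketsPerWakeup) {
  char bufs[kDatagramsPerCall][kDatagramSize];
  iovec iovs[kDatagramsPerCall];
  mmsghdr msgs[kDatagramsPerCall];
  while (packet_budget > 0) {
    std::memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < kDatagramsPerCall; ++i) {
      iovs[i] = {bufs[i], kDatagramSize};
      msgs[i].msg_hdr.msg_iov = &iovs[i];
      msgs[i].msg_hdr.msg_iovlen = 1;
    }
    const int n = recvmmsg(fd, msgs, kDatagramsPerCall, MSG_DONTWAIT, nullptr);
    if (n < 0) {
      return false; // EAGAIN (or error): socket drained within budget.
    }
    // ...hand msgs[0..n-1] to the listener's read callback here...
    packet_budget -= n;
    if (n < kDatagramsPerCall) {
      return false; // Short batch: the receive queue is now empty.
    }
  }
  return true; // Budget spent; more packets may remain queued.
}

When it returns true, the caller re-registers a READ event so the leftover packets are processed in the next event-loop iteration; per the PR description, the starting budget is the smaller of the callback's numPacketsExpectedPerEventLoop() and the 6000 cap.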

danzh1989 added 4 commits May 2, 2021 18:39
danzh2010 requested a review from mattklein123 as a code owner May 3, 2021 18:12
@@ -180,8 +180,8 @@ class UdpProxyFilter : public Network::UdpListenerReadFilter,
}
size_t numPacketsExpectedPerEventLoop() const final {
// Use 32k to read the same amount as a TCP connection.
return 32u;
}
Contributor:
tests?

danzh2010 (Contributor Author):
I realize I shouldn't be the one deciding how many packets the UDP proxy reads. I'll leave it as max size_t with a TODO for @mattklein123.

antoniovicente (Contributor):

CI detected a compile failure:

2021-05-03T21:18:48.7400427Z bazel-out/k8-opt/bin/source/common/quic/_virtual_includes/active_quic_listener_lib/common/quic/active_quic_listener.h:55:10: error: declaration of 'numPacketsExpectedPerEventLoop' overrides a 'final' function
2021-05-03T21:18:48.7401278Z   size_t numPacketsExpectedPerEventLoop() const override;
2021-05-03T21:18:48.7401756Z          ^
2021-05-03T21:18:48.7402670Z bazel-out/k8-opt/bin/source/server/_virtual_includes/active_udp_listener/server/active_udp_listener.h:40:10: note: overridden virtual function is here
2021-05-03T21:18:48.7403415Z   size_t numPacketsExpectedPerEventLoop() const final {
2021-05-03T21:18:48.7421776Z          ^
2021-05-03T21:18:48.7422316Z 1 error generated.

antoniovicente (Contributor):

Still seeing compile errors.

/wait

opt/bin/include/envoy/server/overload/_virtual_includes/thread_local_test/common/network/udp_fuzz.cc:68:30: error: variable type 'Envoy::(anonymous namespace)::FuzzUdpListenerCallbacks' is an abstract class
2021-05-05T16:48:52.3387670Z     FuzzUdpListenerCallbacks fuzzCallbacks(this);
2021-05-05T16:48:52.3388166Z                              ^
2021-05-05T16:48:52.3389409Z bazel-out/k8-opt/bin/include/envoy/network/_virtual_includes/listener_interface/envoy/network/listener.h:337:18: note: unimplemented pure virtual method 'numPacketsExpectedPerEventLoop' in 'FuzzUdpListenerCallbacks'
2021-05-05T16:48:52.3390197Z   virtual size_t numPacketsExpectedPerEventLoop() const PURE;
2021-05-05T16:48:52.3390618Z                  ^
2021-05-05T16:48:52.3391158Z 1 error generated.

repokitteh-read-only (bot):
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

Caused by: #16180 was synchronized by danzh2010.

antoniovicente (Contributor) left a comment:
Please look at the CI spell-check error.

/wait

danzh1989 added 6 commits May 18, 2021 00:14
alyssawilk previously approved these changes May 19, 2021

alyssawilk (Contributor) left a comment:

LGTM, though needs a main merge to land

alyssawilk (Contributor):

also if this solves #16335 does it also fix any of
#16278 (flaky quic_http_integration_test)
#16493 (flaky QuicHttpIntegrationTest)
#16198 (slow tests)
basically I wonder if the others were caused by looping/starvation?

danzh2010 (Contributor Author):

> also if this solves #16335 does it also fix any of
> #16278 (flaky quic_http_integration_test)

Yeah, this PR should fix these two.

> #16493 (flaky QuicHttpIntegrationTest)

There is another data race in FakeUpstream that needs to be addressed in the test helper function.

> #16198 (slow tests)
> basically I wonder if the others were caused by looping/starvation?

It didn't fully resolve the slowness: when I retried after removing the TSAN timeout factor and the extra upstream timeout, the tests still failed, though at a much lower rate.

danzh1989 added 2 commits May 19, 2021 12:21
alyssawilk previously approved these changes May 20, 2021

alyssawilk (Contributor):
@lizan this is ready for API review

danzh1989 added 2 commits May 20, 2021 10:39
adisuissa (Contributor):

/lgtm api

alyssawilk merged commit 8db01a4 into envoyproxy:main May 24, 2021
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021