connection: adding watermarks to the read buffer. #11170

alyssawilk · 2020-05-12T20:24:12Z

Fixing an issue where every time a connection was readDisabled/readEnabled it would read from the socket, even if the buffer already contained sufficient data it should have triggered push back.

Risk Level: high (watermarks)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a
Fixes https://github.com/envoyproxy/envoy-setec/issues/90 (no longer a security risk after previous mitigation)

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

antoniovicente

Super excited about this change to extend watermark buffer logic to connection read buffers. Thanks for putting this together.

antoniovicente · 2020-05-12T21:01:35Z

source/common/buffer/watermark_buffer.h

@@ -38,6 +38,7 @@ class WatermarkBuffer : public OwnedImpl {
  void setWatermarks(uint32_t watermark) { setWatermarks(watermark / 2, watermark); }
  void setWatermarks(uint32_t low_watermark, uint32_t high_watermark);
  uint32_t highWatermark() const { return high_watermark_; }
+  bool aboveHighWatermark() const { return above_high_watermark_called_; }


nit: This returns true if the high watermark is triggered and the buffer size hasn't dropped below the low watermark.

Could you add a comment for this function?

Would highWatermarkTriggered be a better name?

antoniovicente · 2020-05-12T21:28:31Z

source/common/network/connection_impl.cc

+    if (consumerWantsToRead() && read_buffer_.length() > 0) {
+      // If the connection has data buffered there's no guarantee there's also data in the kernel
+      // which will kick off the filter chain. Instead fake an event to make sure the buffered data
+      // gets processed regardless and ensure that we dispatch it via onRead.


It seems that this fake read event wakeup is delivered even in cases where the file_event does not have the Read mask set. Worth expanding this comment?

antoniovicente · 2020-05-12T21:31:07Z

source/common/network/connection_impl.cc

+
+void ConnectionImpl::onReadBufferHighWatermark() {
+  ENVOY_CONN_LOG(debug, "onAboveReadBufferHighWatermark", *this);
+  readDisable(true);


"if (state() == State::Open) {" seems necessary here also.

I'm thinking about upstream responses over HTTPS where the last byte of the response read, EOF is detected, then SslSocket commits the reservation and triggers the high-watermark. (SslSocket is special; plaintext socket does 1 commit per readv call, while SslSocket does 1 commit per group of SSL_read calls used to fill the current reservation).

I'm happy to add a guard (you know how pro-error-handling I am) but I don't think it's necessary.
In the case we do SSL_read and pick up the fin, I believe we will do multiple reads, but in SslSocket::doRead we commit all the data (buffer goes high) and then return the action PostIoAction::Close, so retaining the invariant that we close after the read is done, and shouldn't add data to the connection after the socket has been closed.

I'm ok skipping the check.

antoniovicente · 2020-05-12T21:41:07Z

source/common/network/connection_impl.cc

+  if (read_disable_count_ != 0) {
+    if (dispatch_buffered_data_ && consumerWantsToRead()) {
+      onRead(read_buffer_.length());
+      dispatch_buffered_data_ = false;


I think you want to clear dispatch_buffered_data_ before calling onRead.

onRead may call readDisable(false) to re-enable as data is consumed and read buffer falls below low watermark. readDisable will set dispatch_buffered_data_ = true a second time, and this will then set it back to false resulting in a failed resumption if the fd read attempt after resumption returns 0 bytes but there are bytes in the read buffer.

The current "dispatch_buffered_data_ = false;" has a similar bug which may only trigger if you send 2 pipelined HTTP/1.1 GET requests, we do readDisable(true) after reading headers for the first request, internal error response would terminate the first request and trigger readDisable(false) which would set dispatch_buffered_data_ and fd ready for resumption, but resumption would be ignored because dispatch_buffered_data_ is unset when onRead returns.

Oh, that's a really nice catch.
Looking closely at the code, I was replicating the buggy logic from down below (583) where the same thing could happen on the actual socket read path. I was going to bump both calls above the onRead, but then the lower case doesn't clear dispatch_buffered_data consistently, so I think I'm just going to latch-and-clear at the top of the function to attempt to future-proof.

antoniovicente · 2020-05-12T21:58:25Z

test/common/network/connection_impl_test.cc

+          dispatcher_->exit();
+          return FilterStatus::StopIteration;
+        }));
+    dispatcher_->run(Event::Dispatcher::RunType::Block);


Worth doing a second server_connection write before doing drains, and verify that there is no read from the client connection on dispatcher_->run()?

antoniovicente · 2020-05-12T22:01:15Z

test/common/network/connection_impl_test.cc

+  {
+    client_connection_->readDisable(true);
+    testClientConnection()->readBuffer().drain(3);
+    EXPECT_FALSE(testClientConnection()->readBuffer().aboveHighWatermark());


Worth adding some consumerWantsToRead and dispatch_buffered_data_ expectations?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

antoniovicente

Did another pass and the changes look good. You may want to reply to and close some of the open comment threads.

mattklein123

Nice, makes sense to me. Just a few small comments. Thank you!

/wait

mattklein123 · 2020-05-20T16:00:34Z

source/common/network/connection_impl.h

@@ -122,11 +122,19 @@ class ConnectionImpl : public ConnectionImplBase, public TransportSocketCallback
  static uint64_t nextGlobalIdForTest() { return next_global_id_; }

 protected:
+  // A convenience function which returns true if
+  // 1) The read disable count is zero or
+  // 2) The read disable count is one, due to the read buffer being overrun.


Isn't there a chance that the read disable count is > 1 and the buffer is overrun? Like if it got overrun but also disabled some other way? Or is the idea that if it is > 1 we will wait until the only reason is it's disabled due to read overrun? If that's the case can you add more comments just to make it completely clear?

mattklein123 · 2020-05-20T16:03:32Z

source/common/network/connection_impl.cc

@@ -342,24 +350,24 @@ void ConnectionImpl::readDisable(bool disable) {
    }
  } else {
    --read_disable_count_;


While you are here can you ASSERT this is greater than 0 before decrementing? I'm surprised this was not already asserted.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

mattklein123

Thanks!

alyssawilk added 2 commits May 11, 2020 12:35

network: do not do infinite reads

1bdbb5d

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

connection: adding limits to the read buffer.

a78eff4

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk assigned antoniovicente May 12, 2020

antoniovicente reviewed May 12, 2020

View reviewed changes

alyssawilk added 3 commits May 13, 2020 09:49

tidy

8b87dfa

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

avd gets reviewer gold star of the day

ca61e93

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

tidy

59a0bf5

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

antoniovicente previously approved these changes May 19, 2020

View reviewed changes

alyssawilk assigned mattklein123 May 20, 2020

mattklein123 requested changes May 20, 2020

View reviewed changes

repokitteh-read-only bot added the waiting label May 20, 2020

reviewer comments

6aafcb2

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>

alyssawilk dismissed antoniovicente’s stale review via 6aafcb2 May 20, 2020 17:48

repokitteh-read-only bot removed the waiting label May 20, 2020

mattklein123 approved these changes May 20, 2020

View reviewed changes

mattklein123 merged commit 77cca6b into envoyproxy:master May 20, 2020

alyssawilk deleted the final_watermark branch August 27, 2020 16:33

antoniovicente mentioned this pull request Oct 28, 2020

connection: Remember transport socket read resumption requests and replay them when re-enabling read. #13772

Merged

antoniovicente mentioned this pull request Nov 18, 2020

[Level Events] manage level events registration mask #13787

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

connection: adding watermarks to the read buffer. #11170

connection: adding watermarks to the read buffer. #11170

alyssawilk commented May 12, 2020 •

edited

Loading

antoniovicente left a comment

antoniovicente May 12, 2020

antoniovicente May 12, 2020

antoniovicente May 12, 2020

alyssawilk May 14, 2020

antoniovicente May 14, 2020

antoniovicente May 12, 2020

alyssawilk May 14, 2020

antoniovicente May 12, 2020

antoniovicente May 12, 2020

antoniovicente left a comment

mattklein123 left a comment

mattklein123 May 20, 2020

mattklein123 May 20, 2020

mattklein123 left a comment

connection: adding watermarks to the read buffer. #11170

connection: adding watermarks to the read buffer. #11170

Conversation

alyssawilk commented May 12, 2020 • edited Loading

antoniovicente left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antoniovicente left a comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

alyssawilk commented May 12, 2020 •

edited

Loading