Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listener: reset the file event when destroying listener filters #16952

Merged
merged 14 commits into from
Jun 22, 2021

Conversation

soulxu
Copy link
Member

@soulxu soulxu commented Jun 12, 2021

Commit Message: Listener: reset the file event before initializing new one
Additional Description:
The listener filter may add event on the new socket, but it won't cleanup the event.
If continue_on_listener_filters_timeout is set, the new connection may add same event
to the socket. So reset the file event before initialize new one.
Risk Level: low
Testing: integration test added
Docs Changes: n/a
Release Notes: n/a
Fixes #16951

Signed-off-by: He Jie Xu hejie.xu@intel.com

Signed-off-by: He Jie Xu <hejie.xu@intel.com>
@antoniovicente antoniovicente self-assigned this Jun 12, 2021
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
// the same file descriptor. This is not allowed.
if (file_event_ != nullptr) {
file_event_.reset();
}
Copy link
Member Author

@soulxu soulxu Jun 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another fix will be to reset the event in the destructor of the listener filter, but that would require every listener filter to take care of it. That is why I change here.

Also, this fix doesn't prevent initiate two instances of FileEventImpl directly, it is how the bug istio/istio#18229 was trigger initially, but I searched the code, we don't have that case anymore. Since we create FileEventImpl only through the dispatcher. So maybe we can move the constructor of FileEventImpl as protected, and only dispatcher can instantiate it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about file_event_ referring to a Event::FileReadyCb cb that points to a deleted listener filter. Given that, I think it is important that we find a way to reset the file event when the listen filter is destroyed, rather than hack it this way. Who owns the listener IoSocketHandleImpl?

I think that the 2 listener filters that seem to be calling initializeFileEvent are tls_inspector and http_inspector. Is the issue here that those filters need events during some parts of the connection lifetime, but eventually need to remove themselves from accepting events directly?

It would be useful to know more about when this ASSERT fails, and possibly change this from ASSERT to RELEASE_ASSERT. Do the added tests cause this ASSERT to be hit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option that could be considered for cases like this one is to add a RAII object that can be used for filters like this one to hold the file_event registration in cases where the filter does own a the file event registration.

Also, I think that falling back to the default filter chain in cases where tls inspector times out may not be the right thing. If the client is not sending a handshake within a reasonable time, the proxy should just close the connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this happens only on continue_on_listener_filters_timeout_, would resetting the file_event in ActiveTcpSocket::onTimeout() solve the issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because ActiveTcpSocket doesn't know who owns the relevant connection socket. The filters need to properly manage the file events they own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code of TLS inspector and I think I understand the issue better. I have concerns about the TLS inspector filter effectively removing itself from the filter chain on timeout. This behavior allows for trivial bypass of the TLS inspector and HTTP inspector functionality by waiting for a few seconds before sending the rest of the client hello. Are these inspectors purely debug hooks or are they meant to provide specific functionality or security properties?

Thanks for confirming that your new tests repo the ASSERT failure. I suggest we try to fix this issue by calling resetFileEvent in TLS inspector and HTTP inspector destructors if the filters haven't invoked their respective done callbacks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code of TLS inspector and I think I understand the issue better. I have concerns about the TLS inspector filter effectively removing itself from the filter chain on timeout. This behavior allows for trivial bypass of the TLS inspector and HTTP inspector functionality by waiting for a few seconds before sending the rest of the client hello. Are these inspectors purely debug hooks or are they meant to provide specific functionality or security properties?

Those inspectors are meant to provider specific functionality, for example, tls inspect get the SNI for later the filter chain matching.

Thanks for confirming that your new tests repo the ASSERT failure. I suggest we try to fix this issue by calling resetFileEvent in TLS inspector and HTTP inspector destructors if the filters haven't invoked their respective done callbacks.

got it, thanks for the review!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code of TLS inspector and I think I understand the issue better. I have concerns about the TLS inspector filter effectively removing itself from the filter chain on timeout. This behavior allows for trivial bypass of the TLS inspector and HTTP inspector functionality by waiting for a few seconds before sending the rest of the client hello. Are these inspectors purely debug hooks or are they meant to provide specific functionality or security properties?

Those inspectors are meant to provider specific functionality, for example, tls inspect get the SNI for later the filter chain matching.

This seems problematic since the SNI extraction logic of the filter is trivially bypass-able by waiting for a timeout condition before sening the SNI. Is there potential for that to cause any security concerns?

Thanks for confirming that your new tests repo the ASSERT failure. I suggest we try to fix this issue by calling resetFileEvent in TLS inspector and HTTP inspector destructors if the filters haven't invoked their respective done callbacks.

got it, thanks for the review!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those inspectors are meant to provider specific functionality, for example, tls inspect get the SNI for later the filter chain matching.

This seems problematic since the SNI extraction logic of the filter is trivially bypass-able by waiting for a timeout condition before sening the SNI. Is there potential for that to cause any security concerns?

I searched that, the Istio is using that https://github.com/istio/istio/blob/f38465f8f5c1edd944ed7776c1253ffa2d9768b2/pilot/pkg/networking/core/v1alpha3/listener_builder.go#L213

the protocolDetectionTimeout's doc probably explains the reason https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/

so it should be for a listener accepts both tls and server-first protocol connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do appreciate the desire to handle all kinds of things transparently, but adding significant delays for all connections to server-talks-first protocols seems broken.

@soulxu
Copy link
Member Author

soulxu commented Jun 14, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16952 (comment) was created by @soulxu.

see: more, trace.

Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. It would be useful to know more about what is going on here in order to evaluate if ignoring the file event re-registration is the right fix.

// the same file descriptor. This is not allowed.
if (file_event_ != nullptr) {
file_event_.reset();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about file_event_ referring to a Event::FileReadyCb cb that points to a deleted listener filter. Given that, I think it is important that we find a way to reset the file event when the listen filter is destroyed, rather than hack it this way. Who owns the listener IoSocketHandleImpl?

I think that the 2 listener filters that seem to be calling initializeFileEvent are tls_inspector and http_inspector. Is the issue here that those filters need events during some parts of the connection lifetime, but eventually need to remove themselves from accepting events directly?

It would be useful to know more about when this ASSERT fails, and possibly change this from ASSERT to RELEASE_ASSERT. Do the added tests cause this ASSERT to be hit?

soulxu added 5 commits June 16, 2021 06:12
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
@@ -66,6 +66,11 @@ using ConfigSharedPtr = std::shared_ptr<Config>;
class Filter : public Network::ListenerFilter, Logger::Loggable<Logger::Id::filter> {
public:
Filter(const ConfigSharedPtr config);
~Filter() override {
if (cb_) {
cb_->socket().ioHandle().resetFileEvents();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this filter guaranteed to be the owner of the file event on the ioHandle at this point, or is it possible that some other filter owns the file event?

cb_ remains set after calls to resetFileEvents() in http_inspector.cc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could be owned by other filter. Like we enable tls inspect and http inspect at same time, tls inspect successed, then http inspect timeout, both filter will reset the file event in the destruction. But it should be fine since the reset can be execute multiple times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is wherever a filter that is not the TLS inspector nor HTTP inspector could own the file event. For example, the network::Connection owned by the http connection manager.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no other filter own the file event. Since this is on the stage of accepting the connection, so the l3/l4 filter doesn't instance yet.

soulxu added 3 commits June 20, 2021 05:46
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
soulxu added 2 commits June 20, 2021 07:19
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
@soulxu soulxu requested a review from snowp as a code owner June 21, 2021 09:16
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few nits and requests to tighten test expectations.

/wait

source/server/active_tcp_listener.cc Outdated Show resolved Hide resolved
Buffer::OwnedImpl buffer("fake data");
client_->write(buffer, false);
// the timeout is set as one seconds, sleep 5 to trigger the timeout.
absl::SleepFor(absl::Seconds(5));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use simulated time to avoid such a long sleep?

Also, see comments above regarding additional expectations we could add in order to ensure that this test case goes down the expected code paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me see how to simulate time in integration test. If there is way, I should be able fix the proxy filter unittest also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the proxy filter unitest to use SimulatedTimeSystemHelper. But for the integration test, I saw the BaseIntegrationTest is using the real time-system.

Event::GlobalTimeSystem time_system_;

so I thought the integration is preferred to run at real timesystem?

Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for this awesome fix with comprehensive testing. You rock!

@antoniovicente antoniovicente changed the title Listener: reset the file event before initialize new one Listener: reset the file event when destroying listener filters Jun 22, 2021
@antoniovicente antoniovicente merged commit 0929a71 into envoyproxy:main Jun 22, 2021
@soulxu
Copy link
Member Author

soulxu commented Jun 23, 2021

@antoniovicente thanks for your review!

baojr added a commit to baojr/envoy that referenced this pull request Jun 23, 2021
…bridge-stream

* upstream/main: (268 commits)
  tools: adding dio,better comments (envoyproxy#17104)
  doc: fix misplaced #[extension-category] for Wasm runtimes (envoyproxy#17078)
  ci: Speedup deps precheck (envoyproxy#17102)
  doc: fix wrong link on wasm network filter. (envoyproxy#17079)
  docs: Added v3 API reference. (envoyproxy#17095)
  docs: Update include paths in repo (envoyproxy#17098)
  exception: make Ipv6Instance and Ipv4Instance not throw and remove some try catch pattern (envoyproxy#16122)
  tools: adding reminders for API shephards (envoyproxy#17081)
  ci: Fix wasm verify example (envoyproxy#17086)
  [fuzz]: fix oss fuzz bug 34515, limit maglev table size (envoyproxy#16671)
  test: silencing flaky test (envoyproxy#17084)
  Set `validate` flag when the SAN(SubjectAltName) matching is performed (envoyproxy#16816)
  Listener: reset the file event when destroying listener filters (envoyproxy#16952)
  docs: link additional filters that emit dynamic metadata (envoyproxy#17059)
  rds: add config reload time stat for rds (envoyproxy#17033)
  bazel: Use color by default for build and run commands (envoyproxy#17077)
  ci: Add timing for docker pull (envoyproxy#17074)
  [Windows] Adding note section in Original Source HTTP Filter (envoyproxy#17058)
  quic: add quic version counters in http3 codec stats. (envoyproxy#16943)
  quiche: change crypto stream factory interfaces (envoyproxy#17046)
  ...

Signed-off-by: Garrett Bourg <bourg@squareup.com>
@mattklein123
Copy link
Member

@soulxu @antoniovicente I'm late to the discussion and I see quite a bit of back and forth, but it seems sub-optimal that each listener filter has to include this new stanza to release the file event. Are we going to follow up with some RAII wrapper? Or even better is there some way in which the code that is running the filters (IIRC ActiveTcpConncetion or something like that) could keep track of dealing with this if needed? Apologize if this has already been discussed at length, but a summary might be nice. Thank you.

@soulxu
Copy link
Member Author

soulxu commented Jun 28, 2021

@soulxu @antoniovicente I'm late to the discussion and I see quite a bit of back and forth, but it seems sub-optimal that each listener filter has to include this new stanza to release the file event. Are we going to follow up with some RAII wrapper? Or even better is there some way in which the code that is running the filters (IIRC ActiveTcpConncetion or something like that) could keep track of dealing with this if needed? Apologize if this has already been discussed at length, but a summary might be nice. Thank you.

Initially, I reset the file event before initializing a new one inside ioHandle::initializeFileEvent(), then @antoniovicente has great point, that is file_event reset late after the accept filter destructed, which means file_event's callback still have reference to the accept filter at that time. So I changed back to reset file_event inside filter's destruction. But I think I did miss that we can reset the file event just before we clear the accept filters.

accept_filters_.clear();

so can invoke ioHandle().resetFileEvents() before accept_filters.clear(). and 'resetFileEvents' works fine even with the filter doesn't create any file event.

@antoniovicente correct me if I forget some points you pointed out.

@mattklein123
Copy link
Member

But I think I did miss that we can reset the file event just before we clear the accept filters.

Yeah this is how would have thought we would do it, but there is probably something I am missing about that. @antoniovicente WDYT?

@antoniovicente
Copy link
Contributor

Use of an RAII wrapper to track ownership of the file event in the network filters that need it seems like a potential improvement. The framework itself resetting the file event as it clears and re-creates the filter chain seems like a potential option also.

That said, re-creation of the filter chain on timeout seems like broken behavior to me, see #16952 (comment)

@antoniovicente
Copy link
Contributor

Also, I wonder if there's an alternate way to provide the functionality provided by this filters, like an observer callback that Network::ConnectionImpl can provide to peek at the bytes read early in the connection in order to allow these inspector filters to look at the bytes that are being processed without requiring them to peek at the socket buffer every time the client sends additional bytes.

@mattklein123
Copy link
Member

The framework itself resetting the file event as it clears and re-creates the filter chain seems like a potential option also.

This is the approach I would take personally.

Also, I wonder if there's an alternate way to provide the functionality provided by this filters, like an observer callback that Network::ConnectionImpl can provide to peek at the bytes read early in the connection in order to allow these inspector filters to look at the bytes that are being processed without requiring them to peek at the socket buffer every time the client sends additional bytes.

+1 this seems like a better option, though a much larger change to think about in the future.

@ggreenway
Copy link
Contributor

That's something I've thought about for awhile as well. For a long time, BoringSSL was running in owns-the-fd mode, but I think it doesn't anymore (uses a memory BIO), so this could now be implemented. I'm a strong +1 on no more PEEK calls in the listener filters.

@antoniovicente
Copy link
Contributor

A possible complexity is that the proxy protocol filter may do a real read after the PEEK in order to consume bytes that represent the proxy protocol header when it is present. The API that would replace the PEEKs would need to allow stop-and-buffer to support the proxy protocol case in addition to pure PEEK functionality that I think other filters depend on.

@ggreenway
Copy link
Contributor

Maybe the way to do it is have the listener filters read what they want into a Buffer::Instance, and then pass the buffer with any data they didn't consume to the next step?

@antoniovicente
Copy link
Contributor

Yes, moving the actual read from the socket to base infrastructure and passing the buffer through the various filters seems like the most likely implementation path.

@soulxu
Copy link
Member Author

soulxu commented Jul 1, 2021

The framework itself resetting the file event as it clears and re-creates the filter chain seems like a potential option also.

This is the approach I would take personally.

Let me submit another PR fix it.

Also, I wonder if there's an alternate way to provide the functionality provided by this filters, like an observer callback that Network::ConnectionImpl can provide to peek at the bytes read early in the connection in order to allow these inspector filters to look at the bytes that are being processed without requiring them to peek at the socket buffer every time the client sends additional bytes.

+1 this seems like a better option, though a much larger change to think about in the future.

Let me file an issue for this.

chrisxrepo pushed a commit to chrisxrepo/envoy that referenced this pull request Jul 8, 2021
…yproxy#16952)

The listener filter may add event on the new socket, but it won't cleanup the event.
If continue_on_listener_filters_timeout is set, the new connection may add same event
to the socket. So reset the file event before initialize new one.

Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: chris.xin <xinchuantao@qq.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
…yproxy#16952)

The listener filter may add event on the new socket, but it won't cleanup the event.
If continue_on_listener_filters_timeout is set, the new connection may add same event
to the socket. So reset the file event before initialize new one.

Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

envoy still try to create file event twice on the same fd when continue_on_listener_filters_timeout is true
6 participants