-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Listener: reset the file event when destroying listener filters #16952
Conversation
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
// the same file descriptor. This is not allowed. | ||
if (file_event_ != nullptr) { | ||
file_event_.reset(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another fix will be to reset the event in the destructor of the listener filter, but that would require every listener filter to take care of it. That is why I change here.
Also, this fix doesn't prevent initiate two instances of FileEventImpl directly, it is how the bug istio/istio#18229 was trigger initially, but I searched the code, we don't have that case anymore. Since we create FileEventImpl only through the dispatcher. So maybe we can move the constructor of FileEventImpl as protected, and only dispatcher can instantiate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns about file_event_ referring to a Event::FileReadyCb cb that points to a deleted listener filter. Given that, I think it is important that we find a way to reset the file event when the listen filter is destroyed, rather than hack it this way. Who owns the listener IoSocketHandleImpl?
I think that the 2 listener filters that seem to be calling initializeFileEvent are tls_inspector and http_inspector. Is the issue here that those filters need events during some parts of the connection lifetime, but eventually need to remove themselves from accepting events directly?
It would be useful to know more about when this ASSERT fails, and possibly change this from ASSERT to RELEASE_ASSERT. Do the added tests cause this ASSERT to be hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option that could be considered for cases like this one is to add a RAII object that can be used for filters like this one to hold the file_event registration in cases where the filter does own a the file event registration.
Also, I think that falling back to the default filter chain in cases where tls inspector times out may not be the right thing. If the client is not sending a handshake within a reasonable time, the proxy should just close the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this happens only on continue_on_listener_filters_timeout_, would resetting the file_event in ActiveTcpSocket::onTimeout() solve the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because ActiveTcpSocket doesn't know who owns the relevant connection socket. The filters need to properly manage the file events they own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the code of TLS inspector and I think I understand the issue better. I have concerns about the TLS inspector filter effectively removing itself from the filter chain on timeout. This behavior allows for trivial bypass of the TLS inspector and HTTP inspector functionality by waiting for a few seconds before sending the rest of the client hello. Are these inspectors purely debug hooks or are they meant to provide specific functionality or security properties?
Thanks for confirming that your new tests repo the ASSERT failure. I suggest we try to fix this issue by calling resetFileEvent in TLS inspector and HTTP inspector destructors if the filters haven't invoked their respective done callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the code of TLS inspector and I think I understand the issue better. I have concerns about the TLS inspector filter effectively removing itself from the filter chain on timeout. This behavior allows for trivial bypass of the TLS inspector and HTTP inspector functionality by waiting for a few seconds before sending the rest of the client hello. Are these inspectors purely debug hooks or are they meant to provide specific functionality or security properties?
Those inspectors are meant to provider specific functionality, for example, tls inspect get the SNI for later the filter chain matching.
Thanks for confirming that your new tests repo the ASSERT failure. I suggest we try to fix this issue by calling resetFileEvent in TLS inspector and HTTP inspector destructors if the filters haven't invoked their respective done callbacks.
got it, thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the code of TLS inspector and I think I understand the issue better. I have concerns about the TLS inspector filter effectively removing itself from the filter chain on timeout. This behavior allows for trivial bypass of the TLS inspector and HTTP inspector functionality by waiting for a few seconds before sending the rest of the client hello. Are these inspectors purely debug hooks or are they meant to provide specific functionality or security properties?
Those inspectors are meant to provider specific functionality, for example, tls inspect get the SNI for later the filter chain matching.
This seems problematic since the SNI extraction logic of the filter is trivially bypass-able by waiting for a timeout condition before sening the SNI. Is there potential for that to cause any security concerns?
Thanks for confirming that your new tests repo the ASSERT failure. I suggest we try to fix this issue by calling resetFileEvent in TLS inspector and HTTP inspector destructors if the filters haven't invoked their respective done callbacks.
got it, thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those inspectors are meant to provider specific functionality, for example, tls inspect get the SNI for later the filter chain matching.
This seems problematic since the SNI extraction logic of the filter is trivially bypass-able by waiting for a timeout condition before sening the SNI. Is there potential for that to cause any security concerns?
I searched that, the Istio is using that https://github.com/istio/istio/blob/f38465f8f5c1edd944ed7776c1253ffa2d9768b2/pilot/pkg/networking/core/v1alpha3/listener_builder.go#L213
the protocolDetectionTimeout
's doc probably explains the reason https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/
so it should be for a listener accepts both tls and server-first protocol connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do appreciate the desire to handle all kinds of things transparently, but adding significant delays for all connections to server-talks-first protocols seems broken.
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. It would be useful to know more about what is going on here in order to evaluate if ignoring the file event re-registration is the right fix.
// the same file descriptor. This is not allowed. | ||
if (file_event_ != nullptr) { | ||
file_event_.reset(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns about file_event_ referring to a Event::FileReadyCb cb that points to a deleted listener filter. Given that, I think it is important that we find a way to reset the file event when the listen filter is destroyed, rather than hack it this way. Who owns the listener IoSocketHandleImpl?
I think that the 2 listener filters that seem to be calling initializeFileEvent are tls_inspector and http_inspector. Is the issue here that those filters need events during some parts of the connection lifetime, but eventually need to remove themselves from accepting events directly?
It would be useful to know more about when this ASSERT fails, and possibly change this from ASSERT to RELEASE_ASSERT. Do the added tests cause this ASSERT to be hit?
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
test/extensions/filters/listener/http_inspector/http_inspector_test.cc
Outdated
Show resolved
Hide resolved
@@ -66,6 +66,11 @@ using ConfigSharedPtr = std::shared_ptr<Config>; | |||
class Filter : public Network::ListenerFilter, Logger::Loggable<Logger::Id::filter> { | |||
public: | |||
Filter(const ConfigSharedPtr config); | |||
~Filter() override { | |||
if (cb_) { | |||
cb_->socket().ioHandle().resetFileEvents(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It this filter guaranteed to be the owner of the file event on the ioHandle at this point, or is it possible that some other filter owns the file event?
cb_ remains set after calls to resetFileEvents() in http_inspector.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could be owned by other filter. Like we enable tls inspect and http inspect at same time, tls inspect successed, then http inspect timeout, both filter will reset the file event in the destruction. But it should be fine since the reset can be execute multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is wherever a filter that is not the TLS inspector nor HTTP inspector could own the file event. For example, the network::Connection owned by the http connection manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no other filter own the file event. Since this is on the stage of accepting the connection, so the l3/l4 filter doesn't instance yet.
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few nits and requests to tighten test expectations.
/wait
source/extensions/filters/listener/proxy_protocol/proxy_protocol.h
Outdated
Show resolved
Hide resolved
source/extensions/filters/listener/tls_inspector/tls_inspector.h
Outdated
Show resolved
Hide resolved
test/extensions/filters/listener/proxy_protocol/proxy_protocol_test.cc
Outdated
Show resolved
Hide resolved
Buffer::OwnedImpl buffer("fake data"); | ||
client_->write(buffer, false); | ||
// the timeout is set as one seconds, sleep 5 to trigger the timeout. | ||
absl::SleepFor(absl::Seconds(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use simulated time to avoid such a long sleep?
Also, see comments above regarding additional expectations we could add in order to ensure that this test case goes down the expected code paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me see how to simulate time in integration test. If there is way, I should be able fix the proxy filter unittest also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the proxy filter unitest to use SimulatedTimeSystemHelper
. But for the integration test, I saw the BaseIntegrationTest is using the real time-system.
Event::GlobalTimeSystem time_system_; |
so I thought the integration is preferred to run at real timesystem?
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for this awesome fix with comprehensive testing. You rock!
@antoniovicente thanks for your review! |
…bridge-stream * upstream/main: (268 commits) tools: adding dio,better comments (envoyproxy#17104) doc: fix misplaced #[extension-category] for Wasm runtimes (envoyproxy#17078) ci: Speedup deps precheck (envoyproxy#17102) doc: fix wrong link on wasm network filter. (envoyproxy#17079) docs: Added v3 API reference. (envoyproxy#17095) docs: Update include paths in repo (envoyproxy#17098) exception: make Ipv6Instance and Ipv4Instance not throw and remove some try catch pattern (envoyproxy#16122) tools: adding reminders for API shephards (envoyproxy#17081) ci: Fix wasm verify example (envoyproxy#17086) [fuzz]: fix oss fuzz bug 34515, limit maglev table size (envoyproxy#16671) test: silencing flaky test (envoyproxy#17084) Set `validate` flag when the SAN(SubjectAltName) matching is performed (envoyproxy#16816) Listener: reset the file event when destroying listener filters (envoyproxy#16952) docs: link additional filters that emit dynamic metadata (envoyproxy#17059) rds: add config reload time stat for rds (envoyproxy#17033) bazel: Use color by default for build and run commands (envoyproxy#17077) ci: Add timing for docker pull (envoyproxy#17074) [Windows] Adding note section in Original Source HTTP Filter (envoyproxy#17058) quic: add quic version counters in http3 codec stats. (envoyproxy#16943) quiche: change crypto stream factory interfaces (envoyproxy#17046) ... Signed-off-by: Garrett Bourg <bourg@squareup.com>
@soulxu @antoniovicente I'm late to the discussion and I see quite a bit of back and forth, but it seems sub-optimal that each listener filter has to include this new stanza to release the file event. Are we going to follow up with some RAII wrapper? Or even better is there some way in which the code that is running the filters (IIRC ActiveTcpConncetion or something like that) could keep track of dealing with this if needed? Apologize if this has already been discussed at length, but a summary might be nice. Thank you. |
Initially, I reset the file event before initializing a new one inside ioHandle::initializeFileEvent(), then @antoniovicente has great point, that is file_event reset late after the accept filter destructed, which means file_event's callback still have reference to the accept filter at that time. So I changed back to reset file_event inside filter's destruction. But I think I did miss that we can reset the file event just before we clear the accept filters. envoy/source/server/active_tcp_listener.cc Line 204 in 5a8f89a
so can invoke ioHandle().resetFileEvents() before accept_filters.clear(). and 'resetFileEvents' works fine even with the filter doesn't create any file event. @antoniovicente correct me if I forget some points you pointed out. |
Yeah this is how would have thought we would do it, but there is probably something I am missing about that. @antoniovicente WDYT? |
Use of an RAII wrapper to track ownership of the file event in the network filters that need it seems like a potential improvement. The framework itself resetting the file event as it clears and re-creates the filter chain seems like a potential option also. That said, re-creation of the filter chain on timeout seems like broken behavior to me, see #16952 (comment) |
Also, I wonder if there's an alternate way to provide the functionality provided by this filters, like an observer callback that Network::ConnectionImpl can provide to peek at the bytes read early in the connection in order to allow these inspector filters to look at the bytes that are being processed without requiring them to peek at the socket buffer every time the client sends additional bytes. |
This is the approach I would take personally.
+1 this seems like a better option, though a much larger change to think about in the future. |
That's something I've thought about for awhile as well. For a long time, BoringSSL was running in owns-the-fd mode, but I think it doesn't anymore (uses a memory BIO), so this could now be implemented. I'm a strong +1 on no more PEEK calls in the listener filters. |
A possible complexity is that the proxy protocol filter may do a real read after the PEEK in order to consume bytes that represent the proxy protocol header when it is present. The API that would replace the PEEKs would need to allow stop-and-buffer to support the proxy protocol case in addition to pure PEEK functionality that I think other filters depend on. |
Maybe the way to do it is have the listener filters read what they want into a |
Yes, moving the actual read from the socket to base infrastructure and passing the buffer through the various filters seems like the most likely implementation path. |
Let me submit another PR fix it.
Let me file an issue for this. |
…yproxy#16952) The listener filter may add event on the new socket, but it won't cleanup the event. If continue_on_listener_filters_timeout is set, the new connection may add same event to the socket. So reset the file event before initialize new one. Signed-off-by: He Jie Xu <hejie.xu@intel.com> Signed-off-by: chris.xin <xinchuantao@qq.com>
…yproxy#16952) The listener filter may add event on the new socket, but it won't cleanup the event. If continue_on_listener_filters_timeout is set, the new connection may add same event to the socket. So reset the file event before initialize new one. Signed-off-by: He Jie Xu <hejie.xu@intel.com>
Commit Message: Listener: reset the file event before initializing new one
Additional Description:
The listener filter may add event on the new socket, but it won't cleanup the event.
If
continue_on_listener_filters_timeout
is set, the new connection may add same eventto the socket. So reset the file event before initialize new one.
Risk Level: low
Testing: integration test added
Docs Changes: n/a
Release Notes: n/a
Fixes #16951
Signed-off-by: He Jie Xu hejie.xu@intel.com