-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: xDS mux unification #14496
WIP: xDS mux unification #14496
Conversation
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
…unification Signed-off-by: Fred Douglas <fredlas@google.com>
This reverts commit 30ce222. This restoration was to test my theory that 8350 was the cause of the hang. Now that we have established it wasn't, it will be good to have this cleaner, more sensible behavior back in. Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
…unification Signed-off-by: Fred Douglas <fredlas@google.com>
…ing behavior, which treats any news about the fetch as precluding a timeout, as opposed to only wanting a successful one Signed-off-by: Fred Douglas <fredlas@google.com>
…unification Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
…unification Signed-off-by: Fred Douglas <fredlas@google.com>
…unification Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
…unification Signed-off-by: Fred Douglas <fredlas@google.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
…ementation that is wire-compatible with the legacy grpc mux used in non-delta xDS Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just API review, can you check CI and then I will look into code.
@@ -51,7 +51,7 @@ message ApiConfigSource { | |||
// the v2 protos is used. | |||
REST = 1; | |||
|
|||
// gRPC v2 API. | |||
// "State of the world" gRPC v2 API, using Discovery{Request,Response} protos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove v2
, as we're moving to v3. or actually revert this.
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dmitri-d, this is epic. A couple of high-level questions:
- Did you end up root causing the original rollback? Do we know why things would be different this time around?
- Is there any way to break this PR up into smaller PRs or recommend a review strategy? This is a lot of churn and code which appears new (but maybe comes from elsewhere?). The GitHub UX makes it hard to track this and reason about correctness.
source/common/config/README.md
Outdated
[Delta]DiscoveryRequest protos. | ||
|
||
What does GrpcMux even do in this diagram? Just own things and pass through function calls? | ||
Answer: 1) it sequences the requests and ACKs that the various type_urls send, 2) it handles the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use proper markdown enumerated lists?
I did not; my understanding is that @rgs1 and @fredlas couldn't find anything in the original work. @rgs offered help testing changes (here: #11477 (comment)), hopefully it's still on. I ported legacy (non-delta xDS) mux and subscription implementations into this branch, which should make the transition easier. The legacy mux is enabled by default.
The core of the changes is the original PR; however it had to be updated to accommodate things like subascription resumption (ScopedResume), ttl, etc. Some of the issues are due to naming: diff on I'd suggest looking at changes in A next good step, IMO, would be to look at delta xDS mux and subscription implementations and their use (here: source/common/config/delta_subscription_state.cc envoy/source/common/config/grpc_mux_impl.h Line 144 in e8033f7
I'd leave SotW mux, subscription, and subscription state for last. I think TTL logic can be reviewed separately too, after the main functionality has been dealt with. Hope this helps, let me know if there's anything else I can do to help with reviewing of this PR. |
@adisuissa will help with a first pass, thanks! |
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@adisuissa: please let me know if there's anything I can do to make this review easier... |
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, thanks for taking this on @dmitri-d. Following up on IM conversations, I can provide some high-level guidance here and @adisuissa will dive deeper. I think though we might want to consider splitting some of this into distinct PRs.
Some high-level suggestions to be able to land this somewhat incrementally and with better reviewability:
- An initial PR that just introduces the new runtime option in the subscription factory and cluster manager. It should switch between new/old variants but keep the same name for existing SotW and for the new
GrpxMuxSotw
just have some dummy implementation. - Don't change names until a final PR for anything that doesn't need it. E.g. can we keep
GrpxMuxImpl
instead ofGrpcMuxSotw
for now? Can we keepNewGrpcMuxImpl
instead ofGrpcMuxDelta
? These newer Sotw/Delta names should ideally be the names in the end state, but hopefully that is a trivial rename pass once we have the functional stuff sorted.
One thing that concerns me is that the amount of change in something like GrpcMuxImpl
that isn't protected by the runtime flags is very high.
If we had as a starting point the guarantee that zero code changes in existing grpc_mux_impl.cc
and new_grpc_mux_impl.cc
, that these are the runtime default from the factories, and then we build up the replacements over a series of PRs, I think this would have higher safety guarantee and be easier to review.
The downside of this is that we will end up with three (or maybe four?) different gRPC muxes effectively during the transition, which is pretty horrendous but might be safest and the most incremental approach. The reason I'm making this case is that even something like grp_mux_impl.cc
, which should be the legacy path, has significant delta in this PR.
This is just a suggestion, curious to hear from you and @adisuissa on this topic.
/** | ||
* Adapter from typed Subscription to untyped GrpcMux. Also handles per-xDS API stats/logging. | ||
*/ | ||
class LegacyGrpcSubscriptionImpl : public Subscription, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we need a legacy subscription? The subscription is largely independent of the gRPC mux itself, it would seem not a lot of logic should change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely due to init_fetch_timeout
timer moving to watches. I think having a dedicated implementation is easier/safer/more maintainable than having conditionals in one common implementation.
I agree that preserving the names of the classes would make reviewing much easier, will check how much work it is. |
+1 for incremental changes, and for creating a new unified mux without renaming the old ones. |
@@ -160,7 +164,7 @@ TEST_P(CdsIntegrationTest, CdsClusterUpDownUp) { | |||
ASSERT_TRUE(codec_client_->waitForDisconnect()); | |||
|
|||
// Tell Envoy that cluster_1 is back. | |||
EXPECT_TRUE(compareDiscoveryRequest(Config::TypeUrl::get().Cluster, "42", {}, {}, {})); | |||
EXPECT_TRUE(compareDiscoveryRequest(Config::TypeUrl::get().Cluster, "42", {}, {}, {}, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many of the compareDiscoveryRequest
calls the PR adds the 6th arg as false (which IIUC should be false by default). Would it be better to remove it from these calls? (less changes across all integration tests)
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
@@ -74,6 +74,7 @@ constexpr const char* runtime_features[] = { | |||
"envoy.reloadable_features.http_transport_failure_reason_in_body", | |||
"envoy.reloadable_features.http_upstream_wait_connect_response", | |||
"envoy.reloadable_features.http2_skip_encoding_empty_trailers", | |||
"envoy.reloadable_features.legacy_sotw_xds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give some details on the control plane facing changes that enabling/disabling this flag causes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closed in favour of #15473
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Fixes #11477.
TODO's