Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: xDS mux unification #15473

Closed
wants to merge 10 commits into from
Closed

WIP: xDS mux unification #15473

wants to merge 10 commits into from

Conversation

dmitri-d
Copy link
Contributor

Fixes #11477, reintroduces changes in #8974. and supersedes work in #14496.

  • Unified mux implementation (located in source/common/config/unified_mux dir) is completely in parallel to the original muxes, and leaves the latter virtually untouched
  • Unified muxes are disabled by default. To enable. set envoy.reloadable_features.unified_mux runtime flag set to true.
  • Tests that required minimal changes in order to test unified muxes (these were integration-level tests, fuzz- and performance-tests) were modified to include unified muxes.
  • Tests specific to unified muxes have unified prepended to their names (test/common/config/unified_grpc_mux_impl_test.cc is one of those)

TODOs:

  • document the runtime flag
  • More delta mux-specific unit tests are needed

Signed-off-by: Dmitri Dolguikh ddolguik@redhat.com

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Copy link
Contributor Author

ping @htuch, @asraa.

I know you asked to break up the work into smaller chunks, but the only bits I could break out are the factory and interface changes, which are pretty trivial changes. The bulk of work is unified mux implementation + tests (new and updated). All unified mux code is now in source/common/config/unified_mux dir, and is completely independent of existing mux implementation. All tests specific to unified mux work are in test/common/config and have unified_ prefix. Additional notes on implementation:

  • source/common/config/subscription_factory_impl.cc and source/common/upstream/cluster_manager_impl.cc are the places where unifed muxes are intantiated (based on the value of envoy.reloadable_features.unified_mux).
  • I had to touch current implementation of muxes, but only to add stubs for new interfaces, which aren't being used anywhere.

@dmitri-d
Copy link
Contributor Author

dmitri-d commented Mar 13, 2021

Can you give some details on the control plane facing changes that enabling/disabling this flag causes?

I'll try documenting the changes next week. in the meantime test/integration/ads_integration_test.cc should illustrate the main differences.

ping @howardjohn

@dmitri-d dmitri-d marked this pull request as draft March 13, 2021 01:20
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Copy link
Contributor

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looks like the main changes:

  • if we send CDS v1, then CDS v2 while v1 is still warming, we don't get an ACK for both - just for v2 once its done warming
  • If we send CDS while CDS is paused, we will get 2 duplicate ACKs
    ?

Dmitri Dolguikh added 6 commits March 15, 2021 11:10
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@htuch
Copy link
Member

htuch commented Mar 16, 2021

@dmitri-d @adisuissa is going to take a first pass.

Dmitri Dolguikh added 2 commits March 16, 2021 16:37
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
@dmitri-d
Copy link
Contributor Author

dmitri-d commented Apr 8, 2021

ping @adisuissa, any chance you'll have time to take a look at this in the near future?

@yanavlasov yanavlasov self-assigned this Apr 13, 2021
@dmitri-d
Copy link
Contributor Author

Ping @adisuissa: any chance you could spend some time on this? I'm concern that changes to current muxes need to be ported here, which in turn makes the review harder...

Copy link
Contributor

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!
I’ve done a few iterations over the code, trying to understand what can be done to make sure we are not missing some functionality, and not breaking current functionality.
The runtime feature is a great step to make sure we don’t break anything w/o the ability to go back to things that work.
It's still challenging to see what was added and why, because of the amount of code that was changed, and I'm trying to compare the original to the unified and to the other variant (delta to SotW).

Regarding how to make this PR more digestible, by making a series of small PRs. If we can have incremental changes, each with additional functionality, it would make it easier to understand what was changed and why. An alternative is to add things in the following order: SubscriptionState, GrpcMux, and then GrpcSubscription (these can be split into SotW and delta sub-PRs). If you’ve got other ideas on how to break this down even further this would be really helpful.

Tests: I suggest making the delta and the unified-delta use the same tests, and the sotw and unified-sotw use the same tests (as much as possible). The idea is to make sure any change in one component doesn’t negatively impact the other, and that all tests are written in a single place.

A couple of general things:

  • the use of void* in some place where it should be Delta/Discovery{Request/Response}. It might be possible to avoid them by using templates in the base class, and specializing it in the derived class.
  • Some of the code is added to allow the unified variant to work well, and will be removed when the legacy code is removed.This should be documented (add TODOs).

* Passes through to all multiplexed SubscriptionStates. To be called when something
* definitive happens with the initial fetch: either an update is successfully received,
* or some sort of error happened.*/
virtual void disableInitFetchTimeoutTimer() PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this and the above methods are not implemented in the non-unified versions. If that's the case, would it be possible to create an interface between GrpcMux and the unified versions that adds these methods?

// understanding of the current protocol state, and new resources that Envoy wants to request.
// Returns a new'd pointer, meant to be owned by the caller, who is expected to know what type the
// pointer actually is.
virtual void* getNextRequestAckless() PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the use of void* can be avoided by using templates in the base class, and specialising it in the derived class.


void GrpcMuxSotw::establishGrpcStream() { grpc_stream_.establishNewStream(); }

void GrpcMuxSotw::sendGrpcMessage(void* msg_proto_ptr, SubscriptionState& sub_state) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the cast, this function and its GrpcMuxDelta counterpart look similar. Can this be refactored so the same logic will be in the base class?

return subscriptions_;
}

// legacy mux interface not implemented by unified mux.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding comments to the places that will eventually be removed once the non-unified code is removed.

envoy::config::core::v3::ApiVersion::V3),
callbacks_, std::chrono::milliseconds(0U), dispatcher_) {
state_.updateSubscriptionInterest({"name1", "name2", "name3"}, {});
auto cur_request = getNextDiscoveryRequestAckless();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const here and in simialr places

TEST_F(DeltaSubscriptionStateTest, SubscribeAndUnsubscribe) {
{
state_.updateSubscriptionInterest({"name4"}, {"name1"});
auto cur_request = getNextDeltaDiscoveryRequestAckless();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const here and in similar places.

watch_map =
watch_maps_.emplace(type_url, std::make_unique<WatchMap>(use_namespace_matching)).first;
subscriptions_.emplace(type_url, subscription_state_factory_->makeSubscriptionState(
type_url, *watch_maps_[type_url], init_fetch_timeout));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of calling *watch_maps_[type_url] you should be able to save the result of adding to watch_maps_ above, and take the second element.

return std::make_unique<Cleanup>([this, type_urls]() {
for (const auto& type_url : type_urls) {
pausable_ack_queue_.resume(type_url);
trySendDiscoveryRequests();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, previously this issued a request just for the resumed type_url, but now it will send discovery requests for all type_url that are available.
I think querying & sending to a specific type_url makes sense, but the alternative should probably be to call trySendDiscoveryRequests() after the for loop.

type_url);
return;
}
pausable_ack_queue_.push(sub->second->handleResponse(response_proto_ptr));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned UpdateAck should probably be moved (std::move) to the queue.

std::unique_ptr<SubscriptionStateFactory> subscription_state_factory_;

// Map key is type_url.
// Only addWatch() should insert into these maps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to have a single map between the type_url to both the WatchMapPtr and SubscriptionStatePtr (so they will always be in sync)?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label May 26, 2021
@github-actions
Copy link

github-actions bot commented Jun 2, 2021

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Jun 2, 2021
@htuch htuch reopened this Jun 9, 2021
@htuch
Copy link
Member

htuch commented Jun 9, 2021

Sorry, I've been OOO, will take a look in the next couple of days.

@dmitri-d
Copy link
Contributor Author

dmitri-d commented Jun 9, 2021

@htuch: I started breaking up this PR into smaller ones, #16486 is the one that needs reviews.

@htuch htuch closed this Jun 9, 2021
htuch pushed a commit that referenced this pull request Jul 9, 2021
Split out subscription state out of xDS mux unification PR (#15473). Made base subscription state class a template. Updated existing delta state tests to work with both legacy and new implementations.

Risk Level: low, the code is not being used atm
Testing: updated existing tests, added new ones

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Split out subscription state out of xDS mux unification PR (envoyproxy#15473). Made base subscription state class a template. Updated existing delta state tests to work with both legacy and new implementations.

Risk Level: low, the code is not being used atm
Testing: updated existing tests, added new ones

Signed-off-by: Dmitri Dolguikh <ddolguik@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Converge [New]GrpcMuxImpl
5 participants