-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
init: order dynamic resource initialization to make RTDS always be first #10362
Conversation
The new order of initialization: 1. Initialize all primary clusters 2. Initialize RTDS 3. Initialize secondary clusters 4. Initialize the rest of dynamic resources Signed-off-by: Yan Avlasov <yavlasov@google.com>
This is to start the discussion about correctness of this approach and see if I have missed some edge cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach looks OK but this part of the code is pretty complicated so I'd love to hear what others have to say as well.
@@ -73,6 +73,15 @@ class ClusterManagerFactory; | |||
/** | |||
* Manages connection pools and load balancing for upstream clusters. The cluster manager is | |||
* persistent and shared among multiple ongoing requests/connections. | |||
* Cluster manager is initialed in two phases. In the first phase which begins at the construction | |||
* all primary (i.e. not provisioned through xDS) clusters are initialized. | |||
* After the first phase the RTDS (if configured) initialization begins. This allows runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds like its own phase, maybe we should say that there are 3 phases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid leaking overall initialization order into the cluster manager. So that is why I put 2 phases there.
- In the first phase primary clusters are brought up.
- The server does something else, which cluster manager does not need to care about.
- Then the second phase begins where secondary clusters are initialized.
From the cluster manager perspective there are two phase only. I've updated comment and moved most of it into the InstanceImpl where the order is (mostly) established.
* The second phase of cluster manager initialized begins after RTDS has initialized. In the second | ||
* phase all secondary clusters are initialized and then the rest of the configuration provisioned | ||
* through xDS. | ||
* Please note: this order requires that RTDS is provisioned using a primary cluster. If RTDS is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens its using a secondary cluster? or is this invariant enforced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not enforced right now. What would be the best way to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually multiple restrictions here:
- RTDS must be available via a primary cluster.
- If RTDS happens to be configured with ADS, then ADS must also be available via a primary cluster.
- Various others, e.g. if a secondary cluster is configured with ADS for its EDS, then ADS must also be available via a primary cluster.
We can enforce these by throwing a config rejection exception on violation of these criteria at construction/config ingest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the invariants for RTDS config are enforced. The ApiConfigSource must already specified using primary clusters only (checked by the Utility::checkApiConfigSourceSubscriptionBackingCluster). And RTDS provisioned through ADS will fail initialize if ADS is using secondary cluster, since secondary clusters are not present in cluster manager when RTDS is initialized.
I have added server_test tests to check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a well structured fix to the problem and the right approach. I have a few documentation and convention nits, otherwise implementations looks good.
/wait
@@ -73,6 +73,15 @@ class ClusterManagerFactory; | |||
/** | |||
* Manages connection pools and load balancing for upstream clusters. The cluster manager is | |||
* persistent and shared among multiple ongoing requests/connections. | |||
* Cluster manager is initialed in two phases. In the first phase which begins at the construction | |||
* all primary (i.e. not provisioned through xDS) clusters are initialized. | |||
* After the first phase the RTDS (if configured) initialization begins. This allows runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
* The second phase of cluster manager initialized begins after RTDS has initialized. In the second | ||
* phase all secondary clusters are initialized and then the rest of the configuration provisioned | ||
* through xDS. | ||
* Please note: this order requires that RTDS is provisioned using a primary cluster. If RTDS is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's actually multiple restrictions here:
- RTDS must be available via a primary cluster.
- If RTDS happens to be configured with ADS, then ADS must also be available via a primary cluster.
- Various others, e.g. if a secondary cluster is configured with ADS for its EDS, then ADS must also be available via a primary cluster.
We can enforce these by throwing a config rejection exception on violation of these criteria at construction/config ingest.
@@ -178,7 +178,14 @@ void ClusterManagerInitHelper::maybeFinishInitialize() { | |||
|
|||
void ClusterManagerInitHelper::onStaticLoadComplete() { | |||
ASSERT(state_ == State::Loading); | |||
state_ = State::WaitingForStaticInitialize; | |||
// After initialization of primary clusters has completed, transition to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was state_
WaitingForStaticInitialize
before but now for secondary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed states to better reflect cluster manager's initialization sequence.
// During this state we wait to start initializing secondary clusters. In this state all | ||
// phase 1 clusters have completed initialization. Initialization of the secondary clusters | ||
// is started by the `initializeSecondaryClusters` method. | ||
WaitingForSecondaryInitialize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be a fan of adding Rtds as a specific state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my neophyte perspective this would break abstraction, i.e. why should cluster manager be concerned with RTDS and reflect it in its internal state? The way I wanted to code this is:
- Initialize primary clusters.
- Let the server do something else. (cluster manager is in the
WaitingForSecondaryInitialize
state). - Initialize secondary clusters when told so by the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way you have it now is clean, without any mention of RTDS inside ClusterManager, resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on changing this to WaitingToStartSecondaryInitialization
? I found this confusing on read through (not that what was there before was not confusing). Feel free to update others to make them more clear if that can be done. Perhaps WaitingToStartCdsInitialization
, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed states. Cleaned-up comments a bit as well.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a few small comments and we can ship.
/wait
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one last Q.
} | ||
}; | ||
|
||
INSTANTIATE_TEST_SUITE_P(IpVersionsClientTypeDelta, AdsIntegrationTestWithRtdsAndSecondaryClusters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do the secondary clusters come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment on line 947
Signed-off-by: Yan Avlasov <yavlasov@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Command 'retest' is not supported by Azure Pipelines. Supported commands
See additional documentation. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is great. Just a few small comments.
/wait
// During this state we wait to start initializing secondary clusters. In this state all | ||
// phase 1 clusters have completed initialization. Initialization of the secondary clusters | ||
// is started by the `initializeSecondaryClusters` method. | ||
WaitingForSecondaryInitialize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on changing this to WaitingToStartSecondaryInitialization
? I found this confusing on read through (not that what was there before was not confusing). Feel free to update others to make them more clear if that can be done. Perhaps WaitingToStartCdsInitialization
, etc.?
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Can you merge master which should hopefully fix CI?
/wait
Signed-off-by: Yan Avlasov <yavlasov@google.com>
…rst (envoyproxy#10362) The new order of initialization: 1. Initialize all primary clusters 2. Initialize RTDS 3. Initialize secondary clusters 4. Initialize the rest of dynamic resources Signed-off-by: Yan Avlasov <yavlasov@google.com> Signed-off-by: pengg <pengg@google.com>
…ys be first (envoyproxy#10362)" This reverts commit aaba081. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: Spencer Lewis <slewis@squareup.com> * master: fault injection: add support for setting gRPC status (envoyproxy#10841) tests: tag tests that fail on Windows with fails_on_windows (envoyproxy#10940) Fix typo on Postgres Proxy documentation. (envoyproxy#10930) fuzz: improve header/data stop/continue modeling in HCM fuzzer. (envoyproxy#10931) gzip filter: allow setting zlib compressor's chunk size (envoyproxy#10508) http: replace vector/reserve with InlinedVector in codec helper (envoyproxy#10941) stats: add utilities to create stats from a vector of tokens, mixing dynamic and symbolic elements. (envoyproxy#10735) hcm: avoid invoking 100-continue handling on decode filter. (envoyproxy#10929) prometheus stats: Correctly group lines of the same metric name. (envoyproxy#10833) status: Fix ASAN error in Status payload handling (envoyproxy#10906) path: Fix merge slash for paths ending with slash and present query args (envoyproxy#10922) compressor filter: add benchmark (envoyproxy#10464) xray: expected_span_name is not captured by the lambda with MSVC (envoyproxy#10934) ci: update before purge in cleanup (envoyproxy#10938) tracer: Improve test coverage for x-ray (envoyproxy#10890) Revert "init: order dynamic resource initialization to make RTDS always be first (envoyproxy#10362)" (envoyproxy#10919)
The new order of initialization:
Risk Level: High (changes to initialization order)
Testing: Unit Tests, Integration Tests, (internal Google e2e tests)
Docs Changes: N/A
Release Notes: N/A
Fixes #9709
Signed-off-by: Yan Avlasov yavlasov@google.com