init: order dynamic resource initialization to make RTDS always be first #10362

yanavlasov · 2020-03-12T15:48:12Z

The new order of initialization:

1. Initialize all primary clusters
2. Initialize RTDS
3. Initialize secondary clusters
4. Initialize the rest of dynamic resources
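
As an illustration of the ordering above, here is a minimal sketch of a sequencer that starts each stage only after the previous one has reported completion. The names `Stage`, `Done` and `Sequencer` are hypothetical and exist only for this example; they are not Envoy types, and the actual change wires the order through the server and cluster manager rather than through a generic helper like this.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these exist in Envoy.
using Done = std::function<void()>;

struct Stage {
  std::string name;
  // Starts the stage; the stage must invoke the callback once it has finished.
  std::function<void(Done)> start;
};

class Sequencer {
public:
  explicit Sequencer(std::vector<Stage> stages) : stages_(std::move(stages)) {}

  // Starts the first stage. Later stages start from completion callbacks.
  void run() { startNext(0); }

  // Names of the stages that have completed so far, in completion order.
  const std::vector<std::string>& completed() const { return completed_; }

private:
  void startNext(std::size_t index) {
    if (index == stages_.size()) {
      return;
    }
    stages_[index].start([this, index] {
      completed_.push_back(stages_[index].name);
      startNext(index + 1);
    });
  }

  std::vector<Stage> stages_;
  std::vector<std::string> completed_;
};
```

If the RTDS stage holds on to its callback (for example while waiting for a first response), the secondary clusters stage is not started until that callback fires, which is the property the new order relies on.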

Risk Level: High (changes to initialization order)
Testing: Unit Tests, Integration Tests, (internal Google e2e tests)
Docs Changes: N/A
Release Notes: N/A
Fixes #9709

Signed-off-by: Yan Avlasov <yavlasov@google.com>

yanavlasov · 2020-03-12T15:57:43Z

This is to start a discussion about the correctness of this approach and to see whether I have missed any edge cases.
I still need to check whether any docs need to be changed.
Does this need release notes?

snowp

I think this approach looks OK, but this part of the code is pretty complicated, so I'd love to hear what others have to say as well.

snowp · 2020-03-12T16:14:11Z

include/envoy/upstream/cluster_manager.h

@@ -73,6 +73,15 @@ class ClusterManagerFactory;
 /**
 * Manages connection pools and load balancing for upstream clusters. The cluster manager is
 * persistent and shared among multiple ongoing requests/connections.
+ * Cluster manager is initialed in two phases. In the first phase which begins at the construction
+ * all primary (i.e. not provisioned through xDS) clusters are initialized.
+ * After the first phase the RTDS (if configured) initialization begins. This allows runtime


this sounds like its own phase, maybe we should say that there are 3 phases?

I wanted to avoid leaking the overall initialization order into the cluster manager, which is why I described two phases there.

In the first phase, primary clusters are brought up.

The server then does something else, which the cluster manager does not need to care about.

Then the second phase begins, in which secondary clusters are initialized.

From the cluster manager's perspective there are only two phases. I've updated the comment and moved most of it into InstanceImpl, where the order is (mostly) established.

snowp · 2020-03-12T16:14:59Z

include/envoy/upstream/cluster_manager.h

+ * The second phase of cluster manager initialized begins after RTDS has initialized. In the second
+ * phase all secondary clusters are initialized and then the rest of the configuration provisioned
+ * through xDS.
+ * Please note: this order requires that RTDS is provisioned using a primary cluster. If RTDS is


What happens if it's using a secondary cluster? Or is this invariant enforced?

No, it is not enforced right now. What would be the best way to do it?

There are actually multiple restrictions here:

RTDS must be available via a primary cluster.

If RTDS happens to be configured with ADS, then ADS must also be available via a primary cluster.

Various others, e.g. if a secondary cluster is configured with ADS for its EDS, then ADS must also be available via a primary cluster.

We can enforce these by throwing a config rejection exception on violation of these criteria at construction/config ingest.

Yes, the invariants for the RTDS config are enforced. The ApiConfigSource must already be specified using primary clusters only (checked by Utility::checkApiConfigSourceSubscriptionBackingCluster). And RTDS provisioned through ADS will fail to initialize if ADS is using a secondary cluster, since secondary clusters are not present in the cluster manager when RTDS is initialized.
I have added server_test tests to check this.
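
The "reject at config ingest" idea discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: `requirePrimaryCluster` is a hypothetical helper, not Envoy's `Utility::checkApiConfigSourceSubscriptionBackingCluster`, and `std::invalid_argument` stands in for whatever config rejection exception the real code throws.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration; not part of Envoy's API.
// Throws if the cluster backing a subscription is not a known primary cluster.
void requirePrimaryCluster(const std::set<std::string>& primary_clusters,
                           const std::string& backing_cluster,
                           const std::string& resource_kind) {
  if (primary_clusters.count(backing_cluster) == 0) {
    // std::invalid_argument is a stand-in for a config rejection exception.
    throw std::invalid_argument(resource_kind + " must be served by a primary cluster, but '" +
                                backing_cluster + "' is not one");
  }
}
```

The same check could be applied to each restriction listed above (RTDS itself, ADS when RTDS uses it, ADS when a secondary cluster uses it for EDS), so a violation surfaces when the configuration is loaded rather than as a stalled initialization later.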

htuch

This looks like a well-structured fix to the problem and the right approach. I have a few documentation and convention nits; otherwise the implementation looks good.
/wait

htuch · 2020-03-16T18:12:21Z

include/envoy/upstream/cluster_manager.h

+ * The second phase of cluster manager initialized begins after RTDS has initialized. In the second
+ * phase all secondary clusters are initialized and then the rest of the configuration provisioned
+ * through xDS.
+ * Please note: this order requires that RTDS is provisioned using a primary cluster. If RTDS is


There are actually multiple restrictions here:

RTDS must be available via a primary cluster.

If RTDS happens to be configured with ADS, then ADS must also be available via a primary cluster.

Various others, e.g. if a secondary cluster is configured with ADS for its EDS, then ADS must also be available via a primary cluster.

We can enforce these by throwing a config rejection exception on violation of these criteria at construction/config ingest.

htuch · 2020-03-16T18:14:47Z

source/common/upstream/cluster_manager_impl.cc

@@ -178,7 +178,14 @@ void ClusterManagerInitHelper::maybeFinishInitialize() {

 void ClusterManagerInitHelper::onStaticLoadComplete() {
  ASSERT(state_ == State::Loading);
-  state_ = State::WaitingForStaticInitialize;
+  // After initialization of primary clusters has completed, transition to


Why was state_ WaitingForStaticInitialize before but now for secondary?

I've renamed the states to better reflect the cluster manager's initialization sequence.

htuch · 2020-03-16T18:20:40Z

source/common/upstream/cluster_manager_impl.h

+    // During this state we wait to start initializing secondary clusters. In this state all
+    // phase 1 clusters have completed initialization. Initialization of the secondary clusters
+    // is started by the `initializeSecondaryClusters` method.
+    WaitingForSecondaryInitialize,


I'd be a fan of adding Rtds as a specific state.

From my neophyte perspective this would break the abstraction: why should the cluster manager be concerned with RTDS and reflect it in its internal state? The way I wanted to code this is:

Initialize primary clusters.

Let the server do something else (the cluster manager is in the WaitingForSecondaryInitialize state).

Initialize secondary clusters when told to by the server.

I think the way you have it now is clean, without any mention of RTDS inside ClusterManager. Resolved.

Thoughts on changing this to WaitingToStartSecondaryInitialization? I found this confusing on read through (not that what was there before was not confusing). Feel free to update others to make them more clear if that can be done. Perhaps WaitingToStartCdsInitialization, etc.?

Renamed the states and cleaned up the comments a bit as well.
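
The design argued for in this thread (the cluster manager knows only about its own two phases, and the server decides when the second one starts) can be modelled as a small state machine. This is a sketch under assumptions: `Loading` and `initializeSecondaryClusters` appear in the diff quoted above, but the other state names, the `InitHelperModel` class and the use of exceptions instead of `ASSERT` are hypothetical and chosen only for illustration.

```cpp
#include <stdexcept>

// Hypothetical model of the init helper's states. Only Loading appears in the
// quoted diff; the other names are illustrative and may differ from Envoy's.
enum class InitState {
  Loading,
  WaitingToStartSecondaryInitialization,
  InitializingSecondaryClusters,
  AllClustersInitialized,
};

class InitHelperModel {
public:
  InitState state() const { return state_; }

  // Phase 1 ends: every primary cluster has finished initializing.
  void onPrimaryClustersInitialized() {
    expect(InitState::Loading);
    state_ = InitState::WaitingToStartSecondaryInitialization;
  }

  // Phase 2 begins: called by the server once its own work between the
  // phases (RTDS, in this PR) is done. The model never mentions RTDS.
  void initializeSecondaryClusters() {
    expect(InitState::WaitingToStartSecondaryInitialization);
    state_ = InitState::InitializingSecondaryClusters;
  }

  // Phase 2 ends: every secondary cluster has finished initializing.
  void onSecondaryClustersInitialized() {
    expect(InitState::InitializingSecondaryClusters);
    state_ = InitState::AllClustersInitialized;
  }

private:
  // Rejects out-of-order transitions.
  void expect(InitState expected) const {
    if (state_ != expected) {
      throw std::logic_error("unexpected initialization state");
    }
  }

  InitState state_{InitState::Loading};
};
```

Because the waiting state carries no reference to RTDS, the server is free to run any work between the two phases without the cluster manager's interface changing.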

source/server/server.cc

stale · 2020-03-23T18:26:12Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale · 2020-03-30T18:52:32Z

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Signed-off-by: Yan Avlasov <yavlasov@google.com>

source/server/server.cc

htuch

Looks good, a few small comments and we can ship.
/wait

test/integration/ads_integration_test.cc

stale · 2020-04-10T00:52:46Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Signed-off-by: Yan Avlasov <yavlasov@google.com>

htuch

LGTM, just one last Q.

htuch · 2020-04-13T19:00:07Z

test/integration/ads_integration_test.cc

+  }
+};
+
+INSTANTIATE_TEST_SUITE_P(IpVersionsClientTypeDelta, AdsIntegrationTestWithRtdsAndSecondaryClusters,


Where do the secondary clusters come from?

I've added a comment on line 947

Signed-off-by: Yan Avlasov <yavlasov@google.com>

htuch

LGTM, thanks!

azure-pipelines · 2020-04-14T23:34:12Z

Command 'retest' is not supported by Azure Pipelines.

Supported commands:

help: Get descriptions, examples and documentation about supported commands. Example: help "command_name"
list: List all pipelines for this repository using a comment. Example: "list"
run: Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run. Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
where: Report back the Azure DevOps orgs that are related to this repository and org. Example: "where"

See additional documentation.

yanavlasov · 2020-04-14T23:34:38Z

/azp run

azure-pipelines · 2020-04-14T23:34:48Z

Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s).

mattklein123

Thanks this is great. Just a few small comments.

/wait

mattklein123 · 2020-04-15T17:49:55Z

source/common/upstream/cluster_manager_impl.h

+    // During this state we wait to start initializing secondary clusters. In this state all
+    // phase 1 clusters have completed initialization. Initialization of the secondary clusters
+    // is started by the `initializeSecondaryClusters` method.
+    WaitingForSecondaryInitialize,


Thoughts on changing this to WaitingToStartSecondaryInitialization? I found this confusing on read through (not that what was there before was not confusing). Feel free to update others to make them more clear if that can be done. Perhaps WaitingToStartCdsInitialization, etc.?

source/server/server.cc

Signed-off-by: Yan Avlasov <yavlasov@google.com>

mattklein123

Thanks! Can you merge master which should hopefully fix CI?

/wait

Signed-off-by: Yan Avlasov <yavlasov@google.com>

…rst (envoyproxy#10362) The new order of initialization: 1. Initialize all primary clusters 2. Initialize RTDS 3. Initialize secondary clusters 4. Initialize the rest of dynamic resources Signed-off-by: Yan Avlasov <yavlasov@google.com> Signed-off-by: pengg <pengg@google.com>

…ys be first (envoyproxy#10362)" This reverts commit aaba081. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>

…ys be first (#10362)" (#10919) This reverts commit aaba081. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>

Signed-off-by: Spencer Lewis <slewis@squareup.com>

* master:
  fault injection: add support for setting gRPC status (envoyproxy#10841)
  tests: tag tests that fail on Windows with fails_on_windows (envoyproxy#10940)
  Fix typo on Postgres Proxy documentation. (envoyproxy#10930)
  fuzz: improve header/data stop/continue modeling in HCM fuzzer. (envoyproxy#10931)
  gzip filter: allow setting zlib compressor's chunk size (envoyproxy#10508)
  http: replace vector/reserve with InlinedVector in codec helper (envoyproxy#10941)
  stats: add utilities to create stats from a vector of tokens, mixing dynamic and symbolic elements. (envoyproxy#10735)
  hcm: avoid invoking 100-continue handling on decode filter. (envoyproxy#10929)
  prometheus stats: Correctly group lines of the same metric name. (envoyproxy#10833)
  status: Fix ASAN error in Status payload handling (envoyproxy#10906)
  path: Fix merge slash for paths ending with slash and present query args (envoyproxy#10922)
  compressor filter: add benchmark (envoyproxy#10464)
  xray: expected_span_name is not captured by the lambda with MSVC (envoyproxy#10934)
  ci: update before purge in cleanup (envoyproxy#10938)
  tracer: Improve test coverage for x-ray (envoyproxy#10890)
  Revert "init: order dynamic resource initialization to make RTDS always be first (envoyproxy#10362)" (envoyproxy#10919)

init: order dynamic resource initialization to make RTDS always be first

c03c332

The new order of initialization:
1. Initialize all primary clusters
2. Initialize RTDS
3. Initialize secondary clusters
4. Initialize the rest of dynamic resources

Signed-off-by: Yan Avlasov <yavlasov@google.com>

yanavlasov requested a review from snowp as a code owner March 12, 2020 15:48

yanavlasov assigned snowp and htuch Mar 12, 2020

snowp suggested changes Mar 12, 2020

htuch suggested changes Mar 16, 2020

repokitteh-read-only bot added the waiting label Mar 16, 2020

stale bot added the stale label Mar 23, 2020

stale bot closed this Mar 30, 2020

Merge branch 'master' into xds-order

5a384fe

Signed-off-by: Yan Avlasov <yavlasov@google.com>

yanavlasov reopened this Apr 1, 2020

stale bot removed the stale label Apr 1, 2020

repokitteh-read-only bot removed the waiting label Apr 1, 2020

yanavlasov added 2 commits April 2, 2020 11:00

Address comments

cce82e4

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Update comments

823da61

Signed-off-by: Yan Avlasov <yavlasov@google.com>

mattklein123 self-assigned this Apr 2, 2020

htuch reviewed Apr 3, 2020

source/server/server.cc

htuch suggested changes Apr 3, 2020

test/integration/ads_integration_test.cc Outdated

test/integration/ads_integration_test.cc Outdated

test/integration/ads_integration_test.cc Outdated

repokitteh-read-only bot added the waiting label Apr 3, 2020

stale bot added the stale label Apr 10, 2020

Merge branch 'master' into xds-order

574ddb1

Signed-off-by: Yan Avlasov <yavlasov@google.com>

repokitteh-read-only bot removed the waiting label Apr 11, 2020

Address comments

c31534f

Signed-off-by: Yan Avlasov <yavlasov@google.com>

stale bot removed the stale label Apr 11, 2020

yanavlasov added 2 commits April 10, 2020 21:14

Merge branch 'master' into xds-order

505dcaf

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Address comments

387a2ae

Signed-off-by: Yan Avlasov <yavlasov@google.com>

htuch reviewed Apr 13, 2020

Add comment

69a82ba

Signed-off-by: Yan Avlasov <yavlasov@google.com>

htuch previously approved these changes Apr 14, 2020

mattklein123 requested changes Apr 15, 2020

repokitteh-read-only bot added the waiting label Apr 15, 2020

yanavlasov added 2 commits April 16, 2020 19:39

Merge branch 'master' into xds-order

2c03a4f

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Address comments

e2be83f

Signed-off-by: Yan Avlasov <yavlasov@google.com>

yanavlasov dismissed htuch’s stale review via e2be83f April 17, 2020 00:07

repokitteh-read-only bot removed the waiting label Apr 17, 2020

Clarify comments

bac0a88

Signed-off-by: Yan Avlasov <yavlasov@google.com>

mattklein123 approved these changes Apr 18, 2020

repokitteh-read-only bot added the waiting label Apr 18, 2020

htuch approved these changes Apr 20, 2020

Merge branch 'master' into xds-order

b45787e

Signed-off-by: Yan Avlasov <yavlasov@google.com>

repokitteh-read-only bot removed the waiting label Apr 20, 2020

mattklein123 merged commit aaba081 into envoyproxy:master Apr 20, 2020

rgs1 mentioned this pull request Apr 22, 2020

Crash regression from #10362 #10901

Closed

rgs1 pushed a commit to rgs1/envoy that referenced this pull request Apr 23, 2020

Revert "init: order dynamic resource initialization to make RTDS alwa…

00a4a7b

…ys be first (envoyproxy#10362)" This reverts commit aaba081. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>

mattklein123 pushed a commit that referenced this pull request Apr 23, 2020

Revert "init: order dynamic resource initialization to make RTDS alwa…

7f165e8

…ys be first (#10362)" (#10919) This reverts commit aaba081. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>

yanavlasov deleted the xds-order branch February 1, 2021 19:29
