Support for concurrent nodepool CRUD operations #13173

modular-magician

The purpose of these changes is to implement support in the beta provider for concurrent node pool CRUD operations on a single cluster.

The GA provider's behavior should be unchanged. The global mutex store changes described below are technically included in the GA provider but do not alter its behavior; I am happy to make those changes specific to the beta provider if that is preferred.

The changes to the beta provider include:

  • Updating the global mutex store to use sync.RWMutex instead of sync.Mutex, and adding methods to the MutexKV struct for acquiring shared (read) locks.
  • Removing the polling for cluster "ready" status. With concurrent operations supported on the same cluster, we no longer need to wait until the cluster has no operations running on it before proceeding.
  • Acquiring, for node pool (NP) CRUD operations, a shared (read) lock on the cluster and an exclusive (write) lock on the node pool, instead of an exclusive lock on the cluster. Cluster-wide operations (e.g. UpdateCluster) still block NP-level operations, but NP-level operations on different NPs no longer block each other. An NP-level mutex key combines the cluster hash and the node pool name to guarantee uniqueness.
  • Adding retry logic to NP CRUD operations: an operation is retried for as long as it receives an "incompatible operation" error (canonical code FAILED_PRECONDITION), so that concurrent operations blocked by a lock conflict with another operation can be retried safely.
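
For readers who want a concrete picture of the locking and retry scheme listed above, here is a minimal, self-contained sketch. It is not the provider's actual code. The method names RLock/RUnlock, the helpers withNodePoolLock and retryWhileIncompatible, the key format, and the errFailedPrecondition stand-in error are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// MutexKV is a sketch of a keyed lock store backed by sync.RWMutex.
type MutexKV struct {
	lock  sync.Mutex // guards the store map itself
	store map[string]*sync.RWMutex
}

func NewMutexKV() *MutexKV {
	return &MutexKV{store: make(map[string]*sync.RWMutex)}
}

// get returns the RWMutex for key, creating it on first use.
func (m *MutexKV) get(key string) *sync.RWMutex {
	m.lock.Lock()
	defer m.lock.Unlock()
	mu, ok := m.store[key]
	if !ok {
		mu = &sync.RWMutex{}
		m.store[key] = mu
	}
	return mu
}

func (m *MutexKV) Lock(key string)    { m.get(key).Lock() }    // exclusive (write) lock
func (m *MutexKV) Unlock(key string)  { m.get(key).Unlock() }
func (m *MutexKV) RLock(key string)   { m.get(key).RLock() }   // shared (read) lock
func (m *MutexKV) RUnlock(key string) { m.get(key).RUnlock() }

// errFailedPrecondition is a hypothetical stand-in for the API's
// "incompatible operation" error.
var errFailedPrecondition = errors.New("FAILED_PRECONDITION: incompatible operation")

// withNodePoolLock runs op while holding a shared lock on the cluster and an
// exclusive lock on the node pool. The key format is an assumption.
func withNodePoolLock(kv *MutexKV, clusterKey, nodePool string, op func() error) error {
	npKey := clusterKey + "/nodePools/" + nodePool
	kv.RLock(clusterKey)
	defer kv.RUnlock(clusterKey)
	kv.Lock(npKey)
	defer kv.Unlock(npKey)
	return op()
}

// retryWhileIncompatible re-runs op while it returns errFailedPrecondition,
// making at most attempts calls and sleeping wait between them.
func retryWhileIncompatible(attempts int, wait time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); !errors.Is(err, errFailedPrecondition) {
			return err // success, or an error that should not be retried
		}
		time.Sleep(wait)
	}
	return err
}

func main() {
	kv := NewMutexKV()
	clusterKey := "cluster-hash" // placeholder for the real cluster lock key
	var wg sync.WaitGroup
	for _, np := range []string{"pool-a", "pool-b"} {
		wg.Add(1)
		go func(np string) {
			defer wg.Done()
			calls := 0
			err := withNodePoolLock(kv, clusterKey, np, func() error {
				return retryWhileIncompatible(3, 10*time.Millisecond, func() error {
					calls++
					if calls == 1 {
						return errFailedPrecondition // simulate one conflict
					}
					return nil
				})
			})
			fmt.Println(np, "finished after", calls, "calls, err:", err)
		}(np)
	}
	wg.Wait()
}
```

In this sketch, a cluster-wide operation would call kv.Lock(clusterKey) and so wait for all node pool operations holding the shared lock to finish, while operations on pool-a and pool-b proceed in parallel because each holds only a read lock on the cluster key.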

If this PR is for Terraform, I acknowledge that I have:

  • Searched through the issue tracker for an open issue that this either resolves or contributes to, commented on it to claim it, and written "fixes {url}" or "part of {url}" in this PR description. If there were no relevant open issues, I opened one and commented that I would like to work on it (not necessary for very small changes).
  • Generated Terraform, and ran make test and make lint to ensure it passes unit and linter tests.
  • Ensured that all new fields I added that can be set by a user appear in at least one example (for generated resources) or third_party test (for handwritten resources or update tests).
  • Ran relevant acceptance tests (If the acceptance tests do not yet pass or you are unable to run them, please let your reviewer know).
  • Read the Release Notes Guide before writing my release note below.

Release Note Template for Downstream PRs (will be copied)

container: Added support for concurrent node pool mutations on a cluster. Previously, node pool mutations were restricted to run synchronously client-side. NOTE: While this feature is supported in Terraform from this release onwards, only a limited number of GCP projects will support this behavior initially. The provider will automatically process mutations concurrently as the feature rolls out generally.

Reviewer Notes

  • Ran the set of 33 TestAccContainerNodePool acceptance tests with the beta provider and all passed, although TestAccContainerNodePool_withWorkloadIdentityConfig appears flaky (it passed only when I ran it individually). The change therefore appears to be backward compatible.
  • Manually tested with my own *.tf files, creating and deleting multiple NPs concurrently, and confirmed that the operations run concurrently.

Derived from GoogleCloudPlatform/magic-modules#6748

Signed-off-by: Modular Magician <magic-modules@google.com>
@modular-magician modular-magician merged commit 204717f into hashicorp:main Dec 5, 2022
github-actions bot commented Jan 5, 2023

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 5, 2023
@modular-magician modular-magician deleted the downstream-pr-44ed75ea1ce100420c26c8a0517bc89769dcda75 branch November 17, 2024 00:10