-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for concurrent nodepool CRUD operations #6748
Conversation
Beta support concurrent nodepool CRUD Operations
Hello! I am a robot who works on Magic Modules PRs. I've detected that you're a community contributor. @rileykarson, a repository maintainer, has been assigned to assist you and help review your changes. ❓ First time contributing? Click here for more detailsYour assigned reviewer will help review your code by:
You can help make sure that review is quick by running local tests and ensuring they're passing in between each push you make to your PR's branch. Also, try to leave a comment with each push you make, as pushes generally don't generate emails. If your reviewer doesn't get back to you within a week after your most recent change, please feel free to leave a comment on the issue asking them to take a look! In the absence of a dedicated review dashboard most maintainers manage their pending reviews through email, and those will sometimes get lost in their inbox. |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 3 files changed, 58 insertions(+), 7 deletions(-)) |
Tests analyticsTotal tests: Action takenTriggering VCR tests in RECORDING mode for the tests that failed during VCR. Click here to see the failed testsTestAccFirebaserulesRelease_BasicRelease|TestAccContainerCluster_withBinaryAuthorizationEnabledBoolLegacy|TestAccContainerCluster_withBinaryAuthorizationEvaluationModeClassic|TestAccContainerCluster_withBinaryAuthorizationEvaluationModeAutopilot|TestAccContainerCluster_withReleaseChannelEnabled|TestAccContainerCluster_withBinaryAuthorizationEnabledBool|TestAccContainerCluster_withAutoscalingProfile|TestAccContainerCluster_nodeAutoprovisioningDefaultsMinCpuPlatform|TestAccContainerCluster_withILBSubsetting|TestAccContainerCluster_withAuthenticatorGroupsConfig|TestAccContainerNodePool_withWorkloadIdentityConfig|TestAccContainerCluster_withMonitoringConfig|TestAccContainerCluster_withIdentityServiceConfig|TestAccContainerCluster_withAddons|TestAccContainerCluster_withFilteredNotificationConfig|TestAccContainerCluster_withNodePoolCIA|TestAccContainerCluster_withNodePoolResize|TestAccContainerCluster_withNodePoolAutoscaling|TestAccContainerCluster_withNodePoolUpdateVersion|TestAccContainerCluster_withShieldedNodes|TestAccContainerCluster_withResourceUsageExportConfig|TestAccContainerCluster_withCostManagementConfig|TestAccContainerCluster_withExternalIpsConfig|TestAccContainerCluster_nodeAutoprovisioning|TestAccContainerCluster_withNodeConfig|TestAccContainerCluster_deleteExclusionWindow|TestAccContainerCluster_withNodePoolDefaults|TestAccContainerCluster_updateVersion|TestAccContainerCluster_withIntraNodeVisibility|TestAccContainerCluster_updateMaintenanceExclusionOptions|TestAccContainerCluster_deleteMaintenanceExclusionOptions|TestAccContainerCluster_withMaintenanceExclusionWindow|TestAccContainerCluster_withRecurringMaintenanceWindow|TestAccContainerCluster_regionalWithNodeLocations|TestAccContainerCluster_withMaintenanceWindow|TestAccContainerCluster_withTelemetryEnabled|TestAccContainerCluster_withReleaseChannelEnabledDefaultVersion |
Tests passed during RECORDING mode: Tests failed during RECORDING mode: Please fix these to complete your PR |
The 7
|
Note: I reran the failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a pass of this, following up w/ internal doc next.
If one doesn't exist already, can you add a test that makes simultaneous updates to node pools? I'd like to have one kicking around so we can view the logs- both for this review + if we end up needing to debug in the future.
mmv1/third_party/terraform/resources/resource_container_node_pool.go.erb
Outdated
Show resolved
Hide resolved
mmv1/third_party/terraform/resources/resource_container_node_pool.go.erb
Show resolved
Hide resolved
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 4 files changed, 162 insertions(+), 7 deletions(-)) |
Tests analyticsTotal tests: Errors occurred during REPLAYING mode. Please fix them to complete your PR |
I added a test for concurrent node pool create/update/delete operations, and updated the PR to include changes in both the GA + Beta providers. When I generate the providers and build them locally, it is successful, and the test passes. However, the CI build system runs into the following error: When I do a search for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still waiting on a response from someone about fixing the tfv-head
issue, I'll kick the robot as well and see if anything changes. Minor comment about the test configs inline.
Other than our CI acting up, I think the outstanding steps are:
- Resolving whether we want the new call for the mutex key
- Those minor test comments
- Running the test in CI & seeing the retries in the logs
- Running the test locally (or grabbing logs from you- sharing internally is fine)
mmv1/third_party/terraform/tests/resource_container_node_pool_test.go.erb
Outdated
Show resolved
Hide resolved
mmv1/third_party/terraform/tests/resource_container_node_pool_test.go.erb
Outdated
Show resolved
Hide resolved
/gcbrun |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 4 files changed, 213 insertions(+), 67 deletions(-)) |
Tests analyticsTotal tests: Action takenTriggering VCR tests in RECORDING mode for the tests that failed during VCR. Click here to see the failed testsTestAccFirebaserulesRelease_BasicRelease|TestAccContainerNodePool_concurrent|TestAccContainerNodePool_withWorkloadIdentityConfig|TestAccContainerCluster_withBinaryAuthorizationEvaluationModeAutopilot|TestAccContainerCluster_withAutoscalingProfile|TestAccContainerCluster_withNodePoolResize|TestAccContainerCluster_withNodePoolUpdateVersion|TestAccContainerCluster_withReleaseChannelEnabled|TestAccContainerCluster_withEnablePrivateEndpointToggle|TestAccContainerCluster_nodeAutoprovisioningDefaultsMinCpuPlatform|TestAccContainerCluster_withNodePoolDefaults|TestAccContainerCluster_withILBSubsetting|TestAccContainerCluster_withAuthenticatorGroupsConfig|TestAccContainerCluster_withAddons|TestAccContainerCluster_withMonitoringConfig|TestAccContainerCluster_withLoggingVariantUpdates|TestAccContainerCluster_updateVersion|TestAccContainerCluster_withIntraNodeVisibility|TestAccContainerCluster_withLoggingConfig|TestAccContainerCluster_nodeAutoprovisioning|TestAccContainerCluster_withNotificationConfig|TestAccContainerCluster_regionalWithNodeLocations|TestAccContainerCluster_deleteExclusionWindow|TestAccContainerCluster_deleteMaintenanceExclusionOptions|TestAccContainerCluster_withGcpPublicCidrsAccessEnabledToggle|TestAccContainerCluster_withReleaseChannelEnabledDefaultVersion|TestAccContainerCluster_withNodePoolCIA|TestAccContainerCluster_withMasterAuthorizedNetworksConfig|TestAccContainerCluster_autoprovisioningDefaultsManagement|TestAccContainerCluster_nodeAutoprovisioningDefaultsDiskSizeGb|TestAccContainerCluster_withBinaryAuthorizationEvaluationModeClassic|TestAccLoggingBucketConfigProject_cmekSettings |
Tests passed during RECORDING mode: Tests failed during RECORDING mode: Please fix these to complete your PR |
Tests passed during RECORDING mode: Tests failed during RECORDING mode: Please fix these to complete your PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestAccContainerNodePool_concurrent
is failing, but it's because you found a bug introduced recently rather than an issue with your PR. I filed hashicorp/terraform-provider-google#13157 to get that fixed.
TestAccContainerCluster_nodeAutoprovisioningDefaultsDiskSizeGb
seems like a flake.
mmv1/third_party/terraform/tests/resource_container_node_pool_test.go.erb
Outdated
Show resolved
Hide resolved
mmv1/third_party/terraform/tests/resource_container_node_pool_test.go.erb
Outdated
Show resolved
Hide resolved
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 4 files changed, 183 insertions(+), 68 deletions(-)) |
Tests analyticsTotal tests: Action takenTriggering VCR tests in RECORDING mode for the tests that failed during VCR. Click here to see the failed testsTestAccContainerNodePool_withReservationAffinitySpecific|TestAccContainerNodePool_withReservationAffinity|TestAccContainerNodePool_withKubeletConfig|TestAccContainerNodePool_withWorkloadIdentityConfig|TestAccContainerNodePool_withSandboxConfig|TestAccContainerCluster_withBinaryAuthorizationEnabledBoolLegacy|TestAccContainerCluster_withResourceUsageExportConfig|TestAccContainerCluster_withNodePoolDefaults|TestAccContainerCluster_updateVersion|TestAccContainerCluster_withILBSubsetting|TestAccContainerCluster_withIdentityServiceConfig|TestAccContainerCluster_withAddons|TestAccContainerCluster_withNotificationConfig|TestAccContainerCluster_withIntraNodeVisibility|TestAccContainerCluster_withShieldedNodes|TestAccContainerCluster_regionalWithNodeLocations|TestAccContainerCluster_nodeAutoprovisioningDefaultsDiskSizeGb|TestAccContainerCluster_withGcpPublicCidrsAccessEnabledToggle|TestAccContainerCluster_withReleaseChannelEnabledDefaultVersion|TestAccContainerCluster_withMasterAuthorizedNetworksConfig|TestAccContainerCluster_deleteExclusionWindow|TestAccContainerCluster_deleteMaintenanceExclusionOptions|TestAccContainerCluster_updateMaintenanceExclusionOptions|TestAccContainerCluster_withRecurringMaintenanceWindow|TestAccContainerCluster_withNodePoolResize|TestAccContainerCluster_withNodePoolAutoscaling|TestAccContainerNodePool_012_ConfigModeAttr|TestAccContainerNodePool_concurrent|TestAccContainerNodePool_EmptyGuestAccelerator|TestAccContainerNodePool_version|TestAccContainerNodePool_withEnablePrivateNodesToggle|TestAccContainerNodePool_withLinuxNodeConfig|TestAccContainerNodePool_resize|TestAccContainerNodePool_withUpgradeSettings|TestAccLoggingBucketConfigProject_cmekSettings |
Tests passed during RECORDING mode: Errors occurred during RECORDING mode. Please fix them to complete your PR |
Failure here's just noise- not really sure why we triggered a rerun of all those, but we hit a process limit when RECORDING. Rerun should resolve. /gcbrun |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 4 files changed, 183 insertions(+), 68 deletions(-)) |
Tests analyticsTotal tests: Action takenTriggering VCR tests in RECORDING mode for the tests that failed during VCR. Click here to see the failed testsTestAccFirebaserulesRelease_BasicRelease|TestAccContainerNodePool_withLinuxNodeConfig|TestAccContainerNodePool_012_ConfigModeAttr|TestAccContainerNodePool_withReservationAffinitySpecific|TestAccContainerNodePool_version|TestAccContainerNodePool_withUpgradeSettings|TestAccContainerCluster_withNodePoolAutoscaling|TestAccContainerCluster_withGcpPublicCidrsAccessEnabledToggle|TestAccContainerCluster_withNodePoolDefaults|TestAccContainerCluster_withILBSubsetting|TestAccContainerCluster_updateVersion|TestAccContainerCluster_withNotificationConfig|TestAccContainerCluster_withShieldedNodes|TestAccLoggingBucketConfigProject_cmekSettings |
Tests passed during RECORDING mode: All tests passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, and I've confirmed we're seeing the intended behaviour against prod by checking the debug logs of the recording. I'd like to see the logs for a successful run w/ parallel operations enabled myself, mostly for peace-of-mind, given we're shipping code that's gonna be picked up by a future behaviour of the API. I'll follow up with you offline about that.
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 4 files changed, 179 insertions(+), 68 deletions(-)) |
Tests analyticsTotal tests: Action takenTriggering VCR tests in RECORDING mode for the tests that failed during VCR. Click here to see the failed testsTestAccLoggingBucketConfigProject_cmekSettings|TestAccFirebaserulesRelease_BasicRelease|TestAccContainerNodePool_concurrent |
Tests passed during RECORDING mode: Tests failed during RECORDING mode: Please fix these to complete your PR |
Hi there, I'm the Modular magician. I've detected the following information about your changes: Diff reportYour PR generated some diffs in downstreams - here they are. Terraform GA: Diff ( 4 files changed, 177 insertions(+), 68 deletions(-)) |
Tests analyticsTotal tests: Action takenTriggering VCR tests in RECORDING mode for the tests that failed during VCR. Click here to see the failed testsTestAccFirebaserulesRelease_BasicRelease|TestAccContainerNodePool_concurrent|TestAccLoggingBucketConfigProject_cmekSettings |
The purpose of these changes is to implement support in the beta provider for concurrent node pool CRUD operations on a single cluster.
The GA provider should be unchanged (the global mutex store changes described below are technically included in the GA provider but the GA provider behavior is unchanged - however, I am happy to make the global mutex store changes specific to the beta provider if that is preferred).
The changes to the beta provider include:
sync.RWMutex
instead ofsync.Mutex
and adding the necessary methods to theMutexKV
struct to support acquiring shared/read locks.FAILED_PRECONDITION
canonical code), to safely retry concurrent operations blocked by a lock conflict with another operation.If this PR is for Terraform, I acknowledge that I have:
make test
andmake lint
to ensure it passes unit and linter tests.Release Note Template for Downstream PRs (will be copied)
Reviewer Notes
TestAccContainerNodePool
acceptance tests with the beta provider and they passed, althoughTestAccContainerNodePool_withWorkloadIdentityConfig
seems flaky (only passed when I ran it individually). So it seems to be fully backward compatible.*.tf
files to create/delete multiple NPs concurrently, and confirmed the concurrency works.