
Flaky CI failures due to kube-dns CM conflict #1197

Closed
bharathkkb opened this issue Apr 5, 2022 · 7 comments · Fixed by #1214
Labels: bug (Something isn't working)

Comments

@bharathkkb
Member

TL;DR

For functionality like stub domains, we delete and recreate the kube-dns config map. Recently we have been observing flaky errors in CI where Terraform (via kubectl-wrapper) deletes the kube-dns config map, but the subsequent creation via kubernetes_config_map fails due to a conflict.

module.example.module.gke.module.gcloud_delete_default_kube_dns_configmap.module.gcloud_kubectl.null_resource.run_command[0] (local-exec): Deleting default kube-dns configmap found in kube-system namespace
module.example.module.gke.module.gcloud_delete_default_kube_dns_configmap.module.gcloud_kubectl.null_resource.run_command[0] (local-exec): configmap "kube-dns" deleted
module.example.module.gke.module.gcloud_delete_default_kube_dns_configmap.module.gcloud_kubectl.null_resource.run_command[0] (local-exec): + cleanup
module.example.module.gke.module.gcloud_delete_default_kube_dns_configmap.module.gcloud_kubectl.null_resource.run_command[0] (local-exec): + rm -rf /tmp/kubectl_wrapper_13436_30091
module.example.module.gke.module.gcloud_delete_default_kube_dns_configmap.module.gcloud_kubectl.null_resource.run_command[0]: Creation complete after 7s [id=3816252981401156512]
module.example.module.gke.kubernetes_config_map.kube-dns[0]: Creating...

Error: configmaps "kube-dns" already exists

The cause appears to be that addon-manager recreates the config map after the delete, which leads to the conflict: the CM carries the label addonmanager.kubernetes.io/mode: EnsureExists, so addon-manager ensures it always exists. Some relevant audit logs, filtered for namespaces/kube-system/configmaps/kube-dns:

io.k8s.core.v1.configmaps.delete principal_email: "gke-int-test@ci-gke.iam.gserviceaccount.com"
io.k8s.core.v1.configmaps.create principal_email: "system:addon-manager"
io.k8s.core.v1.configmaps.create principal_email: "gke-int-test@ci-gke.iam.gserviceaccount.com"
Conflict

Expected behavior

No response

Observed behavior

No response

Terraform Configuration

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/tree/master/examples/stub_domains

Terraform Version

n/a

Additional information

No response

@bharathkkb
Member Author

We may have to switch to modifying the CM in place. See b/192419589 for context.
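A minimal sketch of the in-place approach, assuming the provider's kubernetes_config_map_v1_data resource; the stub-domain values below are hypothetical placeholders, not the module's actual configuration:

```hcl
# Sketch only: patch the data of the existing kube-dns ConfigMap in place
# instead of deleting and recreating it, so addon-manager's EnsureExists
# copy is reused rather than raced against.
resource "kubernetes_config_map_v1_data" "kube_dns" {
  metadata {
    name      = "kube-dns"
    namespace = "kube-system"
  }

  data = {
    # Hypothetical stub-domain payload for illustration.
    stubDomains = jsonencode({
      "example.com" = ["8.8.8.8"]
    })
  }

  # Take ownership of the fields even if another manager
  # (e.g. addon-manager) already set them.
  force = true
}
```

Because this patches data on the server-created object rather than creating a new one, the AlreadyExists race with addon-manager should not arise.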

@apeabody
Contributor

FYI: kubernetes_config_map_v1_data doesn't directly support labels. While the new kubernetes_labels resource could be used to add a label to the ConfigMap, the label would no longer mean the same thing for the existing test.
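For reference, a hedged sketch of what that separate kubernetes_labels operation might look like; the label key and value here are hypothetical:

```hcl
# Sketch only: apply a label to the existing kube-dns ConfigMap as a
# separate operation from the ConfigMap data apply.
resource "kubernetes_labels" "kube_dns" {
  api_version = "v1"
  kind        = "ConfigMap"
  metadata {
    name      = "kube-dns"
    namespace = "kube-system"
  }
  labels = {
    # Hypothetical marker label; succeeding here says nothing about
    # whether the separate data apply succeeded.
    "managed-by" = "terraform"
  }
}
```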

@bharathkkb
Member Author

@apeabody Wouldn't the existing tests pass with a label added by kubernetes_labels?

@apeabody
Contributor

> @apeabody Wouldn't the existing tests pass with a label added by kubernetes_labels?

Yes, but since it would be a separate operation from the ConfigMap apply, it would not be tied to whether the ConfigMap application succeeded or failed. However, if we think there is still value in applying and checking the label in its own right, I can certainly include it. Let me know what you prefer @bharathkkb.

@bharathkkb
Member Author

Ah, I see what you meant. I think there is still value in having the label, as it signifies that the resource is managed by Terraform. For the test, asserting on the expected contents might be better (although let's defer that change until we update the test framework).

@apeabody
Contributor

Looking at using the managedFields property instead of a label, as it appears kubernetes_labels and kubernetes_config_map_v1_data conflict: hashicorp/terraform-provider-kubernetes#1690
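A sketch of how the field manager could be named explicitly on the data apply, assuming the resource's field_manager argument (the manager name and data values below are hypothetical). The name is recorded in the object's metadata.managedFields by server-side apply, so a test could assert on it instead of on a label:

```hcl
# Sketch only: name the server-side-apply field manager so a test can
# check metadata.managedFields instead of a separately applied label.
resource "kubernetes_config_map_v1_data" "kube_dns" {
  metadata {
    name      = "kube-dns"
    namespace = "kube-system"
  }

  data = {
    # Hypothetical data; the real module supplies the stub-domain config.
    stubDomains = jsonencode({ "example.com" = ["8.8.8.8"] })
  }

  # Hypothetical manager name; the provider defaults to "Terraform".
  field_manager = "terraform-google-kubernetes-engine"
  force         = true
}
```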
