
Initialize stopCh channel for ExternalDNS #5175

Merged: 5 commits merged into main on Feb 28, 2024

shaun-nx (Contributor)

Proposed changes

Addresses #5104

When the Ingress Controller process exits with enableExternalDns=true, it panics with "close of nil channel" while stopping each namespaced informer.

This change initializes the stopCh channel of each ExternalDNS namespaced informer, so closing it during shutdown no longer panics.
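The following is a minimal sketch of the shape of the fix, assuming a simplified informer type. Only the stopCh field and the stop method mirror names that appear in this PR and in the stack trace in the Testing section; the struct layout, the hypothetical newNamespacedInformer constructor and the shutdown loop are illustrative assumptions, not the actual code in internal/externaldns/controller.go.

```go
package main

import "fmt"

// namespacedInformer is a simplified, hypothetical stand-in for the
// ExternalDNS informer type. Only stopCh and stop() mirror names from the PR.
type namespacedInformer struct {
	namespace string
	stopCh    chan struct{}
}

// newNamespacedInformer is a hypothetical constructor. Initializing stopCh
// here is the essence of the fix: closing a nil channel panics, while
// closing an initialized channel does not.
func newNamespacedInformer(namespace string) *namespacedInformer {
	return &namespacedInformer{
		namespace: namespace,
		stopCh:    make(chan struct{}),
	}
}

// stop signals the informer to shut down by closing stopCh.
func (nsi *namespacedInformer) stop() {
	close(nsi.stopCh)
}

func main() {
	informers := []*namespacedInformer{
		newNamespacedInformer("default"),
		newNamespacedInformer("nginx-ingress"),
	}
	// On shutdown, stop every namespaced informer. Because each stopCh was
	// created with make, close succeeds instead of panicking.
	for _, nsi := range informers {
		nsi.stop()
		fmt.Printf("stopped informer for namespace %q\n", nsi.namespace)
	}
}
```

Note that stop() is still not idempotent in this sketch: calling it twice on the same informer would panic with "close of closed channel", so it assumes each informer is stopped exactly once.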

Testing

Scale the Ingress Controller deployment down to 0 pods with controller.enableExternalDns=true and controller.logLevel=3.

Behaviour before the change:

I0227 15:30:37.287962 1 main.go:590] Received SIGTERM, shutting down  
I0227 15:30:37.288530 1 main.go:226] Waiting for the controller to exit...  
I0227 15:30:37.289092 1 controller.go:145] shutting down queue as workqueue signaled shutdown  
I0227 15:30:37.290650 1 manager.go:334] Quitting nginx  
I0227 15:30:37.290756 1 utils.go:17] executing /usr/sbin/nginx -s quit  
I0227 15:30:37.324376 1 leader.go:101] stopped leading  
E0227 15:30:37.326534      1 runtime.go:79] Observed a panic: "close of nil channel" (close of nil channel)
goroutine 1942 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1dd8a60?, 0x256a970})
	k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x90?})
	k8s.io/apimachinery@v0.29.2/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x1dd8a60?, 0x256a970?})
	runtime/panic.go:914 +0x21f
github.com/nginxinc/kubernetes-ingress/internal/externaldns.(*namespacedInformer).stop(...)
	github.com/nginxinc/kubernetes-ingress/internal/externaldns/controller.go:149
github.com/nginxinc/kubernetes-ingress/internal/externaldns.(*ExtDNSController).Run(0xc000788300, 0x0?)
	github.com/nginxinc/kubernetes-ingress/internal/externaldns/controller.go:139 +0x45e
created by github.com/nginxinc/kubernetes-ingress/internal/k8s.(*LoadBalancerController).Run in goroutine 1
	github.com/nginxinc/kubernetes-ingress/internal/k8s/controller.go:707 +0x489
panic: close of nil channel [recovered]
	panic: close of nil channel
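The panic in this trace is a standard Go runtime error: closing a nil channel always panics. The following standard-library-only reproduction (not code from this repository; tryClose is a hypothetical helper) recovers the panic and reports its message, and shows that closing an initialized channel succeeds.

```go
package main

import "fmt"

// tryClose closes ch and returns the recovered panic message,
// or "" if close succeeded.
func tryClose(ch chan struct{}) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	close(ch)
	return ""
}

func main() {
	var uninitialized chan struct{} // the zero value of a channel is nil
	fmt.Println(tryClose(uninitialized)) // close of nil channel

	initialized := make(chan struct{})
	fmt.Printf("%q\n", tryClose(initialized)) // ""
}
```

This also matches the shape of the trace: HandleCrash from k8s.io/apimachinery logs the recovered panic and, by default, re-panics, which the Go runtime reports as "[recovered]" followed by the re-raised panic.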

Behaviour after the change:

I0227 16:25:09.287962 1 main.go:590] Received SIGTERM, shutting down  
I0227 16:25:09.288530 1 main.go:226] Waiting for the controller to exit...  
I0227 16:25:09.289092 1 controller.go:145] shutting down queue as workqueue signaled shutdown  
I0227 16:25:09.290650 1 manager.go:334] Quitting nginx  
I0227 16:25:09.290756 1 utils.go:17] executing /usr/sbin/nginx -s quit  
I0227 16:25:09.324376 1 leader.go:101] stopped leading  
2024/02/27 16:25:09 [notice] 30#30: signal 3 (SIGQUIT) received from 77, shutting down  
2024/02/27 16:25:09 [notice] 66#66: gracefully shutting down  
2024/02/27 16:25:09 [notice] 60#60: gracefully shutting down  
2024/02/27 16:25:09 [notice] 60#60: exiting  
2024/02/27 16:25:09 [notice] 65#65: gracefully shutting down  
2024/02/27 16:25:09 [notice] 60#60: exit  
2024/02/27 16:25:09 [notice] 61#61: gracefully shutting down  
2024/02/27 16:25:09 [notice] 63#63: gracefully shutting down  
2024/02/27 16:25:09 [notice] 62#62: gracefully shutting down  
2024/02/27 16:25:09 [notice] 67#67: gracefully shutting down  
2024/02/27 16:25:09 [notice] 61#61: exiting  
2024/02/27 16:25:09 [notice] 66#66: exiting  
2024/02/27 16:25:09 [notice] 65#65: exiting  
2024/02/27 16:25:09 [notice] 68#68: gracefully shutting down  
2024/02/27 16:25:09 [notice] 67#67: exiting  
2024/02/27 16:25:09 [notice] 69#69: gracefully shutting down  
2024/02/27 16:25:09 [notice] 68#68: exiting  
2024/02/27 16:25:09 [notice] 64#64: gracefully shutting down  
2024/02/27 16:25:09 [notice] 68#68: exit  
2024/02/27 16:25:09 [notice] 67#67: exit  
2024/02/27 16:25:09 [notice] 64#64: exiting  
2024/02/27 16:25:09 [notice] 64#64: exit  
2024/02/27 16:25:09 [notice] 63#63: exiting  
2024/02/27 16:25:09 [notice] 61#61: exit  
2024/02/27 16:25:09 [notice] 63#63: exit  
2024/02/27 16:25:09 [notice] 62#62: exiting  
2024/02/27 16:25:09 [notice] 65#65: exit  
2024/02/27 16:25:09 [notice] 62#62: exit  
2024/02/27 16:25:09 [notice] 66#66: exit  
2024/02/27 16:25:09 [notice] 69#69: exiting  
2024/02/27 16:25:09 [notice] 69#69: exit  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 60  
2024/02/27 16:25:09 [notice] 30#30: worker process 60 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: signal 29 (SIGIO) received  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 68  
2024/02/27 16:25:09 [notice] 30#30: worker process 68 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: signal 29 (SIGIO) received  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 64  
2024/02/27 16:25:09 [notice] 30#30: worker process 64 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: worker process 65 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 65  
2024/02/27 16:25:09 [notice] 30#30: signal 29 (SIGIO) received  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 67  
2024/02/27 16:25:09 [notice] 30#30: worker process 67 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: worker process 61 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 61  
2024/02/27 16:25:09 [notice] 30#30: signal 29 (SIGIO) received  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 69  
2024/02/27 16:25:09 [notice] 30#30: worker process 69 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: signal 29 (SIGIO) received  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 63  
2024/02/27 16:25:09 [notice] 30#30: worker process 62 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: worker process 63 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: signal 29 (SIGIO) received  
2024/02/27 16:25:09 [notice] 30#30: signal 17 (SIGCHLD) received from 66  
2024/02/27 16:25:09 [notice] 30#30: worker process 66 exited with code 0  
2024/02/27 16:25:09 [notice] 30#30: exit  
I0227 16:25:09.533346 1 main.go:604] Exiting successfully

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR targets the main branch and pulls from a branch on my own fork

shaun-nx requested a review from a team as a code owner on February 27, 2024 16:41
github-actions bot added the bug label (An issue reporting a potential bug) on Feb 27, 2024
shaun-nx linked an issue on Feb 27, 2024 that may be closed by this pull request
j1m-ryan (Member)

Nice one Shaun!

Logs on main after killing an NIC pod with enableExternalDns=true:

I0227 16:56:30.652241       1 main.go:590] Received SIGTERM, shutting down
I0227 16:56:30.656052       1 manager.go:334] Quitting nginx
I0227 16:56:30.656166       1 utils.go:17] executing /usr/sbin/nginx -s quit
I0227 16:56:30.661731       1 main.go:226] Waiting for the controller to exit...
I0227 16:56:30.742108       1 controller.go:135] shutting down queue as workqueue signaled shutdown
I0227 16:56:30.751744       1 leader.go:101] stopped leading
panic: close of nil channel

Logs on bug/panic-external-dns after killing an NIC pod with enableExternalDns=true:

I0227 17:01:15.707004       1 main.go:590] Received SIGTERM, shutting down
I0227 17:01:15.707903       1 manager.go:334] Quitting nginx
I0227 17:01:15.707971       1 utils.go:17] executing /usr/sbin/nginx -s quit
I0227 17:01:15.708646       1 main.go:226] Waiting for the controller to exit...
I0227 17:01:15.790988       1 controller.go:135] shutting down queue as workqueue signaled shutdown
I0227 17:01:15.806516       1 leader.go:101] stopped leading
2024/02/27 17:01:16 [notice] 30#30: signal 3 (SIGQUIT) received from 77, shutting down
2024/02/27 17:01:16 [notice] 62#62: gracefully shutting down
2024/02/27 17:01:16 [notice] 64#64: gracefully shutting down
2024/02/27 17:01:16 [notice] 61#61: gracefully shutting down
2024/02/27 17:01:16 [notice] 67#67: gracefully shutting down
2024/02/27 17:01:16 [notice] 62#62: exiting
2024/02/27 17:01:16 [notice] 64#64: exiting
2024/02/27 17:01:16 [notice] 61#61: exiting
2024/02/27 17:01:16 [notice] 67#67: exiting
2024/02/27 17:01:16 [notice] 66#66: gracefully shutting down
2024/02/27 17:01:16 [notice] 63#63: gracefully shutting down
2024/02/27 17:01:16 [notice] 67#67: exit
2024/02/27 17:01:16 [notice] 64#64: exit
2024/02/27 17:01:16 [notice] 66#66: exiting
2024/02/27 17:01:16 [notice] 63#63: exiting
2024/02/27 17:01:16 [notice] 66#66: exit
2024/02/27 17:01:16 [notice] 62#62: exit
2024/02/27 17:01:16 [notice] 63#63: exit
2024/02/27 17:01:16 [notice] 68#68: gracefully shutting down
2024/02/27 17:01:16 [notice] 61#61: exit
2024/02/27 17:01:16 [notice] 69#69: gracefully shutting down
2024/02/27 17:01:16 [notice] 69#69: exiting
2024/02/27 17:01:16 [notice] 65#65: gracefully shutting down
2024/02/27 17:01:16 [notice] 69#69: exit
2024/02/27 17:01:16 [notice] 65#65: exiting
2024/02/27 17:01:16 [notice] 65#65: exit
2024/02/27 17:01:16 [notice] 68#68: exiting
2024/02/27 17:01:16 [notice] 68#68: exit
2024/02/27 17:01:16 [notice] 70#70: gracefully shutting down
2024/02/27 17:01:16 [notice] 70#70: exiting
2024/02/27 17:01:16 [notice] 70#70: exit
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 64
2024/02/27 17:01:16 [notice] 30#30: worker process 64 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 61
2024/02/27 17:01:16 [notice] 30#30: worker process 61 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 66
2024/02/27 17:01:16 [notice] 30#30: worker process 66 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 63
2024/02/27 17:01:16 [notice] 30#30: worker process 63 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 62
2024/02/27 17:01:16 [notice] 30#30: worker process 62 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 65
2024/02/27 17:01:16 [notice] 30#30: worker process 65 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 67
2024/02/27 17:01:16 [notice] 30#30: worker process 67 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 70
2024/02/27 17:01:16 [notice] 30#30: worker process 70 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 69
2024/02/27 17:01:16 [notice] 30#30: worker process 69 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: signal 29 (SIGIO) received
2024/02/27 17:01:16 [notice] 30#30: signal 17 (SIGCHLD) received from 68
2024/02/27 17:01:16 [notice] 30#30: worker process 68 exited with code 0
2024/02/27 17:01:16 [notice] 30#30: exit
I0227 17:01:16.231695       1 main.go:604] Exiting successfully

shaun-nx merged commit acaadfc into main on Feb 28, 2024
81 checks passed
shaun-nx deleted the bug/panic-external-dns branch on February 28, 2024 14:41
Successfully merging this pull request may close the issue: panic: close of nil channel during shutdown