controller crash #2485

Closed
weizhoublue opened this issue Oct 29, 2023 · 3 comments
weizhoublue (Collaborator) commented Oct 29, 2023

https://github.com/spidernet-io/spiderpool/actions/runs/6682604267?pr=2484

(1) Error log:

@Icarus9913

{"level":"INFO","ts":"2023-10-29T10:02:30.418Z","logger":"spiderpool-controller","caller":"cmd/daemon.go:534","msg":"Begin to set up MultusConfig informer"}
{"level":"INFO","ts":"2023-10-29T10:02:30.418Z","logger":"MultusConfig-Informer","caller":"multuscniconfig/multusconfig_informer.go:73","msg":"try to register MultusConfig informer"}
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
	>  goroutine 1404 [running]:
	>  runtime/debug.Stack()
	>  	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
	>  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	>  	/src/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:60 +0xcd
	>  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc00042e580, {0x2384601, 0x9})
	>  	/src/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x4c
	>  github.com/go-logr/logr.Logger.WithName(...)
	>  	/src/vendor/github.com/go-logr/logr/logr.go:336
	>  sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).getLogger.func1()
	>  	/src/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:182 +0x63
	>  sync.(*Once).doSlow(0x0?, 0xc00020f360?)
	>  	/usr/local/go/src/sync/once.go:74 +0xc2
	>  sync.(*Once).Do(...)
	>  	/usr/local/go/src/sync/once.go:65
	>  sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).getLogger(0xc00020e000?, 0xc0007f4000?)
	>  	/src/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:180 +0x53
	>  sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc00020e000, {0x7f17f12028b8?, 0xc000338fa0}, 0xc0005b4300)
	>  	/src/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:96 +0xc34
	>  github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7f17f12028b8, 0xc000338fa0}, 0x2710b00?)
	>  	/src/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60 +0xd4
	>  net/http.HandlerFunc.ServeHTTP(0x2710bc0?, {0x7f17f12028b8?, 0xc000338fa0?}, 0xc000fc5828?)
	>  	/usr/local/go/src/net/http/server.go:2122 +0x2f
	>  github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x2710bc0?, 0xc00045d500?}, 0xc0005b4300)
	>  	/src/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:147 +0xc5
	>  net/http.HandlerFunc.ServeHTTP(0x77ab45?, {0x2710bc0?, 0xc00045d500?}, 0x40dc28?)
	>  	/usr/local/go/src/net/http/server.go:2122 +0x2f
	>  github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x2710bc0, 0xc00045d500}, 0xc0005b4300)
	>  	/src/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:109 +0xc7
	>  net/http.HandlerFunc.ServeHTTP(0xc00045d500?, {0x2710bc0?, 0xc00045d500?}, 0x237d059?)
	>  	/usr/local/go/src/net/http/server.go:2122 +0x2f
	>  net/http.(*ServeMux).ServeHTTP(0xc000452bcc?, {0x2710bc0, 0xc00045d500}, 0xc0005b4300)
	>  	/usr/local/go/src/net/http/server.go:2500 +0x149
	>  net/http.serverHandler.ServeHTTP({0x26ffee0?}, {0x2710bc0, 0xc00045d500}, 0xc0005b4300)
	>  	/usr/local/go/src/net/http/server.go:2936 +0x316
	>  net/http.(*conn).serve(0xc000457710, {0x27120d8, 0xc0003dcc90})
	>  	/usr/local/go/src/net/http/server.go:1995 +0x612
	>  created by net/http.(*Server).Serve
	>  	/usr/local/go/src/net/http/server.go:3089 +0x5ed
{"level":"DEBUG","ts":"2023-10-29T10:11:19.381Z","logger":"IPPool-Webhook.Validating","caller":"ippoolmanager/ippool_webhook.go:77","msg":"Request IPPool: {TypeMeta:{Kind:SpiderIPPool APIVersion:spiderpool.spidernet.io/v2beta1} ObjectMeta:{Name:v6pool-4ead0606c2c2a17 GenerateName: Namespace: SelfLink: UID:1ceb5411-0ec7-4c32-a519-140f9b32aec4 ResourceVersion: Generation:1 CreationTimestamp:2023-10-29 10:11:19 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[ipam.spidernet.io/ippool-cidr:fd00-69d5---120] Annotations:map[] OwnerReferences:[] Finalizers:[spiderpool.spidernet.io] ManagedFields:[{Manager:macvlan-overlay-one.test Operation:Update 

(2) When running with the overlay CNI, the init-pod kills the agent-pod. So, after Spiderpool is set up, multus.sh fails at its `kubectl wait --for=condition=ready -l app.kubernetes.io/name=spiderpool` call, because the killing is happening at that very moment.

@cyclinder

pod/spiderpool-controller-5fdb8bbcf9-j87pf condition met
pod/spiderpool-controller-5fdb8bbcf9-lrzs2 condition met
pod/spiderpool-init condition met
pod/spiderpool-rdma-shared-device-plugin-dmbbq condition met
pod/spiderpool-rdma-shared-device-plugin-nv5d7 condition met
pod/spiderpool-sriov-operator-5b6f778c84-5j4zb condition met
timed out waiting for the condition on pods/spiderpool-agent-gjf78
Error from server (NotFound): pods "spiderpool-agent-vpnhs" not found

So every E2E step and the documentation should wait for spiderpool-init to complete before moving on to the next steps; a hedged sketch of such a wait follows.
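
For example, a minimal sketch of a wait helper for the Go E2E code (the namespace argument, the timeouts, and the clientset wiring are assumptions, not spiderpool's actual helper API):

```go
package e2e

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForInitPodDone blocks until the spiderpool-init pod has run to
// completion (or has already been cleaned up), so later steps no longer
// race against it killing and restarting the agent pods.
func waitForInitPodDone(ctx context.Context, cs kubernetes.Interface, ns string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			pod, err := cs.CoreV1().Pods(ns).Get(ctx, "spiderpool-init", metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil // init pod finished and was removed
			}
			if err != nil {
				return false, err
			}
			return pod.Status.Phase == corev1.PodSucceeded, nil
		})
}
```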

I also have some other ideas to make `helm install --wait` work well:

Plan 1: introduce a pod readiness probe, and only mark the pod ready after all the init setup finishes (see the sketch after this list).

Plan 2: move the whole setup process into the init-pod.
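
For plan 1, a minimal sketch of what that readiness gate could look like, assuming an HTTP readiness endpoint (the `/readyz` path, the port, and `runAllInitSetups` are hypothetical, not the current spiderpool code):

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// initDone flips to true only after every init step has finished.
var initDone atomic.Bool

func runAllInitSetups() {
	// hypothetical stand-in for whatever setup currently runs at startup
}

func main() {
	// A readinessProbe pointed at /readyz keeps the pod NotReady (and
	// `helm install --wait` blocked) until the handler returns 200.
	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if initDone.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	go func() {
		runAllInitSetups()
		initDone.Store(true)
	}()

	http.ListenAndServe(":8080", nil)
}
```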

(3)

@ty-dc

There is a lot of invalid log output from debugEnv: it execs `ip` commands in pods whose images do not ship the `ip` binary, so every command just returns an error:

--------------- execute ip a in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "a2f349bb3e606a83ae8776239f50bb4f3feff841821fc7fdade95f18cfd40fea": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip link show in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "2b97b3a81f8fe5e52c2ba8b6359766a9e0fa2bae0217b6362121e82a978a04fb": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip n in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "8014b4530946b7d7fb0db48910442fb5605534426d8ac83a21f62f8a2a756167": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip -6 n in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "ca57ad1e0df768e3d2ffb18de63b00233f11f19dc180277c06c9aa38f239afad": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip rule in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "066509573f3234cd8999193a755c1d9ee58a9b72d84c26df4dc82f45b5ac61d2": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip -6 rule in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "03098949678d2e0930f4e65e362bf2bdae94d71793797f765d48e93f4c125572": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip route in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "ba7e87bf815087850ec24858a82f97ca0611f34eec159bef86ff58759c920245": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown
--------------- execute ip -6 route in pod: kruise-system kruise-controller-manager-5d97dcd65c-cv589 ------------
Unable to use a TTY - input is not a terminal or the right kind of file
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "8f3141e7343b909b04af4b538a004a454b8b385079227b5ae31fb7c724d96d53": OCI runtime exec failed: exec failed: unable to start container process: exec: "ip": executable file not found in $PATH: unknown

It also misses the container name: without an explicit `kubectl exec -c <container>`, kubectl falls back to the pod's default container:

--------------- execute ip a in pod: kubevirt virt-handler-7jb8s ------------
Defaulted container "virt-handler" out of: virt-handler, virt-launcher (init)
Unable to use a TTY - input is not a terminal or the right kind of file
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:01:3c:f7:59:d4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fd00:10:244::133/128 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::5c01:3cff:fef7:59d4/64 scope link 
       valid_lft forever preferred_lft forever
weizhoublue changed the title from "controller crash owing to multus" to "controller crash" on Oct 29, 2023
Icarus9913 (Collaborator) commented:

I'll solve the controller-runtime log problem ASAP.
Reference issue: #2450

Icarus9913 (Collaborator) commented:

1. The controller-runtime log problem is solved: set logger for controller-runtime framework #2490

ty-dc (Collaborator) commented Nov 1, 2023

For the third point, it will be finished within this month.

Icarus9913 removed their assignment on Nov 7, 2023