All the Flux pods are in CrashLoopBackOff and there are no logs from the pods #4703
Unanswered
ashok-busi asked this question in General
Replies: 2 comments 2 replies
-
This usually happens when the CNI is not working; you need to investigate the node and see what errors the kubelet and the CNI daemon report. Another option for bare metal is to run Flux directly on the host network, see https://github.com/stefanprodan/flux-aio
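A quick way to do that check, assuming the kubelet runs under systemd and the CNI agent runs in kube-system (pod name below is a placeholder):
# on the affected node: recent kubelet errors, including CNI plugin failures
journalctl -u kubelet --since "1 hour ago" --no-pager | grep -iE 'cni|error'
# from any machine with cluster access: are the CNI daemonset pods healthy, and what do they log?
kubectl -n kube-system get pods -o wide
kubectl -n kube-system logs <cni-daemon-pod> --tail=100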
-
Hi @stefanprodan, thanks for the response.
-
All the Flux pods are in a CrashLoopBackOff state and there are no logs from the pods, on an on-prem Kubernetes cluster.
Flux version: 2.2.3
Kubernetes version: 1.28.6
NAME READY STATUS RESTARTS AGE
helm-controller-9b59fdc77-dqhkk 0/1 CrashLoopBackOff 12 (40s ago) 36m
image-automation-controller-7bd89767cc-vtg8n 0/1 CrashLoopBackOff 23 (3m25s ago) 96m
image-reflector-controller-6bbc86d5b9-5b2qp 0/1 CrashLoopBackOff 4 (18s ago) 119s
kustomize-controller-577b4ddbdd-pjm82 0/1 CrashLoopBackOff 3 (24s ago) 81s
notification-controller-797cc7d56d-5vj92 0/1 CrashLoopBackOff 23 (3m8s ago) 96m
source-controller-768b889f6d-drvw6 0/1 CrashLoopBackOff 23 (3m14s ago) 96m
We couldn't find any errors related to Flux.
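Even the usual log sources return nothing, for example (pod name taken from the listing above):
# logs from the last crashed container instance
kubectl -n flux logs image-automation-controller-7bd89767cc-vtg8n --previous
# recent events across the namespace
kubectl -n flux get events --sort-by=.lastTimestamp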
Example of the pod events:
[root@uslxcp89634 ~]# kubectl describe pod image-automation-controller-59667c6b6c-j7k2g -n flux
Name: image-automation-controller-59667c6b6c-j7k2g
Namespace: flux
Priority: 0
Service Account: image-automation-controller
Node: uslxcp89686/10.74.21.26
Start Time: Thu, 04 Apr 2024 15:20:03 -0400
Labels: app=image-automation-controller
pod-template-hash=59667c6b6c
Annotations: kubectl.kubernetes.io/restartedAt: 2024-04-04T13:23:15-04:00
prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 192.168.121.36
IPs:
IP: 192.168.121.36
Controlled By: ReplicaSet/image-automation-controller-59667c6b6c
Containers:
manager:
Container ID: containerd://2d69d32778bfc568f998e6092716876ea660e8622a8986f9461ee61ec0a7fe41
Image: ghcr.io/fluxcd/image-automation-controller:v0.37.1
Image ID: ghcr.io/fluxcd/image-automation-controller@sha256:97a8895cab8594af7509a5f2bc5495c03b7346722afc4e1e70bf6e445b7e575d
Ports: 8080/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--events-addr=http://notification-controller.flux.svc.cluster.local./
--watch-all-namespaces=true
--log-level=debug
--log-encoding=json
--enable-leader-election
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 132
Started: Thu, 04 Apr 2024 15:25:54 -0400
Finished: Thu, 04 Apr 2024 15:25:55 -0400
Ready: False
Restart Count: 6
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
RUNTIME_NAMESPACE: flux (v1:metadata.namespace)
Mounts:
/tmp from temp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f8bkq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
temp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-f8bkq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Scheduled 7m5s default-scheduler Successfully assigned flux/image-automation-controller-59667c6b6c-j7k2g to uslxcp89686
Normal Created 6m21s (x4 over 7m5s) kubelet Created container manager
Normal Started 6m21s (x4 over 7m5s) kubelet Started container manager
Warning Unhealthy 6m20s (x2 over 6m42s) kubelet Readiness probe failed: Get "http://192.168.121.36:9440/readyz": dial tcp 192.168.121.36:9440: connect: connection refused
Normal Pulled 5m40s (x5 over 7m5s) kubelet Container image "ghcr.io/fluxcd/image-automation-controller:v0.37.1" already present on machine
Warning BackOff 112s (x30 over 7m4s) kubelet Back-off restarting failed container manager in pod image-automation-controller-59667c6b6c-j7k2g_flux(87f9ef2f-cfd9-4b7e-8934-caa8c0a8cf95)
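One detail from the output above: Exit Code 132 is 128 + 4, i.e. the manager process is being killed by SIGILL (illegal instruction), which would also explain why it never gets to write any logs. A common cause of SIGILL is the node's physical or virtual CPU lacking instructions the binary was built for. Assuming shell access to the node, these commands can help confirm it (pod name copied from the describe output):
# should print 132 for the crashed container
kubectl -n flux get pod image-automation-controller-59667c6b6c-j7k2g -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# run on the node: the kernel records SIGILL crashes as 'trap invalid opcode'
dmesg -T | grep -iE 'trap|invalid opcode'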
We followed several troubleshooting steps, but none of them worked.
We have been working on this for the past two days; nothing has worked, and it is frustrating that there are no logs to troubleshoot with.
Please let us know if anyone has faced a similar issue; any suggestions or advice would be helpful.
Thanks in advance.