1.27 csi-oci-controller pod CrashLoopBackOff #447

Open
shihchang opened this issue Dec 5, 2023 · 4 comments · Fixed by #470

shihchang (Member) commented Dec 5, 2023

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

CCM Version: 1.27

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: v1.24.0
    Kustomize Version: v4.5.4
    Server Version: v1.28.3+1.el8
  • OS (e.g. from /etc/os-release): Oracle Linux Server 8.8
  • Kernel (e.g. uname -a):
    Linux k8sol8-api-server-001 5.15.0-105.125.6.2.1.el8uek.x86_64 #2 SMP Thu Sep 14 21:58:59 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

The csi-oci-controller pod is in CrashLoopBackOff; two issues were found:

  • image-pull-secret:
  Warning  FailedToRetrieveImagePullSecret  4m20s (x543 over 114m)  kubelet  Unable to retrieve some image pull secrets (image-pull-secret); attempting to pull the image may not succeed.
  • snapshot-controller:
  snapshot-controller:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
    Ready:          False
    Restart Count:  30
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nt4cl (ro)
      /var/run/shared-tmpfs from shared-tmpfs (rw)
 
Warning  BackOff  13s (x515 over 115m)  kubelet  Back-off restarting failed container snapshot-controller in pod csi-oci-controller-...
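
The container status and events above can be gathered with kubectl (a sketch; the namespace and pod name are assumptions, the CSI controller is normally deployed into kube-system):

    # Find the controller pod (namespace assumed to be kube-system)
    kubectl -n kube-system get pods | grep csi-oci-controller

    # Show container states and recent events for the pod (pod name is illustrative)
    kubectl -n kube-system describe pod csi-oci-controller-<hash>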

What you expected to happen?

The csi-oci-controller pod status should be Running.

How to reproduce it (as minimally and precisely as possible)?

Run kubectl apply with the manifest at https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml
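
A minimal reproduction sketch, assuming the manifest is applied from a local checkout of the repository and that the controller lands in kube-system:

    # Apply the CSI controller driver manifest from a checkout of oracle/oci-cloud-controller-manager
    kubectl apply -f manifests/container-storage-interface/oci-csi-controller-driver.yaml

    # The controller pod then ends up in CrashLoopBackOff (namespace is an assumption)
    kubectl -n kube-system get pods | grep csi-oci-controller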

Anything else we need to know?

AkarshES (Contributor) commented Dec 7, 2023

Can you please try to apply the manifest without the following lines in it: https://github.com/oracle/oci-cloud-controller-manager/blob/master/manifests/container-storage-interface/oci-csi-controller-driver.yaml#L121-L122

shihchang (Member, Author) commented

Yes, removing the image-pull-secret does clean up that FailedToRetrieveImagePullSecret warning.
And after removing the failed snapshot-controller container, the csi-oci-controller pod did come up.
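
For context, the lines being removed are presumably the pod spec's imagePullSecrets reference to the secret named in the warning above; a sketch of what such a stanza typically looks like, not a quote of the actual manifest:

    # Illustrative only: an imagePullSecrets entry pointing at a secret that does not exist in the cluster
    spec:
      imagePullSecrets:
        - name: image-pull-secret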

If the snapshot-controller container comes from the following, is it a duplicate of the csi-snapshotter?
https://kubernetes-csi.github.io/docs/snapshot-controller.html
https://github.com/kubernetes-csi/external-snapshotter
In the csi-oci-controller pod, while the snapshot-controller container failed, the csi-snapshotter container status is Ready. Is it OK to remove the snapshot-controller?
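
One way to see which containers in the pod are ready (a sketch; the pod name and namespace are assumptions):

    # Print each container name in the pod together with its ready state
    kubectl -n kube-system get pod csi-oci-controller-<hash> \
      -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'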

mrunalpagnis (Member) commented

@YashasG98 PTAL

YashasG98 (Member) commented

@shihchang the snapshot-controller and external-snapshotter help take backups of volumes and restore from those backups when needed:

https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV.htm#contengcreatingpersistentvolumeclaim_topic-Provisioning_PVCs_on_BV-PV_From_Snapshot_CSI

They can be removed if you do not intend to use the snapshot/restore ability.
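
If the snapshot feature is kept, usage looks roughly like the following VolumeSnapshot (a sketch; the resource names and snapshot class are assumptions, see the OCI documentation linked above):

    # Illustrative VolumeSnapshot taking a backup of an existing PVC
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: my-snapshot
    spec:
      volumeSnapshotClassName: oci-bv-vsc   # assumed class name
      source:
        persistentVolumeClaimName: my-pvc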

Could you please share the output of kubectl logs for the failing container to help understand why it is crashing?
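
A sketch of the command being asked for (pod name and namespace are assumptions; --previous shows output from the last crashed run):

    # Logs from the crashing snapshot-controller container, including the previously terminated instance
    kubectl -n kube-system logs csi-oci-controller-<hash> -c snapshot-controller --previous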

mrunalpagnis linked a pull request Jul 4, 2024 that will close this issue