Failing to mount secrets when a new node is scaled up #759

Closed
2 tasks done
sjdweb opened this issue Jan 18, 2022 · 14 comments
Labels: bug (Something isn't working), stale

Comments

sjdweb commented Jan 18, 2022

What steps did you take and what happened:

Every time we deploy this Helm chart, we hit this problem.

Because the release comprises around 200 pods spread across a number of Deployments, each deploy triggers a node scale-up on AKS.

Once a pod is assigned to one of the new nodes, it is unable to mount the secret volume.
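
For reference, the volume in question is an inline CSI volume along these lines (a minimal sketch; the pod, image, and SecretProviderClass names are placeholders rather than our actual manifests):

```yaml
# Illustrative only - all names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: my-app:latest              # placeholder image
      volumeMounts:
        - name: secrets-store-inline
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: secrets-store-inline
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: my-secret-provider-class   # placeholder SPC name
```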

Errors:

On pod:

driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
message: 'Unable to attach or mount volumes: unmounted volumes=[secrets-store-inline], unattached volumes=[secrets-store-inline my-app-7srx2]: timed out waiting for the condition'

In MIC:

2022-01-18T21:18:11.340568206Z stderr F E0118 21:18:11.340460       1 server.go:145] GRPC error: failed to mount secrets store objects for pod my-ns/my-app-5b85748794-cjt52, err: rpc error: code = Canceled desc = context canceled

2022-01-18T21:18:11.338694492Z stderr F I0118 21:18:11.338460       1 nodeserver.go:72] "unmounting target path as node publish volume failed" targetPath="/var/lib/kubelet/pods/70c6d07b-370e-4e0b-bc81-d251e030c3ae/volumes/kubernetes.io~csi/secrets-store-inline/mount" pod="my-ns/my-app-5b85748794-cjt52"

What did you expect to happen:

The pod should run as expected on the new node.

Anything else you would like to add:

If I babysit the deployment and kill the pods after I know the nodes are healthy, the newly created pods are fine.

The problem is that the pods that fail to mount (because the CSI driver is not found yet, or the mount times out) get stuck in a ContainerCreating state.
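
When a pod is stuck like this, the driver registration state on the new node can be checked directly; the node name below is just an example, and the grep pattern may need adjusting for other chart versions:

```sh
# Which CSI drivers has the kubelet registered on the new node?
kubectl get csinode aks-default-38331632-vmss00000l -o yaml

# Are the driver and provider daemonset pods Running on that node yet?
kubectl get pods -n kube-system -o wide | grep -E 'secrets-store|provider-azure'
```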

Which access mode did you use to access the Azure Key Vault instance:

Pod Identity

Environment:

  • Secrets Store CSI Driver version (image tags):
mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver:v0.3.0
mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar:v2.3.0
mcr.microsoft.com/oss/kubernetes-csi/livenessprobe:v2.4.0
  • Azure Key Vault provider version (image tag):
mcr.microsoft.com/oss/azure/secrets-store/provider-azure:v0.2.0
  • AAD Pod Identity version (image tags):
mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.8.6
mcr.microsoft.com/oss/azure/aad-pod-identity/mic:v1.8.6
  • Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:52:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"a5e4de7e277a707bd28d448bd75de58b4f1cdc22", GitTreeState:"clean", BuildDate:"2021-11-16T01:09:55Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
Nodes:
v1.20.9
Ubuntu 18.04.6 LTS
containerd://1.4.9+azure
  • Cluster type: Azure AKS

sjdweb added the bug label on Jan 18, 2022
sjdweb (Author) commented Jan 18, 2022

Here's some more detail as I watch another deployment:

```
Events:
  Type     Reason            Age    From                Message
  ----     ------            ----   ----                -------
  Warning  FailedScheduling  2m52s  default-scheduler   0/8 nodes are available: 8 Insufficient cpu.
  Warning  FailedScheduling  2m52s  default-scheduler   0/8 nodes are available: 8 Insufficient cpu.
  Normal   Scheduled         2m33s  default-scheduler   Successfully assigned my-rest/my-rest-api-54df7d596b-b6qpq to aks-default-38331632-vmss00000l
  Normal   TriggeredScaleUp  2m33s  cluster-autoscaler  pod triggered scale-up: [{aks-default-38331632-vmss 9->10 (max: 50)}]
  Warning  FailedMount       34s    kubelet             MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount       31s    kubelet             Unable to attach or mount volumes: unmounted volumes=[secrets-store-inline], unattached volumes=[my-rest-token-7srx2 secrets-store-inline]: timed out waiting for the condition

Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  4m16s              default-scheduler   0/8 nodes are available: 8 Insufficient cpu.
  Warning  FailedScheduling  4m16s              default-scheduler   0/8 nodes are available: 8 Insufficient cpu.
  Normal   Scheduled         83s                default-scheduler   Successfully assigned my-rest/my-rest-job-workers-processed-consumer-worker-5c459c86c-8rtt8 to aks-default-38331632-vmss00000r
  Normal   TriggeredScaleUp  4m3s               cluster-autoscaler  pod triggered scale-up: [{aks-default-38331632-vmss 8->9 (max: 50)}]
  Warning  FailedMount       15s (x8 over 84s)  kubelet             MountVolume.SetUp failed for volume "secrets-store-inline" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
```

I see that some pods are in ContainerCreating (over 120s!)

```
kube-system   csi-secrets-azure-csi-secrets-store-provider-azure-kgjtc   ●   0/1   0   ContainerCreating

Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m25s  default-scheduler  Successfully assigned kube-system/secrets-store-csi-driver-jkl6n to aks-default-38331632-vmss00000s
  Normal  Pulled     2m22s  kubelet            Container image "mcr.microsoft.com/oss/kubernetes-csi/csi-node-driver-registrar:v2.3.0" already present on machine
  Normal  Created    2m21s  kubelet            Created container node-driver-registrar
  Normal  Started    2m21s  kubelet            Started container node-driver-registrar
  Normal  Pulling    2m21s  kubelet            Pulling image "mcr.microsoft.com/oss/kubernetes-csi/secrets-store/driver:v0.3.0"
```

aramase (Member) commented Jan 18, 2022

Thanks for reporting the issue!

Unfortunately, this behavior isn't specific to the secrets-store-csi-driver implementation; it's a consequence of how workloads are scheduled in Kubernetes. The CSI driver needs to be running on the node before a volume mount request can be processed, but during a scale-up event there is no way to guarantee that all system pods (the CSI driver, kube-proxy, other pods in the kube-system namespace) are running before the workload pods start.

There was an enhancement proposal centered around this, kubernetes/enhancements#1003, but it was closed. Once the driver and provider pods are running on the new node, the pods waiting on the volume mount will eventually reach Running because kubelet keeps retrying the mount.

aramase (Member) commented Jan 18, 2022

Typically the images for some of these components are baked into the VHD image. If they're not present in the VHD, the image needs to be pulled, which is probably what you're seeing in the describe output.

sjdweb (Author) commented Jan 19, 2022

@aramase thank you for the quick response! I assume there's no known workaround for this?

The behaviour we've observed over ~10 rollouts is that the pods never reach Running without manual intervention (deleting the pod; the replacement then starts fine). Our helm upgrade has a timeout of 25 minutes.

nilekhc (Contributor) commented Jan 19, 2022

@sjdweb, do we know if the driver and provider pods were running at the time of the manual intervention?

sjdweb (Author) commented Jan 19, 2022

@nilekhc yes, all of the other pods relating to the secret provider, identity, etc. were running before the manual intervention.

aramase (Member) commented Jan 19, 2022

> @aramase thank you for the quick response! I assume there's no known workaround for this?

Currently there is no workaround for this. Eventually the pod volume mounts should succeed once the drivers are running, thanks to kubelet's retries. Long term, I think it would be great if node readiness, as described in the KEP mentioned above, becomes a thing, so a node can be marked ready only after the CSI drivers and other system-critical pods are running.

sjdweb (Author) commented Jan 19, 2022

@aramase unfortunately the pod volume mounts never succeed in our case. Today, for example, the containers were stuck in ContainerCreating for 35 minutes without intervention. Killing the pods fixed the issue, but we cannot do this on every deployment.
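
For anyone else hitting this, the manual intervention amounts to something like the following (namespace and label selector are placeholders for our workload, not literal values):

```sh
# Delete the stuck pods (pods in ContainerCreating report phase Pending);
# the ReplicaSet recreates them and, once the driver is registered on the node,
# the new pods mount the volume fine.
kubectl delete pod -n my-ns -l app=my-app --field-selector=status.phase=Pending
```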

aramase (Member) commented Jan 19, 2022

Yeah, I agree that's not a great experience. A few things that would help us debug (a rough sketch of the relevant commands follows the list):

  1. Could you use the v1.0.0 version for the driver and provider?
  2. Share the kubectl events during the timeframe.
  3. If you have access to kubelet logs on the node, that'll be great so we can see how often kubelet is retrying the mount.
  4. Output for kubectl get pods to show when the driver, provider pods started running.
  5. Logs from the driver and provider pods on the new node.
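
A rough sketch of commands that would cover items 2–5 (namespaces, label/pod names, and the driver container name are placeholders and may differ depending on how the chart was installed):

```sh
# 2. Events in the workload namespace during the timeframe
kubectl get events -n my-ns --sort-by=.lastTimestamp

# 4. When the driver/provider pods on the new node started running
kubectl get pods -n kube-system -o wide | grep -E 'secrets-store|provider-azure'

# 5. Logs from the driver and provider pods on the new node
#    (pod names and the container name are placeholders)
kubectl logs -n kube-system <driver-pod-on-new-node> -c secrets-store
kubectl logs -n kube-system <provider-azure-pod-on-new-node>

# 3. Kubelet logs, collected on the node itself (e.g. via SSH or a debug session)
journalctl -u kubelet | grep -i secrets-store
```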

github-actions bot commented Feb 3, 2022

This issue is stale because it has been open 14 days with no activity. Please comment or this will be closed in 7 days.

github-actions bot added the stale label on Feb 3, 2022
github-actions bot commented:
This issue was closed because it has been stalled for 21 days with no activity. Feel free to re-open if you are experiencing the issue again.

vinhnguyen500 commented:
@sjdweb we're running into this issue as well where the pod volume mounts never succeed. Did you manage to find a workaround?

It'd be great if the kubelet would actually retry and eventually succeed, but kicking the pods manually sucks, especially when scale-ups happen automatically with our cluster-autoscaler.

@pdefreitas
Copy link

pdefreitas commented Jun 27, 2022

We're experiencing the same issue, and the only workaround we've found is manual intervention: deleting the pod. When a new pod is scheduled, it is able to mount the secrets successfully. Does anyone have a workaround that avoids the need for manual intervention? This is important for workloads that are very dynamic (autoscaling).
Edit for reference: we're using v1.0.1 and aad-pod-identity v1.8.5.

pdefreitas commented:
I've managed to fix this. In our case it was because the default Helm chart values do not match the PriorityClass used when you install the components through the Azure Portal add-ons installer.

Examples:

@aramase shouldn't the documentation point out that if you install through the Helm chart you should set the PriorityClass accordingly? Out of the box it's quite hard to track down the missing PriorityClass.
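
What worked for us, roughly, was overriding the priority class in the Helm values to match the add-on's. A sketch of such an override is below; the value keys and the system-node-critical class are assumptions that may differ per chart and AKS version:

```yaml
# Sketch only: key names depend on the csi-secrets-store-provider-azure chart
# version, and system-node-critical is an assumed match for the AKS add-on.
linux:
  priorityClassName: system-node-critical
secrets-store-csi-driver:
  linux:
    priorityClassName: system-node-critical
```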

Labels: bug (Something isn't working), stale
Projects: None yet
Development: No branches or pull requests
5 participants