
Disable PDB for managed addon #1934

Open · damvinod opened this issue Feb 16, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@damvinod

/triage support
Hi, is there a way to delete/change the PDB for the add-on via the config? I don't see the option in the JSON schema.
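
For reference, the managed addon's configuration schema can be dumped with the AWS CLI to see exactly which keys it accepts; the addon version below is only an example:

aws eks describe-addon-configuration \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.28.0-eksbuild.1 \
  --query 'configurationSchema' --output text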

@k8s-ci-robot
Contributor

@damvinod: The label(s) triage/support cannot be applied, because the repository doesn't have them.

In response to this:

/triage support
Hi, is there a way to delete/change the PDB for the add-on via the config? I don't see the option in the JSON schema.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@AndrewSirenko
Contributor

Thanks for opening an issue, @damvinod.

Today, we don't offer a way to disable/change the ebs-csi-controller deployment's pod disruption budget in either our Helm chart or EKS-managed addon config. I'll treat this issue as a feature request to add that functionality.

Could you provide some details about your use case for disabling the PDB, as under normal circumstances doing so would be inadvisable?

/kind feature
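
For reference, the chart's published values can be checked to confirm there is no PDB toggle; the Helm repo URL is the driver's documented one, and at the time of this thread the grep is expected to return nothing:

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm show values aws-ebs-csi-driver/aws-ebs-csi-driver | grep -i disruption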

@k8s-ci-robot added the kind/feature label Feb 16, 2024
@damvinod
Author

damvinod commented Feb 17, 2024

Currently we have 3 nodes in an EKS cluster with 2 replicas of ebs-csi-controller. We have a use case where we roll over 2 nodes in parallel instead of the default 1 so that the rollover finishes faster.

Since the ebs-csi-controller's default PDB sets maxUnavailable to 1, we can't roll 2 nodes in parallel, so I wanted to check whether the PDB can be customized or deleted.
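
For reference, the PDB in question can be inspected directly; the ebs-csi-controller name below is an assumption based on the deployment name:

kubectl get poddisruptionbudget -n kube-system
kubectl describe pdb ebs-csi-controller -n kube-system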

@AndrewSirenko
Contributor

AndrewSirenko commented Feb 19, 2024

Hi @damvinod, thank you for sharing your use case.

We can expose a configuration option to disable the creation of the controller's PDB, so that users like you can deploy a custom PDB on their own after deploying the driver.
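
As a sketch of what such a custom PDB might look like once the toggle exists (the maxUnavailable value and the pod selector below are assumptions, not necessarily the chart's actual labels):

cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: custom-ebs-csi-controller-pdb
  namespace: kube-system
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: ebs-csi-controller
EOF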

Note that running the driver without its PDB (maxUnavailable: 1) is highly discouraged: a cluster with no running ebs-csi-controller pod cannot provision, attach, detach, or delete EBS volumes, which delays stateful workload startups and terminations.


Note that it may be possible to roll over 2 nodes even with the PDB in place: as long as an ebs-csi-controller pod is running on the third node, the PDB won't block your node rollovers. You can ensure that is the case by cordoning and draining the nodes before deleting them (so any ebs-csi-controller pods get rescheduled), by running 3 replicas, or by manually evicting the ebs-csi-controller pod from the nodes you will roll over.

Here is an example of following the no-volume-lifecycle-outage version of this process, deliberately rolling over the two nodes that have ebs-csi-controller pods scheduled.

❯ kubectl get nodes
NAME                                            STATUS   ROLES    AGE     VERSION
ip-192-168-105-23.us-west-2.compute.internal    Ready    <none>   6m46s   v1.27.9-eks-5e0fdde
ip-192-168-152-206.us-west-2.compute.internal   Ready    <none>   6m42s   v1.27.9-eks-5e0fdde
ip-192-168-160-227.us-west-2.compute.internal   Ready    <none>   6m41s   v1.27.9-eks-5e0fdde

❯ kubectl cordon ip-192-168-152-206.us-west-2.compute.internal
node/ip-192-168-152-206.us-west-2.compute.internal cordoned
❯ kubectl cordon ip-192-168-160-227.us-west-2.compute.internal
node/ip-192-168-160-227.us-west-2.compute.internal cordoned

❯ kubectl drain --delete-emptydir-data --ignore-daemonsets ip-192-168-152-206.us-west-2.compute.internal
node/ip-192-168-152-206.us-west-2.compute.internal already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-rd5dk, kube-system/ebs-csi-node-x7x5w, kube-system/kube-proxy-mklzs
evicting pod kube-system/ebs-csi-controller-7d6f4987dd-hqn2t
pod/ebs-csi-controller-7d6f4987dd-hqn2t evicted
node/ip-192-168-152-206.us-west-2.compute.internal drained
❯ kubectl drain --delete-emptydir-data --ignore-daemonsets ip-192-168-160-227.us-west-2.compute.internal
node/ip-192-168-160-227.us-west-2.compute.internal already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-64jtt, kube-system/ebs-csi-node-k698d, kube-system/kube-proxy-k6vgp
evicting pod kube-system/ebs-csi-controller-7d6f4987dd-vdn8p
error when evicting pods/"ebs-csi-controller-7d6f4987dd-vdn8p" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod kube-system/ebs-csi-controller-7d6f4987dd-vdn8p
pod/ebs-csi-controller-7d6f4987dd-vdn8p evicted
node/ip-192-168-160-227.us-west-2.compute.internal drained

❯ kubectl delete node ip-192-168-152-206.us-west-2.compute.internal
node "ip-192-168-152-206.us-west-2.compute.internal" deleted
❯ kubectl delete node ip-192-168-160-227.us-west-2.compute.internal
node "ip-192-168-160-227.us-west-2.compute.internal" deleted

@swarupsrini

Note that running the driver without its PDB (maxUnavailable: 1) is highly discouraged: a cluster with no running ebs-csi-controller pod cannot provision, attach, detach, or delete EBS volumes, which delays stateful workload startups and terminations.

Hi @AndrewSirenko, I wanted to follow up here. We use EBS for non-critical apps, so we don't mind delays or availability dips in EBS operations, as long as existing volumes won't encounter errors and every operation is eventually consistent. I would rather disable the PDB and allow things like Karpenter consolidation to work properly. Do you see this as a valid use case for this configuration?
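
For a similar use case, the PDB can in principle be deleted by hand, though a managed addon or Helm upgrade may recreate it on its next reconcile; the PDB name below is an assumption:

kubectl delete pdb ebs-csi-controller -n kube-system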
