
Disable PDB for managed addon #1934

Open · damvinod opened this issue Feb 16, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@damvinod

/triage support
Hi, is there a way to delete/change the PDB for the add-on via the config? I don't see the option in the JSON schema.
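
For reference, the managed addon's configuration schema can be dumped with the AWS CLI to see exactly which keys it accepts; the addon version below is only an example:

aws eks describe-addon-configuration \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.28.0-eksbuild.1 \
  --query 'configurationSchema' --output text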

@k8s-ci-robot
Contributor

@damvinod: The label(s) triage/support cannot be applied, because the repository doesn't have them.

In response to this:

/triage support
Hi, is there a way to delete/change the PDB for the add-on via the config? I don't see the option in the JSON schema.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@AndrewSirenko
Contributor

Thanks for opening an issue, @damvinod.

Today, we don't offer a way to disable/change the ebs-csi-controller deployment's pod disruption budget in either our Helm chart or EKS-managed addon config. I'll treat this issue as a feature request to add that functionality.

Could you provide some details about your use case for disabling the PDB, as under normal circumstances doing so would be inadvisable?

/kind feature
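
For reference, the chart's published values can be checked to confirm there is no PDB toggle; the Helm repo URL is the driver's documented one, and at the time of this thread the grep is expected to return nothing:

helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm show values aws-ebs-csi-driver/aws-ebs-csi-driver | grep -i disruption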

@k8s-ci-robot added the kind/feature label Feb 16, 2024
@damvinod
Author

damvinod commented Feb 17, 2024

Currently we have 3 nodes in an EKS cluster with 2 replicas of ebs-csi-controller. We have a use case where we roll over 2 nodes in parallel instead of the default 1 so that the rollover finishes faster.

Since the ebs-csi-controller's default PDB sets maxUnavailable to 1, we can't roll 2 nodes in parallel, so I wanted to check whether the PDB can be customized or deleted.
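
For reference, the PDB in question can be inspected directly; the ebs-csi-controller name below is an assumption based on the deployment name:

kubectl get poddisruptionbudget -n kube-system
kubectl describe pdb ebs-csi-controller -n kube-system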

@AndrewSirenko
Contributor

AndrewSirenko commented Feb 19, 2024

Hi @damvinod, thank you for sharing your use case.

We can expose a configuration option to disable the creation of the controller's PDB, so that users like you can deploy a custom PDB on their own after deploying the driver.
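
As a sketch of what such a custom PDB might look like once the toggle exists (the maxUnavailable value and the pod selector below are assumptions, not necessarily the chart's actual labels):

cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: custom-ebs-csi-controller-pdb
  namespace: kube-system
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: ebs-csi-controller
EOF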

Note that running the driver without its PDB (maxUnavailable: 1) is highly discouraged: a cluster with no running ebs-csi-controller pod cannot provision, attach, detach, or delete EBS volumes, which delays stateful workload startups and terminations.


Note that it may be possible to roll over 2 nodes even with the PDB in place: as long as an ebs-csi-controller pod is running on the third node, the PDB won't block your node rollovers. You can ensure that is the case by cordoning and draining the nodes before deleting them (so any ebs-csi-controller pods get rescheduled), by running 3 replicas, or by manually evicting the ebs-csi-controller pod from the nodes you will roll over.

Here is an example of following the no-volume-lifecycle-outage version of this process, deliberately rolling over the two nodes that have ebs-csi-controller pods scheduled.

❯ kubectl get nodes
NAME                                            STATUS   ROLES    AGE     VERSION
ip-192-168-105-23.us-west-2.compute.internal    Ready    <none>   6m46s   v1.27.9-eks-5e0fdde
ip-192-168-152-206.us-west-2.compute.internal   Ready    <none>   6m42s   v1.27.9-eks-5e0fdde
ip-192-168-160-227.us-west-2.compute.internal   Ready    <none>   6m41s   v1.27.9-eks-5e0fdde

❯ kubectl cordon ip-192-168-152-206.us-west-2.compute.internal
node/ip-192-168-152-206.us-west-2.compute.internal cordoned
❯ kubectl cordon ip-192-168-160-227.us-west-2.compute.internal
node/ip-192-168-160-227.us-west-2.compute.internal cordoned

❯ kubectl drain --delete-emptydir-data --ignore-daemonsets ip-192-168-152-206.us-west-2.compute.internal
node/ip-192-168-152-206.us-west-2.compute.internal already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-rd5dk, kube-system/ebs-csi-node-x7x5w, kube-system/kube-proxy-mklzs
evicting pod kube-system/ebs-csi-controller-7d6f4987dd-hqn2t
pod/ebs-csi-controller-7d6f4987dd-hqn2t evicted
node/ip-192-168-152-206.us-west-2.compute.internal drained
❯ kubectl drain --delete-emptydir-data --ignore-daemonsets ip-192-168-160-227.us-west-2.compute.internal
node/ip-192-168-160-227.us-west-2.compute.internal already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/aws-node-64jtt, kube-system/ebs-csi-node-k698d, kube-system/kube-proxy-k6vgp
evicting pod kube-system/ebs-csi-controller-7d6f4987dd-vdn8p
error when evicting pods/"ebs-csi-controller-7d6f4987dd-vdn8p" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod kube-system/ebs-csi-controller-7d6f4987dd-vdn8p
pod/ebs-csi-controller-7d6f4987dd-vdn8p evicted
node/ip-192-168-160-227.us-west-2.compute.internal drained

❯ kubectl delete node ip-192-168-152-206.us-west-2.compute.internal
node "ip-192-168-152-206.us-west-2.compute.internal" deleted
❯ kubectl delete node ip-192-168-160-227.us-west-2.compute.internal
node "ip-192-168-160-227.us-west-2.compute.internal" deleted

@swarupsrini

Note that running the driver without its PDB (maxUnavailable: 1) is highly discouraged: a cluster with no running ebs-csi-controller pod cannot provision, attach, detach, or delete EBS volumes, which delays stateful workload startups and terminations.

Hi @AndrewSirenko, I wanted to follow up here. We use EBS for non-critical apps, so we don't mind delays or availability dips in EBS operations, as long as existing volumes won't encounter errors and every operation is eventually consistent. I would rather disable the PDB and allow things like Karpenter consolidation to work properly. Do you see this as a valid use case for this configuration?
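
For a similar use case, the PDB can in principle be deleted by hand, though a managed addon or Helm upgrade may recreate it on its next reconcile; the PDB name below is an assumption:

kubectl delete pdb ebs-csi-controller -n kube-system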
