How do I run an experiment on a k8s cluster with taints? #1174
Hi @hyeonsangjeon,
@hyeonsangjeon @jsga Currently you can't specify tolerations for the Experiment's Suggestion Pod, but you can specify them for the Trial Pods, since the Trial spec is just a template. See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#concepts for how to remove a taint from one of your Nodes.
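As a rough illustration of the Trial side, the sketch below builds a trial template in Python and places a toleration in the Pod spec inside it. It assumes the Katib v1beta1 layout (`trialTemplate.primaryContainerName` plus `trialTemplate.trialSpec` holding a `batch/v1` Job); older releases such as v1alpha3 wrapped the Job in a Go template instead, so check the field names against your Katib version. The taint `dedicated=katib:NoSchedule`, the image, and the helper names `toleration` and `trial_spec` are hypothetical.

```python
import json


def toleration(key, value, effect="NoSchedule"):
    """Build a Kubernetes toleration matching the taint key=value:effect."""
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}


def trial_spec(image, command, tolerations):
    """Return a batch/v1 Job manifest to use as the trial template (hypothetical helper).

    Tolerations belong in the Pod template (spec.template.spec), because the
    scheduler evaluates them per Pod, not per Job.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "training-container", "image": image, "command": command}
                    ],
                    "restartPolicy": "Never",
                    "tolerations": tolerations,
                }
            }
        },
    }


# Fragment of an Experiment's spec; assumes the Katib v1beta1 field names.
experiment_fragment = {
    "trialTemplate": {
        "primaryContainerName": "training-container",
        "trialSpec": trial_spec(
            image="docker.io/example/mnist:latest",  # hypothetical image
            command=["python3", "/opt/train.py"],  # hypothetical entry point
            tolerations=[toleration("dedicated", "katib")],  # hypothetical taint
        ),
    }
}

# kubectl accepts JSON as well as YAML, so this mirrors what goes under spec:
print(json.dumps(experiment_fragment, indent=2))
```

This only affects the Trial Pods; as noted above, the Suggestion Pod has no equivalent template field, so it can still stay Pending on a fully tainted cluster.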
Feel free to re-open the issue if needed.
I believe we need a solution for the Experiment's Suggestion Pod as well; we can't force people to remove taints from their nodes. In our case, we are the team providing Katib as a service in customer clusters. We deploy the Katib service on our own nodes, which are protected by taints to keep user workloads off them, and we can't guarantee that a non-tainted node will be available to run our end-to-end tests in those clusters. So this is a blocker for us.
I see that in Can we re-open this issue, please? It seems like some of the features in #1737 were addressed, but not this one. Please let me know if I am misunderstanding anything. I would just love to see the ability to set the
I have a question about running Katib on a k8s cluster with taints.
I ran Katib on k8s with 5 clusters.
Currently, all 5 clusters have a taint.
When the experiment is executed, it doesn't run and stays in the Pending state.
How can I run an experiment when all of my k8s clusters have a taint?
Can I put the toleration setting in the experiment YAML?
I ran the test YAML below, but it didn't work.
It also stays in the Pending state when the trialTemplate includes a toleration setting.
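To see why Pods stay Pending on a fully tainted cluster, it can help to model how tolerations are matched against taints. The following is a simplified Python re-implementation of the rules in the Kubernetes taint-and-toleration documentation; the `Taint`/`Toleration` classes and the `tolerates`/`schedulable` helpers are hypothetical, and the model ignores `tolerationSeconds` and every other scheduling constraint (resources, affinity, and so on). A Pod can land on a node only if each `NoSchedule` and `NoExecute` taint on that node is matched by at least one of its tolerations.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every key
    operator: str = "Equal"  # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    if tol.operator == "Exists":
        return True
    return False  # unknown operator


def schedulable(node_taints: Iterable[Taint], pod_tolerations: Iterable[Toleration]) -> bool:
    """Return True if every hard taint on the node is tolerated by the Pod."""
    tolerations = list(pod_tolerations)
    for taint in node_taints:
        if taint.effect not in ("NoSchedule", "NoExecute"):
            continue  # PreferNoSchedule is only a soft preference
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


node = [Taint(key="dedicated", value="katib", effect="NoSchedule")]  # hypothetical taint
print(schedulable(node, []))  # False: no toleration, so the Pod stays Pending
print(schedulable(node, [Toleration(key="dedicated", value="katib", effect="NoSchedule")]))  # True
print(schedulable(node, [Toleration(operator="Exists")]))  # True: tolerates every taint
```

If every node carries a hard taint and a Pod (Trial or Suggestion) has no matching toleration, `schedulable` is `False` for all nodes, which corresponds to the Pending state described above. Note that the real API rejects an empty `key` combined with operator `Equal`; the model does not validate that.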