Support Argo/Tekton workflows #74
FYI @terrytangyuan. Also, extracted from a comment in https://bit.ly/kueue-apis (can't find the person's GitHub handle):
The idea is to allow a dependent job to jump to the head of the queue when its dependencies are met.
Yes, but it essentially only jumps to the head of the line if it already was at the head of the line.
I guess I'll have to read through the design doc for the queue APIs in order to understand the use case better. Any thoughts on what the integration would look like and how the two would interoperate?
Consider there to be two components: a queue and a scheduler. Sometimes in the real world, it's a family waiting in line. One member goes off to use the bathroom. If they are not back by the time it's their turn, they usually say, "let the next folks go, we're not ready yet." The scheduler in this case just ignores that entry and moves on to the next entry in the queue. The option to let jobs say "not ready yet, don't schedule me, but still queue me" could be interesting to various workflow managers.
Would an integration similar to the one between Argo and Volcano work in this case? https://github.com/volcano-sh/volcano/blob/master/example/integrations/argo/20-job-DAG.yaml
Not really. That example creates a different job for each step of the workflow, and each job enters the queues only after the previous step has finished. This can already be accomplished with Kueue and batch/v1.Job. We would like to enhance the experience roughly as described here: #74 (comment)
Hi, I am trying to figure out if I could use Kueue for queueing Tekton PipelineRuns (more info on Tekton at tekton.dev/docs). From reading bit.ly/kueue-apis, it seems like Kueue is going to have separate controllers that create Workload objects for different types of workloads (although I'm not sure if that's the case yet). Would it be reasonable to write a separate controller that creates Workload objects for pending PipelineRuns and starts the PipelineRuns when the workload is admitted by the queue?

I'm not sure if this is possible, because it seems like Kueue mutates the workloads' node affinity directly, and the relationship between PipelineRuns and pod specs doesn't work quite the same way as between Jobs and pod specs. I'm also curious whether it's possible to create a queue based simply on the count of running objects rather than their compute resource requirements. More details on what I'm trying to do: https://github.com/tektoncd/community/blob/main/teps/0132-queueing-concurrent-runs.md
These controllers can live in the Kueue repo, the Tekton repo, or a new repo altogether.
Depends on what you want. When talking about workflows, there are two possibilities: (a) queue the entire workflow or (b) queue the individual steps.
Injecting node affinities is the mechanism to support fungibility (example: this job can run on ARM or x86; let Kueue decide to run it where there is still quota). If this is not something that matters to you, you can simply not create flavors (see the sketch after this comment).
Kueue is a quota-based system. Currently it uses pod resource requests, and we plan to add number of pods #485. I'll comment more when I finish reading the doc above. Thanks for sharing :) cc @kerthcet
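To make the fungibility example concrete, here is a minimal sketch of two ResourceFlavors and a ClusterQueue using Kueue's v1beta1 API (the flavor names, node labels, and quotas are all illustrative). Kueue tries the flavors in the listed order and injects the chosen flavor's nodeLabels as node affinity into the admitted pods:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: arm64 # illustrative name
spec:
  nodeLabels:
    kubernetes.io/arch: arm64
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: x86 # illustrative name
spec:
  nodeLabels:
    kubernetes.io/arch: amd64
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {} # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: arm64 # tried first; falls back to x86 when arm64 quota is exhausted
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 256Gi
    - name: x86
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 256Gi
```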
Thanks for your response!
Still in the early exploration phase, but looking forward to discussing more what would work!
Tekton uses PipelineRuns, which are DAGs of TaskRuns, and each TaskRun corresponds to a pod. One of our use cases is basically just to avoid overwhelming a Kubernetes cluster, in which case queueing based on resource requirements would be useful. However, there are some wrinkles in how we handle resource requirements: our containers run sequentially within a pod rather than in parallel, so the default Kubernetes assumption that a pod's resource requirements are the sum of its containers' requirements doesn't apply. For this reason, queueing based on TaskRun or PipelineRun count may be simpler for us. Since TaskRuns correspond to pods, queueing based on pod count would solve the TaskRun use case at least.

We also have some use cases that would probably need to be met in Tekton with a wrapper API (e.g. "I want at most 5 concurrent PipelineRuns of Pipeline X, which communicates with a rate-limited service"; "I want only one deployment PipelineRun running at a time"). If we could use Kueue to create a queue of at most X TaskRuns, we'd be in good shape to design something in Tekton meeting these needs.
Yes, the pod count would help. But I would encourage users to also add pod requests. This is particularly important for HPC workflows, where you might want dedicated CPUs and accelerators. I agree that it wouldn't make sense to queue at a lower level than TaskRuns.
You are welcome to add a topic to our WG Batch meetings if you want to show your design proposals for queueing workflows. https://docs.google.com/document/d/1XOeUN-K0aKmJJNq7H07r74n-mGgSFyiEDQ3ecwsGhec/edit
One piece of feedback: we run Tekton + ArgoCD as our CI/CD pipelines. For cost effectiveness, we deploy Tekton together with other (non-production) application services, so we run into insufficient resources when there are a lot of CI runs and have to isolate them. Queueing is important for Tekton as well, I think.
We have waitForPodsReady, which waits until the previous job has enough pods running. I think we could expand this to something like pendingForTargetQuantity: for a Job it would still count pods, but for Tekton it would wait for a target number of PipelineRuns/TaskRuns; we would also need to implement suspend in PipelineRun/TaskRun. I think resource management would be great for Tekton, but if not, we could also make it work by watching the PipelineRun/TaskRun count. That would need a refactor in Kueue, since resources are currently required. Just brainstorming.
Another concern is preemption, which I think could be dangerous for Tekton in some cases, like deploying applications.
@alculquicondor @ahg-g I added argoproj/argo-workflows#12363 to track this; hopefully it will attract more contributors to work on it.
@terrytangyuan FYI: we're working on kubernetes/kubernetes#121681 for workflow support.
It is possible to integrate at the pod level using the Plain Pods approach. We use this config snippet (from kueue-manager-config) to integrate Argo Workflows into Kueue:
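A sketch of such a snippet, based on Kueue's documented pod-integration options (the namespaceSelector here is illustrative; kube-system and kueue-system should stay excluded):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
# ... other manager settings elided ...
integrations:
  frameworks:
  - "pod"
  podOptions:
    # Keep infrastructure namespaces out of queueing.
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: [kube-system, kueue-system]
```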
This configuration adds a scheduling gate to each Argo Workflows pod and will only release it once there is quota available. |
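Concretely, a pod held by that gate looks roughly like this (gate and label names per the plain-pods KEP; the pod itself is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workflow-step # illustrative
  labels:
    kueue.x-k8s.io/managed: "true"
spec:
  schedulingGates:
  # Keeps the pod unschedulable; Kueue removes the gate on admission.
  - name: kueue.x-k8s.io/admission
  containers:
  - name: main
    image: busybox
    command: ["echo", "step"]
```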
Thanks for putting an example here :) Yes, that's right. The plain pod integration could potentially support Argo Workflows. Regarding the features not supported in the plain pod integration, see for more details: https://github.com/kubernetes-sigs/kueue/tree/main/keps/976-plain-pods#non-goals
Oh that's cool. How do you set up the queue-name in the Pods? I'm not familiar with Argo. Does it have support for pods working in parallel, or pods that all need to start together? Another thing to note is that the behavior you get is that Pods are created when their dependencies complete. That means, in a busy cluster, a workflow might spend too much time waiting in the queue for each step. Is this acceptable? It's probably acceptable for some users. Would you be willing to write a tutorial for the Kueue website?
You can use either spec.templates[].metadata or spec.podMetadata to define a queue (see the sketch after this comment).
Argo supports parallel execution of pods, and those pods are only created when each "node" of the workflow is ready to run.
I'm still waiting to see how well it works. I don't expect the wait time between nodes to be a problem, but a backlog of partially complete workflows may become problematic. Most of the use cases revolve around ETL nodes followed by process nodes and vice-versa. Depending on how the queues are configured, I could end up with too many partially complete workflows that take up ephemeral resources.
Sure.
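For reference, a minimal sketch of the Argo Workflows side, using spec.podMetadata to label every pod of the workflow with a queue (the queue name local-queue is illustrative; kueue.x-k8s.io/queue-name is the label the pod integration matches on):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: queued-workflow-
spec:
  entrypoint: main
  # Applied to every pod the workflow creates, so each step
  # of the DAG is queued in the same LocalQueue.
  podMetadata:
    labels:
      kueue.x-k8s.io/queue-name: local-queue # illustrative LocalQueue name
  templates:
  - name: main
    container:
      image: busybox
      command: ["echo", "hello"]
      resources:
        requests: # requests matter, since Kueue admits against quota
          cpu: 100m
          memory: 64Mi
```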
Is there any progress on supporting Argo/Tekton workflows?
I don't think anyone has followed through with it. Would you like to propose something?
@alculquicondor I'm confused. Isn't it possible to support argo-workflows indirectly through pod integration?
It is indeed possible. But a tighter integration, with atomic admission, would be beneficial.
If the user wants to run a step that contains multiple pods only when all of those pods can run, we need some way to know which pods belong to the same workload, so pod integration alone may not be enough (see the sketch below).
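Kueue's plain-pods design does include a notion of pod groups for exactly this; a sketch assuming the group label and annotation from that design (the group name and count are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: step-a- # illustrative
  labels:
    kueue.x-k8s.io/queue-name: local-queue         # illustrative queue name
    kueue.x-k8s.io/pod-group-name: workflow-step-a # pods sharing this form one Workload
  annotations:
    # Kueue waits until all 3 pods exist before admitting the group as a unit.
    kueue.x-k8s.io/pod-group-total-count: "3"
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "10"]
    resources:
      requests:
        cpu: 100m
```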
cc @Zhuzhenghao Discussion about integrating Kueue with Tekton.
This is lower priority than #65, but it would be good to have an integration with a workflow framework.
Argo supports the suspend flag. The tricky part is that suspend applies to the whole workflow, meaning a QueuedWorkload would need to represent the resources of the whole workflow all at once.
Ideally Argo should create jobs per sequential step, and then resource reservation happens one step at a time.
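For reference, the flag in question: a workflow created like this stays pending until something sets suspend back to false, which is where admission could hook in (a minimal sketch; the unsuspend-on-admission controller is hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: suspended-workflow-
spec:
  # Created suspended; a queueing controller would flip this to false
  # once quota for the whole workflow is reserved.
  suspend: true
  entrypoint: main
  templates:
  - name: main
    container:
      image: busybox
      command: ["echo", "hello"]
```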