Support Argo/Tekton workflows #74

Open
ahg-g opened this issue Feb 25, 2022 · 34 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Comments

@ahg-g
Contributor

ahg-g commented Feb 25, 2022

This is lower priority than #65, but it would be good to have an integration with a workflow framework.

Argo supports a suspend flag; the tricky part is that suspend applies to the whole workflow, meaning a QueuedWorkload would need to represent the resources of the entire workflow at once.

Ideally, Argo would create Jobs per sequential step, so that resource reservation happens one step at a time.
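
For illustration, whole-workflow suspension in Argo looks roughly like the sketch below (a minimal example; the entrypoint and template names are made up). With spec.suspend set, no pods are created until the flag is cleared, which is why a single QueuedWorkload would have to cover every step at once.

```yaml
# Minimal sketch of suspending an entire Argo Workflow; names are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: suspended-wf-
spec:
  suspend: true        # suspends the whole workflow; there is no per-step reservation here
  entrypoint: main
  templates:
  - name: main
    container:
      image: busybox
      command: ["echo", "step"]
```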

@ahg-g ahg-g added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 25, 2022
@alculquicondor
Contributor

alculquicondor commented Feb 25, 2022

FYI @terrytangyuan

Also, extracted from a comment in https://bit.ly/kueue-apis (can't find the person's github)

A compromise might be a way of submitting a job but having it "paused", so that the workflow manager can unpause it after its dependencies have been met; the job can still wait in line in the queue, so it doesn't add a lot of wall-clock time. The scheduler would ignore any paused jobs until they are unpaused?

The idea is to allow for a dependent job to jump to the head of the queue when the dependencies are met.
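
For concreteness, the existing building blocks on the Job side are spec.suspend plus Kueue's queue-name label, roughly as sketched below; the queue name is made up, and the "hold until dependencies are met" semantics described above are the part that does not exist today.

```yaml
# Sketch only: a Job submitted suspended and pointed at a Kueue queue.
# "team-queue" is a hypothetical LocalQueue name.
apiVersion: batch/v1
kind: Job
metadata:
  name: dependent-step
  labels:
    kueue.x-k8s.io/queue-name: team-queue
spec:
  suspend: true              # today, Kueue flips this to false on admission
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["echo", "run step"]
        resources:
          requests:
            cpu: "1"
```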

@kfox1111

Yes, but it essentially only jumps to the head of the line if it already was at the head of the line.

@terrytangyuan
Member

terrytangyuan commented Mar 1, 2022

I guess I'll have to read through the design doc for queue APIs in order to understand the use case better here. Any thoughts on what the integration looks like and how the two interoperate with each other?

@kfox1111

kfox1111 commented Mar 2, 2022

Consider there to be two components: a queue and a scheduler.
The queue is where jobs wait in line. A scheduler picks entries to work on at the head of the line.

Sometimes in the real world, it's a family waiting in line. One member goes off to use the bathroom. If they are not back by the time it's their turn, they usually say, "let the next folks go, we're not ready yet". The scheduler in this case just ignores that entry and goes to the next entry in the queue. The option to allow jobs to be "not ready yet, don't schedule me, but still queue me" could be interesting to various workflow managers.

@alculquicondor alculquicondor changed the title Support Argo workflows Support Argo/Tekton workflows Mar 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 15, 2022
@kerthcet
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 16, 2022
@kannon92
Contributor

kannon92 commented Sep 4, 2022

Would a similar integration like Argo and Volcano work in this case?

https://github.com/volcano-sh/volcano/blob/master/example/integrations/argo/20-job-DAG.yaml

@alculquicondor
Contributor

Not really. That seems to be creating a different job for each step of the workflow. Then, each job enters the queue only after the previous step has finished. This can already be accomplished with Kueue and batch/v1.Job.

We would like to enhance the experience roughly as described here: #74 (comment)

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2022
@kerthcet
Contributor

kerthcet commented Dec 6, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2023
@tenzen-y
Member

tenzen-y commented Mar 6, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2023
@lbernick

Hi, I am trying to figure out if I could use Kueue for queueing Tekton PipelineRuns (more info on tekton at tekton.dev/docs). From reading bit.ly/kueue-apis, it seems like Kueue is going to have separate controllers that create Workload objects for different types of workloads (although I'm not sure if that's the case yet).

Would it be reasonable to write a separate controller that creates Workload objects for pending PipelineRuns, and starts the PipelineRuns when the workload is admitted by the queue? I'm not sure if this is possible because it seems like kueue somehow mutates the workloads' node affinity directly, and the relationship between PipelineRuns and pod specs doesn't work in quite the same way as between Jobs and pod specs.

I'm also curious if it's possible to create a queue that is just based on count of running objects rather than their compute resource requirements.

More details on what I'm trying to do: https://github.com/tektoncd/community/blob/main/teps/0132-queueing-concurrent-runs.md

@alculquicondor
Contributor

it seems like Kueue is going to have separate controllers that create Workload objects for different types of workloads (although I'm not sure if that's the case yet).

These controllers can live in the Kueue repo, the tekton repo or a new repo altogether.
We currently have a controller for the Kubeflow MPIJob in the kueue repo. If the Tekton community is open to having this integration, we can discuss where the best place to put it would be.

Would it be reasonable to write a separate controller that creates Workload objects for pending PipelineRuns, and starts the PipelineRuns when the workload is admitted by the queue?

Depends on what you want. When talking about workflows, there are two possibilities: (a) queue the entire workflow or (b) queue the steps.
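
For option (b), a sketch of the kind of Workload object a Tekton controller could create per TaskRun is shown below; it assumes the kueue.x-k8s.io/v1beta1 Workload shape, and all names and requests are invented.

```yaml
# Hypothetical Workload for a single TaskRun (option b); everything here is illustrative.
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: pipelinerun-build-taskrun-compile
  namespace: ci
spec:
  queueName: ci-queue
  podSets:
  - name: main
    count: 1
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: step
          image: busybox
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```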

I'm not sure if this is possible because it seems like kueue somehow mutates the workloads' node affinity directly, and the relationship between PipelineRuns and pod specs doesn't work in quite the same way as between Jobs and pod specs.

Injecting node affinities is the mechanism to support fungibility (example: this job can run on ARM or x86, let Kueue decide to run it where there is still quota). If this is not something that matters to you, you can simply skip creating flavors.

I'm also curious if it's possible to create a queue that is just based on count of running objects rather than their compute resource requirements.

Kueue is a quota-based system. Currently it uses pod resource requests, and we plan to add pod count as well (#485).
What kind of object would make sense to count in Tekton? I would expect that there should be resource requests somewhere.
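
To illustrate the quota model, the queue objects look roughly like the snippet below (names and quota values are made up, and a ResourceFlavor named default-flavor is assumed to exist):

```yaml
# Sketch of Kueue's quota objects; all names and numbers are hypothetical.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: tekton-cq
spec:
  namespaceSelector: {}        # accept workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor     # assumed to exist as a ResourceFlavor
      resources:
      - name: cpu
        nominalQuota: 64
      - name: memory
        nominalQuota: 256Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ci-queue
  namespace: ci
spec:
  clusterQueue: tekton-cq
```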

I'll comment more when I finish reading the doc above. Thanks for sharing :)

cc @kerthcet

@lbernick

Thanks for your response!

These controllers can live in the Kueue repo, the tekton repo or a new repo altogether. We currently have a controller for kubeflow MPIJob in the kueue repo. If the Tekton community is open to have this integration, we can discuss where is the best place to put it.

Still in the early exploration phase, but looking forward to discussing more what would work!

Kueue is a quota-based system. Currently it uses pod resource requests and we plan to add number of pods #485. What kind of object would make sense to count in Tekton? I would expect that there should be resource requests somewhere.

Tekton uses PipelineRuns, which are DAGs of TaskRuns, and each TaskRun corresponds to a pod. One of our use cases is basically just to avoid overwhelming a kube cluster, in which case queueing based on resource requirements would be useful. However, there are some wrinkles with how we handle resource requirements, since we have containers running sequentially in a pod rather than in parallel, so the default k8s assumption that pod resource requirements are the sum of container resource requirements doesn't apply. For this reason, queueing based on TaskRun or PipelineRun count may be simpler for us. Since TaskRuns correspond to pods, queueing based on pod count would solve the TaskRun use case at least.

We also have some use cases that would probably need to be met in Tekton with a wrapper API (e.g. "I want to have only 5 PipelineRuns at a time of X Pipeline that communicates with a rate-limited service"; "I want to have only one deployment PipelineRun running at a time", etc). If we could use Kueue to create a queue of at most X TaskRuns, we'd be in good shape to design something in Tekton meeting these needs.

@alculquicondor
Contributor

Since TaskRuns correspond to pods, queueing based on pod count would solve the TaskRun use case at least.

Yes, the pod count would help. But I would encourage users to also add pod requests. This is particularly important for HPC workflows. You might want dedicated CPUs and accelerators.

I agree that it wouldn't make sense to queue at a lower level than TaskRuns.

@alculquicondor
Contributor

You are welcome to add a topic to our WG Batch meetings if you want to show your design proposals for queuing workflows.

https://docs.google.com/document/d/1XOeUN-K0aKmJJNq7H07r74n-mGgSFyiEDQ3ecwsGhec/edit

@kerthcet
Contributor

One piece of feedback: we run Tekton + ArgoCD as our CI/CD pipelines. For cost effectiveness we deploy Tekton together with other (non-production) application services, so when there are a lot of CI runs we run into insufficient resources and have to isolate them. Queueing is important for Tekton as well, I think.

@kerthcet
Contributor

We have waitForPodsReady, which waits until the previously admitted job has enough pods running. I think we could expand this to something like pendingForTargetQuantity: for Jobs it would still count pods, but for Tekton it would wait for a target number of PipelineRuns/TaskRuns. We would also need to implement suspend in PipelineRun/TaskRun.

I think resource management would be great for Tekton, but if not, we could also make it work by watching the PipelineRun/TaskRun count. That would require refactoring Kueue, since resources are currently required. Just brainstorming.
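
For reference, the existing knob looks roughly like the snippet below in the Kueue manager Configuration; the pendingForTargetQuantity field above is only an idea, not something that exists.

```yaml
# Sketch of the existing waitForPodsReady setting in the Kueue Configuration.
# pendingForTargetQuantity (discussed above) is hypothetical and not shown.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
waitForPodsReady:
  enable: true
  timeout: 5m
```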

@kerthcet
Contributor

Another concern is preemption; I think it could be dangerous for Tekton in some cases, such as when deploying applications.

@terrytangyuan
Member

@alculquicondor @ahg-g I added argoproj/argo-workflows#12363 to track this, and hopefully it will attract more contributors to work on it.

@tenzen-y
Member

@terrytangyuan FYI: we're working on kubernetes/kubernetes#121681 for workflow support.

@sam-leitch-oxb

It is possible to use pod-level integration using the Plain Pods approach.

We use this config snippet (from kueue-manager-config) to integrate Argo Workflows into Kueue:

          integrations:
            frameworks:
            - "pod"
            podOptions:
              # You can change namespaceSelector to define in which
              # namespaces kueue will manage the pods.
              namespaceSelector:
                matchExpressions:
                - key: kubernetes.io/metadata.name
                  operator: NotIn
                  values: [ kube-system, kueue-system ]
              # Kueue uses podSelector to manage pods with particular
              # labels. The default podSelector will match all the pods.
              podSelector:
                matchExpressions:
                - key: workflows.argoproj.io/completed
                  operator: In
                  values: [ "false", "False", "no" ]

This configuration adds a scheduling gate to each Argo Workflows pod and will only release it once there is quota available.

@tenzen-y
Member

tenzen-y commented Jan 3, 2024

It is possible to use pod-level integration using the Plain Pods approach.

Thanks for putting an example here :)

Yes, that's right. The plain pod integration could potentially support Argo Workflows.
However, the plain pod integration doesn't support all Kueue features, such as partial admission, so native Argo Workflows support would still be worthwhile.

Regarding the features not supported in the plain pod integration, see https://github.com/kubernetes-sigs/kueue/tree/main/keps/976-plain-pods#non-goals for more details.

@alculquicondor
Contributor

Oh that's cool. How do you set up the queue-name in the Pods?

I'm not familiar with Argo. Does it have support for pods working in parallel or pods that all need to start together?

Another thing to note: the behavior you are getting is that Pods are created only when their dependencies complete. This means that, in a busy cluster, a workflow might spend too much time waiting in the queue at each step. Is this acceptable?

It's probably acceptable for some users. Would you be willing to write a tutorial for the kueue website?

@sam-leitch-oxb

Oh that's cool. How do you set up the queue-name in the Pods?

You can use either spec.template[].metadata or spec.podMetadata to define a queue.
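
For example, something along these lines at the Workflow level; the queue name is made up.

```yaml
# Sketch: pointing all pods of an Argo Workflow at a Kueue queue via podMetadata.
# "team-queue" is a hypothetical LocalQueue name.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: queued-wf-
spec:
  entrypoint: main
  podMetadata:
    labels:
      kueue.x-k8s.io/queue-name: team-queue
  templates:
  - name: main
    container:
      image: busybox
      command: ["echo", "hello"]
```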

I'm not familiar with Argo. Does it have support for pods working in parallel or pods that all need to start together?

Argo supports parallel execution of pods, and those pods are only created when each "node" of the workflow is ready to run.
This type of integration simply prevents each pod from executing until it passes Kueue's admission checks.

Another thing to note is that behavior you are getting is that Pods are created when their dependencies complete. Meaning that, in a busy cluster, a workflow might be spending too much time waiting in the queue for each step. Is this acceptable?

I'm still waiting to see how well it works. I don't expect the wait time between nodes to be a problem, but a backlog of partially complete workflows may become problematic.

Most of the use cases revolve around ETL nodes followed by process nodes and vice-versa. Depending on how the queues are configured, I could end up with too many partially complete workflows that take up ephemeral resources.

It's probably acceptable for some users. Would you be willing to write a tutorial for the kueue website?

Sure.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2024
@tenzen-y
Member

tenzen-y commented Apr 3, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 3, 2024
@KunWuLuan
Contributor

Is there any progress on supporting Argo/Tekton workflows?

@alculquicondor
Contributor

I don't think anyone has followed through with it. Would you like to propose something?
I think we might require changes in both projects, but at least the Argo community is in favor of doing something: argoproj/argo-workflows#12363

@kannon92
Contributor

@alculquicondor I'm confused. Isn't it possible to support argo-workflows indirectly through pod integration?

@alculquicondor
Contributor

It is indeed possible. But a tighter integration, with atomic admission, would be beneficial.

@KunWuLuan
Contributor

KunWuLuan commented Apr 26, 2024

If the user wants to run a step that contains multiple pods only when all of those pods can run, we need some way to know which pods should belong to the same Workload. So the pod integration alone may not be enough.
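
For what it's worth, the plain-pod integration has a pod-group mechanism that could cover this; a sketch is below, assuming the pod-group label and total-count annotation from the Kueue plain-pods docs (the group name, queue name, and count are made up).

```yaml
# Sketch: grouping the pods of one workflow step into a single Workload.
# "step-a", "team-queue", and the count of 3 are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: step-a-worker-0
  labels:
    kueue.x-k8s.io/queue-name: team-queue
    kueue.x-k8s.io/pod-group-name: step-a
  annotations:
    kueue.x-k8s.io/pod-group-total-count: "3"
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: busybox
    command: ["echo", "worker"]
    resources:
      requests:
        cpu: "1"
```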

@kerthcet
Contributor

kerthcet commented Jun 3, 2024

cc @Zhuzhenghao for the discussion about integrating Kueue with Tekton.
