TEP-0013 for adding a limit to pipeline concurrency #228

NikeNano · 2020-10-08T06:50:20Z

Following a suggestion from @jerop, I have made a rough draft of a TEP on how to limit Pipeline concurrency. This has been discussed in the following two issues: tektoncd/pipeline#2591, tektoncd/pipeline#1305.

Related PR: tektoncd/pipeline#3112

Would be great if you could help out with the TEP @jerop, thank you!

As a starting point, my understanding is that the two issues discuss similar but different things: tektoncd/pipeline#2591 relates to the concurrency of tasks within a pipeline, while tektoncd/pipeline#1305 relates to the concurrency of several pipelines. But maybe I am misunderstanding the discussions?

linux-foundation-easycla · 2020-10-08T06:50:23Z

The committers are authorized under a signed CLA.

✅ Niklas Hansson (87541bd)

pritidesai · 2020-10-09T21:05:22Z

/area tep

pritidesai · 2020-10-09T21:05:49Z

/kind tep

jerop · 2020-10-12T14:28:23Z

> Would be great if you could help out with the TEP @jerop, thank you!

@NikeNano thank you for creating this TEP, will contribute to it ~~this week~~ during the week of Oct 19th

ibotty

some typos I noticed.

teps/0013-limit-pipeline-conecurrency.md

jerop

Thank you for putting this together @NikeNano -- suggested some additions and changes to the TEP :)

nit: the tep filename has a typo

teps/0013-limit-pipeline-conecurrency.md

bobcatfish · 2020-10-27T19:33:22Z

Thanks for getting started on this @NikeNano !

I want to also draw your attention to #203

It's not exactly solving the same problem, but in solving the problem you are describing here, we have 2 choices:

1. Add a feature to Tekton itself (i.e. directly in the pipelines controller)
2. Create a new/separate component

The proposal in #203 unlocks the ability to do (2) by making it possible for some external service to decide when a PipelineRun or TaskRun is no longer "pending".

I am personally more inclined toward a solution like (2) because that would prevent the Pipelines controller from taking on more responsibilities and also would make it possible for folks to implement their own custom concurrency limitations however they wanted.

NikeNano · 2020-10-28T07:17:58Z

Thanks for all the feedback, I will try to look at it later today or tomorrow and come back with the updates.

NikeNano · 2020-11-02T07:12:38Z

Thanks for all the comments @jerop! I will look over it a bit more and fix some last things before it is done.

NikeNano · 2020-11-03T19:28:23Z

> 1. Add a feature to Tekton itself (i.e. directly in the pipelines controller)
> 2. Create a new/separate component

Yeah, I was inclined to favour 1, but after reading #203 it might give more possibilities if we go with 2.

> The proposal in #203 unlocks the ability to do (2) by making it possible for some external service to decide when a PipelineRun or TaskRun is no longer "pending".

Both TEPs seem to share some things in common, thanks for the pointer.

> I am personally more inclined toward a solution like (2) because that would prevent the Pipelines controller from taking on more responsibilities and also would make it possible for folks to implement their own custom concurrency limitations however they wanted.

Would we in that case let the controller reach out to an external service to see if TaskRuns should be scheduled? As I understand it, this new/separate component has to interact with the controller to affect the way things are scheduled. Is it wise to make the controller depend on a separate service, @bobcatfish?

bobcatfish · 2020-11-03T19:47:53Z

@NikeNano

> Would we in that case let the controller reach an external service to see if TaskRuns should be scheduled? As I understand it this new/separate component has to interact with the controller to affect the way things are scheduled, is it wise to add this dependence to the controller of a separate service @bobcatfish?

I'm thinking of it kinda the other way around!

Looking at tektoncd/pipeline#3112 you were proposing adding MaxParallel to a PipelineRun. If we went the route where something external (let's call it "Limit Service") was in charge of these limits, it could look something like this:

1. PipelineRun is created
2. Pipelines controller sees PipelineRun, starts creating TaskRuns <-- each TaskRun needs to be created with spec.status.Pending (as per TEP-0015 - Add a pending setting to Tekton PipelineRun and TaskRuns #203) (TBD how we'd express that in the API)
3. Pipelines controller sees the new TaskRuns, but they all have spec.status.Pending; it doesn't do anything with them
4. Limit Service also sees the TaskRuns with spec.status.Pending; it can now apply whatever logic it wants to determine if the TaskRuns are ready to run or not (e.g. maybe it checks a rule that says "only run a max of 4 Tasks at once per PipelineRun"), or check back later to see if they are ready to run
5. When the Limit Service decides a TaskRun can run, it removes spec.status.Pending from the TaskRun(s)
6. Pipelines controller now sees the TaskRuns are no longer pending, and it starts executing them

This is definitely more complicated but it also means the logic to determine the limits can be entirely pluggable <-- and how important this is probably depends on the use cases, e.g. if "MaxParallel" at a PipelineRun level meets 99% of ppl's use cases, maybe that's enough. Looking at tektoncd/pipeline#1305 as an example tho @jstrachan mentioned some cases where maybe you want limits on the number of Runs against a specific repository even
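The flow described in this comment can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `TaskRun` here is a simplified stand-in rather than the real Tekton type, the `Pending` field models `spec.status.Pending` (whose API shape is still TBD per the comment), and `release` is a hypothetical name for the Limit Service's decision step.

```go
package main

import "fmt"

// TaskRun is a simplified stand-in for a Tekton TaskRun (assumption, not the real API type).
type TaskRun struct {
	Name        string
	PipelineRun string
	Pending     bool // models spec.status.Pending
	Done        bool
}

// release is a hypothetical Limit Service step: it clears Pending on waiting
// TaskRuns of one PipelineRun until maxParallel of them are running, and
// returns the names of the TaskRuns it released.
func release(runs []*TaskRun, pipelineRun string, maxParallel int) []string {
	running := 0
	for _, tr := range runs {
		if tr.PipelineRun == pipelineRun && !tr.Pending && !tr.Done {
			running++
		}
	}
	var released []string
	for _, tr := range runs {
		if running >= maxParallel {
			break
		}
		if tr.PipelineRun == pipelineRun && tr.Pending && !tr.Done {
			tr.Pending = false // the Pipelines controller would now start executing it
			running++
			released = append(released, tr.Name)
		}
	}
	return released
}

func main() {
	var runs []*TaskRun
	for i := 0; i < 6; i++ {
		name := fmt.Sprintf("task-%d", i)
		runs = append(runs, &TaskRun{Name: name, PipelineRun: "run-a", Pending: true})
	}
	fmt.Println(release(runs, "run-a", 4)) // [task-0 task-1 task-2 task-3]
	runs[0].Done = true                    // one TaskRun completes
	fmt.Println(release(runs, "run-a", 4)) // [task-4]
}
```

A real Limit Service would watch TaskRuns through the Kubernetes API instead of a slice, and the rule inside `release` is the part that would be pluggable.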

NikeNano · 2020-11-03T19:56:51Z

Thanks for the clarification! I like the idea with the flexibility and possibility for users to extend it further. I will update the suggested solution.

NikeNano · 2020-11-10T21:02:04Z

FYI: have not forgotten, just have had too much to do. Will try to get this done before the weekend.

bobcatfish · 2020-11-11T16:26:42Z

Thanks for the update @NikeNano , no rush!

NikeNano · 2020-11-15T16:27:50Z

Updated the TEP now @bobcatfish, @jerop, will have more time to respond faster now as well!

afrittoli · 2020-11-23T10:52:51Z

@NikeNano could you squash your 31 commits into one please? Before we can proceed on this you'd need to sign the CLA too.
Thank you!

jerop · 2020-11-23T14:42:02Z

Thank you for the work on this @NikeNano -- I like the new approach much more!

@tektoncd/core-maintainers please take a look :)

tekton-robot · 2020-11-23T21:11:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ncskier
You can assign the PR to them by writing /assign @ncskier in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

teps/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

NikeNano · 2020-11-23T21:29:37Z

@afrittoli I have signed it now, but seems to be some latency before it goes through. Will check tomorrow again.

This TEP describes how to add pipeline concurrency limits. Author: Niklas Hansson <niklas.sven.hansson@gmail.com> Co-authored-by: Jerop Kipruto <jerop@google.com>

NikeNano · 2020-11-29T15:11:45Z

@afrittoli I have squashed and signed it. How do I run the linting locally to debug? Thank you

ghost

I've added a lot of questions throughout. I'm still a little hazy on how exactly this would all work, particularly wrt RBAC.

ghost · 2020-12-01T18:34:03Z

teps/0013-limit-pipeline-concurrency.md

+
+As suggested [here](https://github.com/tektoncd/pipeline/issues/2591#issuecomment-647754800), we can add a field - `MaxParallelTasks` - to `PipelineRunSpec` which is an integer that represents the maximum number of `Tasks` that can run concurrently in the `Pipeline`. 
+
+type PipelineRunSpec struct {


Might be worth wrapping this in backticks to make it into a code block.
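For readers following along, the struct fragment under review could look like the sketch below when fenced as Go. The field line is taken from a later excerpt of the same diff; everything else is an assumption, and `PipelineRunSpec` here is a trimmed stand-in with only the proposed field, not the real Tekton type. `encode` is a hypothetical helper added for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PipelineRunSpec is a trimmed stand-in holding only the proposed field
// (assumption: the real Tekton type has many more fields, omitted here).
type PipelineRunSpec struct {
	// MaxParallelTasks is the maximum number of Tasks that may run concurrently.
	MaxParallelTasks int `json:"maxParallelTasks,omitempty"`
}

// encode is a hypothetical helper that serializes a spec to JSON.
func encode(spec PipelineRunSpec) string {
	b, err := json.Marshal(spec)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(PipelineRunSpec{MaxParallelTasks: 4})) // {"maxParallelTasks":4}
	fmt.Println(encode(PipelineRunSpec{}))                    // {}
}
```

Note that with a plain `int` and `omitempty`, an explicit `0` serializes the same way as an unset field.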

ghost · 2020-12-01T18:47:05Z

teps/0013-limit-pipeline-concurrency.md

+
+Enable users to limit the number of tasks that can run simultaneously in a pipeline, which could help with:
+
+- Tracking and limiting how much resources a Pipeline is consuming, and thus how much it costs.


It would be great to get more specific here. Reading tektoncd/pipeline#2591 it doesn't sound like this feature is supposed to solve resource or cost problems in the cluster - I think ResourceQuotas would be the right tool for that? I think this is supposed to solve the problem of flooding a service that the Pipeline relies on. In tektoncd/pipeline#2591 (comment) it's a Database server that is hit with 100 parallel reloads.

Maybe something more along the lines of:

Allow a user to limit the number of time-consuming or heavy operations being sent to a service at once in order to avoid accidentally causing a denial-of-service on it.

Good suggestion, will update.

ghost · 2020-12-01T18:49:13Z

teps/0013-limit-pipeline-concurrency.md

+-->
+
+#### Story 1
+User has a Pipeline with 100 independent Tasks but they don't want all 100 tasks to run at once.


Why not? It would be great to get a more concrete example, like the database server example from issue tektoncd/pipeline#2591

Not sure I agree on the need of being more specific.

ghost · 2020-12-01T18:49:26Z

teps/0013-limit-pipeline-concurrency.md

+#### Story 1
+User has a Pipeline with 100 independent Tasks but they don't want all 100 tasks to run at once.
+#### Story 2
+User wants to limit amount of resources used by a Pipeline at a given time.


This would be solved by a ResourceQuota.

ghost · 2020-12-01T18:56:52Z

teps/0013-limit-pipeline-concurrency.md

+
+Separating the logic if a `TaskRun` is allowed to run from the `Task` controller allows for extensibility for adding custom logic to the `Limit Service`. 
+
+As suggested [here](https://github.com/tektoncd/pipeline/issues/2591#issuecomment-647754800), we can add a field - `MaxParallelTasks` - to `PipelineRunSpec` which is an integer that represents the maximum number of `Tasks` that can run concurrently in the `Pipeline`. 


Hm. Up until here the document describes limiting concurrent Tasks per-Pipeline. Are we intending to limit per Pipeline or per PipelineRun?

Good point, I guess PipelineRun would allow for more flexibility. What do you think would be best @sbwsg?

If someone's trying to limit the number of concurrent uses of an external resource like a DB then Pipeline-level maximum makes sense (or even Task-level tbh). If it's PipelineRun-level then a user can just spawn more PipelineRuns and ultimately overwhelm the external thing anyway I think, right? What info does the LimitService have access to in order to decide what can / cannot run? Does it even make sense to include a configurable MaxParallelTasks as part of Tekton's CRDs? Why not make this part of the LimitService's configuration?

So I think this depends on the use-cases we're trying to solve. If we're going to pursue this we really have to nail these down and clearly decide what we're trying to achieve.
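To make the difference between the two scopes concrete, here is a minimal sketch of counting running TaskRuns per Pipeline versus per PipelineRun. It assumes TaskRuns carry labels naming their Pipeline and PipelineRun; the label keys below follow the names Tekton commonly sets, but treat them, the simplified `TaskRun` type, and the `countRunning` helper as assumptions for illustration.

```go
package main

import "fmt"

// Label keys assumed to identify a TaskRun's Pipeline and PipelineRun.
const (
	pipelineLabel    = "tekton.dev/pipeline"
	pipelineRunLabel = "tekton.dev/pipelineRun"
)

// TaskRun is a simplified stand-in, not the real Tekton type.
type TaskRun struct {
	Name    string
	Labels  map[string]string
	Running bool
}

// countRunning is a hypothetical helper that groups running TaskRuns
// by the value of the given scope label.
func countRunning(runs []TaskRun, scopeLabel string) map[string]int {
	counts := map[string]int{}
	for _, tr := range runs {
		if tr.Running {
			counts[tr.Labels[scopeLabel]]++
		}
	}
	return counts
}

func main() {
	var runs []TaskRun
	// Two PipelineRuns of the same Pipeline, each with two running TaskRuns.
	for _, pr := range []string{"run-1", "run-2"} {
		for i := 0; i < 2; i++ {
			runs = append(runs, TaskRun{
				Name:    fmt.Sprintf("%s-task-%d", pr, i),
				Labels:  map[string]string{pipelineLabel: "db-reload", pipelineRunLabel: pr},
				Running: true,
			})
		}
	}
	fmt.Println(countRunning(runs, pipelineRunLabel)) // per PipelineRun
	fmt.Println(countRunning(runs, pipelineLabel))    // per Pipeline
}
```

With a limit of 2, the per-PipelineRun count stays within the limit for each run while the per-Pipeline count reaches 4, which is the "spawn more PipelineRuns" concern raised above.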

ghost · 2020-12-01T19:02:19Z

teps/0013-limit-pipeline-concurrency.md

+
+### Goals
+
+- Limit how many tasks can run concurrently in a Pipeline.


Hm, are we limiting concurrent TaskRuns per Pipeline or per PipelineRun? I can imagine both features being desirable for different scenarios but I'm curious which one we're specifically aiming for here.

see comment above.

ghost · 2020-12-01T19:08:34Z

teps/0013-limit-pipeline-concurrency.md

+
+`MaxParallelTasks` has to be >= 0 in. If `MaxParallelTasks` is not specified there should be no limit to how many `TaskRun` that can run in parallel and thus `spec.status.Pending` should be removed from all `TaskRuns`. 
+
+In order to not end up with a deadlock the order of the `Tasks` in a `Pipeline` has to be respected and accounted for by the `Limit service`. 


Could you go into a bit more detail on how the deadlock could happen in the first place? That sounds like a possible Risk/Mitigation we need to be aware of.

ghost · 2020-12-01T19:09:10Z

teps/0013-limit-pipeline-concurrency.md

+
+The `Limit Service` could run similar to a control loop checking `TaskRuns` and the restrictions of `MaxParallelTasks` for the related `Pipeline`. If the count of running `TaskRuns` is less than `MaxParallelTasks`, a `TaskRun` would be update and `spec.status.Pending` removed. If the count of running `TaskRuns` equals `MaxParallelTasks`, no `TaskRun` would be updated until later when another `TaskRun` is completed. 
+
+`MaxParallelTasks` has to be >= 0 in. If `MaxParallelTasks` is not specified there should be no limit to how many `TaskRun` that can run in parallel and thus `spec.status.Pending` should be removed from all `TaskRuns`. 


What happens if MaxParallelTasks is specified but there's no Limit Service running in the cluster? Is there a default Limit Service that Tekton Pipelines comes with? Or are we asking all users to implement their own?

The goal is to have one by default, but allow users to implement their own.
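The two rules quoted from the TEP (a specified `MaxParallelTasks` must be >= 0, and an unspecified value means no limit) could be checked roughly as follows. This is a sketch under assumptions: it uses a pointer so that "not specified" is distinguishable from `0`, whereas the diff declares a plain `int`; `validate` and `mayStart` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// validate enforces the quoted rule: a specified limit must be >= 0.
// A nil pointer models "not specified" (assumption: the TEP diff uses a plain int).
func validate(maxParallelTasks *int) error {
	if maxParallelTasks != nil && *maxParallelTasks < 0 {
		return errors.New("maxParallelTasks must be >= 0")
	}
	return nil
}

// mayStart reports whether one more TaskRun may leave Pending,
// given how many TaskRuns are currently running.
func mayStart(running int, maxParallelTasks *int) bool {
	if maxParallelTasks == nil {
		return true // no limit specified
	}
	return running < *maxParallelTasks
}

func main() {
	limit := 4
	fmt.Println(validate(&limit), mayStart(3, &limit), mayStart(4, &limit))
	fmt.Println(mayStart(100, nil))
	negative := -1
	fmt.Println(validate(&negative))
}
```

Under this reading, a limit of `0` would keep every TaskRun pending, so whether `0` should be allowed is a question the TEP text leaves open.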

ghost · 2020-12-01T19:15:15Z

teps/0013-limit-pipeline-concurrency.md

+	MaxParallelTasks int `json:"maxParallelTasks,omitempty"`
+}
+
+The `Limit Service` could run similar to a control loop checking `TaskRuns` and the restrictions of `MaxParallelTasks` for the related `Pipeline`. If the count of running `TaskRuns` is less than `MaxParallelTasks`, a `TaskRun` would be update and `spec.status.Pending` removed. If the count of running `TaskRuns` equals `MaxParallelTasks`, no `TaskRun` would be updated until later when another `TaskRun` is completed. 


How would the proposed Limit Service work in multi-tenant scenarios? Would there be one Limit Service per tenant? If not are you expecting the Limit Service to have global read and write access to all TaskRuns?

It might be worth documenting a bit about how RBAC requirements might be affected with this feature.

Hmm, good point, I don't know. Maybe you have some better thoughts on this @bobcatfish?

bobcatfish · 2021-01-25T17:28:45Z

/assign @sbwsg
/assign @chhsia0

tekton-robot · 2021-01-25T17:28:47Z

@bobcatfish: GitHub didn't allow me to assign the following users: chhsia0.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


ghost · 2021-01-26T15:42:03Z

In the experimental repo there's now discussion of building a controller that provides support for concurrency limits and demonstrates to Operators how to build their own. Operators could then take this controller and base their own implementation on it according to their org's specific concurrency rules. I'm not sure a TEP is necessary any longer if we go that route.

NikeNano · 2021-01-26T20:55:40Z

the experimental repo

I think it would still be relevant to have some sort of concurrency limit in the default controller for Tekton, since maintaining a controller on your own requires some work. I think it sounds great to give operators the flexibility to write their own, though.

ghost · 2021-01-27T15:53:25Z

I want to make sure this point isn't missed: the proposed controller in tektoncd/experimental#699 is intended to be functional and support options for configuration. Operators won't have to build their own if their use-cases are supported by the implementation there. This doesn't seem that different to the LimitService proposed here, where we'd provide some kind of default implementation / configuration out of the box I guess?

afrittoli · 2021-02-01T14:24:49Z

teps/0013-limit-pipeline-concurrency.md

+
+## Requirements
+
+- Users can specify the maximum number of Tasks that can run concurrently in a Pipeline.


I agree with @sbwsg that it would be good to see some concrete use case to better understand if this is the right approach to solve the problem.

If a pipeline or task requires a specific service that we do not want to overload, limiting the number of concurrent tasks in a pipeline would not let us control that. If the problem is about resource consumption on the cluster, again that might not provide a solution since:

- we allow for execution of custom tasks, and we do not have control over the amount of resources used by a custom task (could be zero pods, or 10 pods)
- the runtime model might change: today a task == a pod, a pipeline == N pods, but we are investigating alternative runtime approaches

pritidesai · 2021-02-08T17:22:27Z

Experiment with this in experimental repo to begin with.
assigning it to @imjasonh

pritidesai · 2021-02-08T18:18:10Z

/assign @imjasonh

ghost · 2021-03-01T17:17:56Z

Discussed again in API Working Group today.

We're going to close this TEP for now on the basis that we have a design ongoing in experimental. In combination with the Pending TEP I think this gives us enough room to start developing solutions to this problem without changes to Tekton Pipelines' Controller.

This doesn't preclude bringing this functionality into the controller in future though, if needed. Reopening this TEP or starting a new one at that time makes total sense.

/close

tekton-robot · 2021-03-01T17:17:58Z

@sbwsg: Closed this PR.


tekton-robot requested review from iancoffey and pratap0007 October 8, 2020 06:50

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 8, 2020

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Oct 9, 2020

ibotty reviewed Oct 26, 2020


jerop requested changes Oct 26, 2020


tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 1, 2020

afrittoli changed the title ~~TEP for adding a limit to pipeline concurrency~~ TEP-0013 for adding a limit to pipeline concurrency Nov 16, 2020

imjasonh mentioned this pull request Nov 23, 2020

Implement Pending PipelineRun status (TEP-0015) tektoncd/pipeline#3522

Merged

4 tasks

NikeNano force-pushed the master branch from 333d8fd to 6f94cc8 Compare November 23, 2020 21:11

NikeNano force-pushed the master branch 2 times, most recently from d06bc5e to a73fc49 Compare November 23, 2020 21:25

tekton-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 23, 2020

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 23, 2020

TEP 13-Adding a limit to pipeline concurrency

87541bd


NikeNano force-pushed the master branch from a73fc49 to 87541bd Compare November 25, 2020 09:29

ghost reviewed Dec 1, 2020


tekton-robot assigned ghost Jan 25, 2021

ghost mentioned this pull request Jan 26, 2021

Concurrency limiter controller tektoncd/experimental#699

Open

afrittoli reviewed Feb 1, 2021


Base automatically changed from master to main February 3, 2021 16:34

tekton-robot assigned imjasonh Feb 8, 2021

tekton-robot closed this Mar 1, 2021


