Add TEP-0044: Decouple Task Composition from Scheduling ➰ #316
Conversation
This looks somewhat overlapping with tektoncd/pipeline#3638 even though that specific use case is listed as a non-goal.
I think Tekton's current pain points around storage / data sharing could be greatly improved and expanded on with task composition.
👍
/lgtm
that can be shared between them.

Some problems with this:
* PVCs add additional overhead, both in speed and in management (somewhat
A separate but similar problem (and possibly an additional use-case): an organization may have already decided on cloud buckets as their means of data sharing. Then Tekton comes along "requiring" k8s volumes and so the org is unable or unwilling to adopt Tekton.
I've seen this reported in Slack as a downside of Workspaces though unfortunately don't have a Github issue / user report to point at.
Composing a bucket-download Task with a build Task with a bucket-upload Task would go some way to solving those users' needs I think.
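As a rough sketch of that composition (the `bucket-download` and `bucket-upload` Task names are hypothetical, used only for illustration, and workspace names may differ from the real catalog tasks):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-from-bucket
spec:
  workspaces:
    - name: source            # today this still has to be backed by a volume
  tasks:
    - name: download
      taskRef:
        name: bucket-download # hypothetical Task that pulls sources from a bucket
      workspaces:
        - name: output
          workspace: source
    - name: build
      runAfter: ["download"]
      taskRef:
        name: kaniko          # params omitted for brevity
      workspaces:
        - name: source
          workspace: source
    - name: upload
      runAfter: ["build"]
      taskRef:
        name: bucket-upload   # hypothetical Task that pushes artifacts back to the bucket
      workspaces:
        - name: source
          workspace: source
```

If these three Tasks could be scheduled into one pod, the shared workspace could be an emptyDir instead of a PVC, and only the bucket itself would need to live outside the cluster.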
Okay interesting! I'll try to add a use case for this - let me know if you have any ideas on how to expand it.
re: buckets -- that's interesting. As an operator I can confirm that allocating the sort of short-lived PVCs typically needed by a pipeline is a real pain. A Task that uploads/downloads to a bucket as a pre/post Task might be useful to eliminate the need for a Pipeline PVC.
Object storage would have a few advantages compared to PVC:
- zero provisioning time
- it only uses the space needed, so if a pipeline run is not deleted right away, there's no unused allocated storage left around
- it's easier to access. It would be possible to even integrate into the CLI or dashboard the ability to list / view the content of the bucket, which would be a great help for troubleshooting :)
It might be slower performance-wise; it also depends on where the object storage is located.
The Operator could ship a preconfigured minio as an optional component, or we could have an "installation with minio" guide.
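For reference, Tekton's existing bucket support for PipelineResource storage is configured through a ConfigMap along these lines (a sketch from memory; the exact key names may differ slightly, see the install docs linked later in this thread):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-artifact-bucket
  namespace: tekton-pipelines
data:
  # e.g. a GCS bucket; an S3-compatible endpoint such as minio needs extra configuration
  location: gs://my-tekton-artifacts
  bucket.service.account.secret.name: bucket-creds
  bucket.service.account.secret.key: service_account.json
```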
We could decide to include this if we have some use cases that need it; for now avoiding
this allows us to avoid many layers of nesting (i.e. Task1 uses Task2 uses Task3, etc.)
or even worse, recursion (Task 1 uses Task 2 uses Task 1...)
- One way to support this later could be via
Looks like we're missing a bit here?
.... I don't know what the rest of this sentence was 🤭 😅 I'll just delete it for now
/assign sbwsg
creation-date: '2021-01-22'
last-updated: '2021-01-22'
authors:
- '@bobcatfish'
@skaegi would like to work on this as well
overhead of using a Pipeline, they just want to make a TaskRun,
e.g. [@mattmoor's feedback on PipelineResources and the Pipeline beta](https://twitter.com/mattomata/status/1251378751515922432))
where he wants to checkout some code and [use the kaniko task](https://github.com/tektoncd/catalog/tree/master/task/kaniko/0.1)
without having to fiddle with volumes
@skaegi if you have other use cases please add :D (anyone else too!)
This use case is closer to what the `knative/build` use case was: a simple build composed of a few steps that run in sequence.
Tbh, this is a really valid use case and it would be interesting to see how much it needs to be supported. To draw a parallel with our Prow CI setup, this is basically what we would need to replicate the current behavior (one check == one job => would equal one TaskRun, for simplicity).
### Non-Goals

- Composing Tasks within Tasks at [Task authoring time](https://github.com/tektoncd/community/blob/master/design-principles.md#reusability).
+1
this allows us to avoid many layers of nesting (i.e. Task1 uses Task2 uses Task3, etc.)
or even worse, recursion (Task 1 uses Task 2 uses Task 1...)
- One way to support this later could be via
- Supporting this composition at runtime in a PipelineRun (not quite sure what that
I guess it depends on how it is implemented - it may be possible to use `pipelineSpec` - but I agree this should not be a direct goal.
After @jlpettersson's comments I think I'll remove this as a non-goal; I think I was jumping to a conclusion too early.
- All Tasks should run even if one fails
  - This is to support use cases such as uploading test results, even if the test
    Task failed
I'm not sure about this. I think the use case is fine, but "all tasks should run even if one fails" feels to me like a solution rather than a requirement.
hmm, I wonder how I can word this - I'm trying to express that in the unit test use case, if the test Task fails, you still want to upload results. If we used "pipeline in a pod" as the solution to this, we could accomplish that with finally.
I'm gonna reword this as "It should be possible to have Tasks that run even if others fail" (starting to overlap with the exit code TEP!! 😅). Lemme know what you think!
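For example, if "pipeline in a pod" were the solution, the existing `finally` semantics would already cover the upload-even-on-failure case. A minimal sketch using the catalog tasks discussed in this TEP (params and the credentials workspace for `gcs-upload` are omitted, and workspace names may differ from the linked catalog versions):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: test-and-upload
spec:
  workspaces:
    - name: source
  tasks:
    - name: unit-test
      taskRef:
        name: golang-test
      workspaces:
        - name: source
          workspace: source
  finally:
    # finally tasks run even when unit-test fails
    - name: upload-results
      taskRef:
        name: gcs-upload
      workspaces:
        - name: source
          workspace: source
```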
- '@bobcatfish'
---

# TEP-0044: Composing Tasks with Tasks
Thank you for starting this 🙏
I agree that we need a way to compose tasks that is more efficient than what we have today.
I feel that the problem is not on the authoring side though, but on the runtime.
Today we compose Tasks using Pipeline, which gives us a rich model (perhaps a bit verbose) to connect tasks through params, workspaces, results and more. In itself this model allows us to write highly re-usable tasks that can be combined into complex pipelines. The API is beta, and it has a reasonable level of adoption.
On the runtime side, when we execute a pipeline against a k8s cluster, we force each task to create a new Pod, which means that today - if we use catalog tasks - we need two pods and a PVC to do something like cloning a repo and building a container image. Some very common use cases like this were covered by pipeline resources; removing them from the picture highlights the need for alternative runtime models.
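To make that overhead concrete, here is a sketch of the status quo for "clone then build" with catalog tasks (parameter and workspace names are based on the linked catalog versions and may differ slightly): two pods plus a PVC that both of them mount.

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: clone-and-build
spec:
  pipelineSpec:
    workspaces:
      - name: source
    tasks:
      - name: clone
        taskRef:
          name: git-clone              # pod 1
        params:
          - name: url
            value: https://github.com/tektoncd/pipeline
        workspaces:
          - name: output
            workspace: source
      - name: build
        runAfter: ["clone"]
        taskRef:
          name: kaniko                 # pod 2
        params:
          - name: IMAGE
            value: gcr.io/example/image   # illustrative value
        workspaces:
          - name: source
            workspace: source
  workspaces:
    - name: source
      volumeClaimTemplate:             # the PVC both pods have to share
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
```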
To summarise, I think the solution to the problem highlighted in this TEP will not be a new way of composing tasks in YAML, but a new way to execute them on a k8s cluster.
Although I agree that it might be more a runtime problem than an authoring problem, I feel there is room for both.
To summarise, I think the solution to the problem highlighted in this TEP will not be a new way of composing tasks in YAML, but a new way to execute them on a k8s cluster.
This would need to be very flexible, as I think we may want to execute some groups of tasks in the same pod and others in different ones (and on different nodes too).
I feel like folks might want to express this kind of grouping at a Pipeline level. (In my mind Pipeline authoring straddles both "authoring time" and "runtime" because in some ways it's runtime configuration for TaskRuns.)
I.e. you might want to create a Pipeline that groups some Tasks such that they are run on the same pod, and be able to reuse that, without having to express that grouping in every single PipelineRun
"create a Pipeline that groups some Tasks such that they are run on the same pod" -- this is exactly what our teams have been asking us for. Their issue is that if we write fine-grained tasks to promote re-use in our task catalogs we incur a roughly 10 or so second overhead when running a sequence of Tasks. @afrittoli's point about doing the task/pod aggregation automatically is definitely interesting but I would be delighted to be able to do this manually as a first step.
volume based workspaces to share data between them in a Pipeline. e.g. specifically
cloning with [git-clone](https://github.com/tektoncd/catalog/tree/master/task/git-clone/0.2),
running tests with [golang-test](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1)
and uploading results with [gcs-upload](https://github.com/tektoncd/catalog/tree/master/task/gcs-upload/0.1).
Just to play devil's advocate 😈, each of these catalog tasks (`git-clone`, `golang-test`, and `gcs-upload`) contains one single step, which is clearly a bottleneck here. A task having more than one step makes it more efficient (one pod with multiple containers). Why not combine these tasks into one single task and encourage users to design efficient tasks? And TEP-0040 can come to the rescue here with the step failure, i.e. all steps should run even if one fails. If needed, we can implement a scheduler at the step level.
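For comparison, the "one bigger Task" alternative would look roughly like this (a sketch; the inline scripts stand in for what the individual catalog tasks do today, and credentials handling is omitted):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: clone-test-upload
spec:
  params:
    - name: url
  workspaces:
    - name: source
  steps:
    - name: clone
      image: alpine/git
      script: |
        git clone $(params.url) $(workspaces.source.path)
    - name: test
      image: golang
      workingDir: $(workspaces.source.path)
      script: |
        # if this fails, the upload step below won't run today -- that's where
        # TEP-0040 (keep running steps even if one fails) would come in
        go test ./... > test-results.txt 2>&1
    - name: upload
      image: google/cloud-sdk:slim
      workingDir: $(workspaces.source.path)
      script: |
        # credentials would need to be mounted; the bucket name is illustrative
        gsutil cp test-results.txt gs://example-bucket/
```

All three steps run as containers in a single pod and share the same workspace, which is exactly the efficiency argument above - at the cost of rewriting (rather than reusing) the catalog tasks.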
This is the core of the problem. `git-clone`, `golang-test` and `gcs-upload` may make sense to be defined as one "bigger" task, for sure. But what if I want to run `golang-test` and `gcs-upload` from my local directory (packaged as an image and extracted into a volume somehow before the pipeline/task execution)? Do I need to redefine another task, and thus duplicate work and, most importantly, duplicate maintenance, or do I want to re-use community / team / … work and just refer to those definitions?
Another detail is that although there is just one step in each of them, in each case that step is running a script which is doing several things. If we wanted to integrate these into one Task, we'd need to copy paste that script around.
We have had the idea of reusable steps come up before: tektoncd/pipeline#1260. I'm not sure if that would help here though, b/c you'd probably want the step to be able to declare params/workspaces/results, and I think it would end up looking like a Task with a different name (tho you'd be able to combine the steps together?)
This is the big one for us... when we create a pod we also have to create a secure container runtime to host it in a new lightweight Kata Containers VM. Grouping Tasks also provides some logical/semantic relief in addition to performance benefits. We currently are running a sequence of 50 Tasks and suspect that even if we could group all 50 into a single pod, there might still be semantic value in running this as 5 coarse-grained tasks instead.
@bobcatfish thanks for starting this TEP. This is honestly a tricky subject and I feel both this TEP and #318 might be too "opinionated" on the solution — but I feel it might be because of the way we tend to write TEPs.
So, I would like for us to define clearly what is (or are) the problem(s) that we are trying to solve — and hopefully, we shouldn't try to solve too many problems at once. For me this also means we may not want to think about or compare our solution to `PipelineResource` yet.
So far, we are using `Task` as the smallest composable piece, and we use `Pipeline` to compose them. I tend to agree with @afrittoli on the following:
> Today we compose Tasks using Pipeline, which gives us a rich model (perhaps a bit verbose) to connect tasks through params, workspaces, results and more. In itself this model allows us to write highly re-usable tasks that can be combined into complex pipelines. The API is beta, and it has a reasonable level of adoption.

What we are trying to solve in this TEP:
- Running a `Pipeline` without the overhead of PVCs / cloud-provider volumes / complex things to set up
- Running a `Task` by itself (?) that would be composed of multiple tasks — probably not, this is what we are calling a `Pipeline` already

If this is the only problem we are trying to solve, then I also tend to agree with @afrittoli that this is probably more about how we can/decide to run a `Pipeline` than about composability at the `Task` level.
In addition, if we bring composability to the `Task` level:
- It might be confusing for task authors / users: where should I compose things, at the `Task` level or at the `Pipeline` level?
- It brings some authoring overhead:
  - How do we connect workspaces from one task to another? Do we connect them implicitly, do we require the same name, do we allow some "mapping" like we do in `Pipeline` already?
  - How do we connect results and params from one task to another? Similar to workspaces.
- It brings some complexity, such as the one described in the requirements part: when should we execute (or not) the composed task?

As @afrittoli said, "The API is beta, and it has a reasonable level of adoption", and I also feel we may not want to offer too many different ways to compose things. I feel that focusing on the composability part with `Pipeline` and `Task`, and looking into how we can make it easier to run a Pipeline without overhead (by grouping tasks, running the whole pipeline as one, …), might be a better take — it would also, I think, open some possibilities for different implementations on the "runtime" part (aka run a tekton pipeline as a GitHub Action, …).
This looks somewhat overlapping with tektoncd/pipeline#3638 even though that specific use case is listed as a non-goal.
@jlpettersson i think we could still go that route for the solution - I think what I was trying to say in the non-goals is that I thought folks might want to specify their preferences at Pipeline authoring time, vs. PipelineRun authoring time, specifically referring to Tasks to run together at Runtime. I've updated the TEP to try to make this a bit more clear but now I'm wondering if I should just remove the non-goal 🤔
it feels like "PipelineRun in a Pod" is a potential solution for the problem statement here - i haven't reviewed #318 yet but I wonder if it would make sense to combine into one TEP
- name: source-code
  workspace: source-code
finally:
  # since we merged these concepts, any Foobar can have a finally
At which level(s) would when expressions fit in this example? Would they guard the `Foobar`, each of the `foobars`, or the `steps`? (Recently saw a user asking to guard steps with when expressions.)
Great work proposing all the alternatives and gathering a lot of related work together!
Do you envision merging this as-is for now, and selecting specific options in a follow-up, or would you prefer to discuss on this PR and alter it before it's merged?
I vote for the former :)
/approve
## Summary

This TEP addresses how the current concept of a Task embodies both a reusable unit of logic and a unit used to
+1 I like the new title / summary for this TEP!! ❤️
### Goals

- Make it possible to combine Tasks together so that you can run multiple
+💯
I think one goal of Tekton should be to provide a good opinionated scheduling default that works for simple cases, and let users override it at pipeline authoring time to satisfy special requirements.
My line of thought here is that I wouldn't want to force authors (and DSLs?) to think about the scheduling issue unless they need to. If I write a clone / build / push pipeline, it should be easy to do that in a way that doesn't create unnecessary overhead, without having to make scheduling-specific decisions in the pipeline definition.
If we don't want to take an opinion in the controller logic, an alternate way to achieve that could be through the catalog. What used to be git resource + build-and-push task + image resource could be a pipeline with three tasks, tailored to be efficient, available in the catalog. We would need then to have a strong composability story to back this, i.e. move the pipelines-in-pipelines experiment into the core API, so that one might consume pipelines from the catalog as if they were tasks.
I think one goal of Tekton should be to provide a good opinionated scheduling default that works for simple cases, and let users override it at pipeline authoring time to satisfy special requirements.
I tend to agree with that statement. The thing I am not sure we all share the same point of view on is: the current design works well for simple cases. It has an overhead, that's all. Having a way to reduce this overhead is needed; I think it's better that it be explicit for the user.
* Only helps us with some scheduling problems (e.g. doesn't help with parallel tasks or finally task execution)
* What if you _don't_ want the last Tasks to run if the previous tasks fail?
* If you want some other Task to run after these, you'll still need a workspace/volume + separate pod
* What if you want more flexibility than just before and after? (e.g. you want to completely control the ordering)
This is targeted at a very specific set of use cases, so to do something else we'd need a different solution - but it might still be valuable anyway? The syntax is nice and compact, and we could decide that that's just the way it is: anything beyond this requires a PVC or a different solution...
I agree! I'm thinking that something like this as an experimental custom task might at least be a good step forward.
* Uses concepts we already have in Tekton but upgrades them

Cons:
* Not clear what the idea of a PipelineResource is really giving us if it's just a wrapper for Tasks
+1
Cons:
* Will need to update our entrypoint logic to allow for steps running in parallel
* Doesn't give as much flexibility as being explicit
If we had a way to set composition / scheduling rules, this could be a default behaviour when no rules are set.
The rules could even be added by a defaulting webhook, so that it becomes clear what decision the system has made for you. That would combine ease of use (no syntax change in many cases) with control and transparency.
Good point! I've added some notes about this as a possible tweak - in general I think we could decide to combine a few of these together if it works well.
claimName: mypvc

Running with this PipelineRun would cause all of the Tasks to be run in one pod:
We might need to guard / validate the use of this, as doing this on a large pipeline might cause scheduling issues, unless we limit the amount of resources we request for the pod and force serialise the execution of some of the tasks.
kk, I'll add this as a con!
On the other hand, there are similar scheduling problems with the Affinity Assistant solution, e.g. where the placeholder pod is located on a node that does not have enough resources to host the TaskRun pods. Having all containers for a PipelineRun with an emptyDir workspace in a single unit (Pod) "delegates" the scheduling problem to the Kubernetes scheduler and also provides all scheduling info upfront, whereas today it is more in pieces.
* Doesn't change anything about our API

Cons:
* If this custom task is widely adopted, could fork our user community
Indeed. I think this could be a path to explore some of the options before we implement them in the core Tekton.
We could decide if we only support sequential execution, or support an entire DAG. Maybe even finally?

Cons:
No pros at all?
A task group CRD could be used to define a number of different execution strategies:
- sequential only
- switch (only one based on input)
- race (finish as soon as at least one is finished)
This is a different kind of problem, but different strategies may fit better with specific scheduling approaches.
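A purely hypothetical sketch of such a CRD, just to make the idea concrete (the `TaskGroup` kind and the `executionStrategy` field below are made up for illustration, not proposed syntax):

```yaml
apiVersion: tekton.dev/v1alpha1
kind: TaskGroup                   # hypothetical kind
metadata:
  name: test-and-report
spec:
  executionStrategy: sequential   # hypothetical: could also be e.g. "switch" or "race"
  tasks:
    - name: unit-test
      taskRef:
        name: golang-test
    - name: upload
      taskRef:
        name: gcs-upload
```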
I've added this as a pro - kinda sounds to me like you're describing custom tasks in general though? (e.g. you could have a switch-statement custom task)
and felt it added more complexity without much gain (see also https://github.com/tektoncd/pipeline/issues/3052)
* Doesn't help us with the overhead of multiple pods (each Task would still be a pod)

### Support other ways to share data (e.g. buckets)
Related work: #290
[This is something we support for "linking" PipelineResources.](https://github.com/tektoncd/pipeline/blob/master/docs/install.md#configuring-pipelineresource-storage)

Cons:
No pros 😓
Even if it doesn't help for this specific problem, it would be interesting to have this feature :)
Haha, I've tried to add one - what really happened here is that I sort of ran out of steam fleshing these out 😅
Few comments but overall I really like how the TEP is written now 😻
I think it can be ready to merge as soon as some typos are fixed 😉
[result](https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#using-results), it needs to
you'll need to provision a PVC or do some other similar, cloud specific storage,
`it needs to` should be removed I think
Tasks together and have control over the scheduling overhead (i.e. pods and volumes required) at authoring time
in a way that can be reused (e.g. in a Pipeline)
- Add some of [the features we don't have without PipelineResources](https://docs.google.com/document/d/1KpVyWi-etX00J3hIz_9HlbaNNEyuzP6S986Wjhl3ZnA/edit#)
to Tekton Pipelines (without requiring use of PipelineResources), specifically **Task adapters/specialization**
What is "Task adapters/specialization" in the TEP ? (I feel the term is used there without prior definition or example)
## Design details

TBD - currently focusing on enumerating and examining alternatives before selecting one or more ways forward.
❤️
Cons:
* Only helps us with some scheduling problems (e.g. doesn't help with parallel tasks or finally task execution)
* What if you _don't_ want the last Tasks to run if the previous tasks fail?
Also, how would these play with a `when` guard? It kinda fixes the "I want to run this Task, no matter the previous task's status" case, but not fully though.
Cons:
* Only cluster administrators will be able to control this scheduling, there will be no runtime or authoring time
flexibility
* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside
This is kinda true for some other approaches too 😅
Cons:
* Can create confusing chains of nested Tasks (Task A can contain Task B which can contain Task C...)
* Requires creating new Tasks to leverage the reuse (maybe embedded specs negate this?)
* Doesn't help with parallel use cases
Related to TEP-0046, right? I feel most of the approaches running multiple tasks in a Pod will come with either a big refactoring of what the entrypoint does (if we keep parallel support) or a trade-off, which is "no parallel execution".
Parallel Steps was the very first idea to address the parallelism problem, initially in May 2020. That would also require entrypoint work: tektoncd/pipeline#2586
(@ImJasonH has some ideas here that are much more aligned with generics in other langs, he might want to update the
example here.)

Pros:
This brings us closer to GitHub Actions 😝
I'm not sure whether to add this as a pro or a con XD
Cons:
* Pretty dramatic API change
* Foobars can contain Foobars can contain Foobars can contain Foobars
Indeed. On the other hand, we could experiment with it almost completely independently of the current CRDs (`Pipeline`, `Task`) 👼🏼. A different API name, group and version, maybe even an experimental project to start with and validate the concept.
Pros:
* Solves one specific problem: getting data on and off of workspaces

Cons:
I think another con might be another kind of overhead (push/pull at each task, … might slow the pipeline quite a bit)
Oh interesting! I assumed this would be just once for the entire pipeline, but I see what you mean, it could be per task using the workspace - I'll add a note about this.
TBD - currently focusing on enumerating and examining alternatives before selecting one or more ways forward.

## Alternatives
Great to see many alternatives!
But if this part now is "shared" with TEP-0046 (?), then I think it should be pointed out that only a few of the alternatives solve the problem statement in TEP-0046, which is primarily about Task parallelism.
Also, a `TaskRunGroup` (or `TaskGroupRun`?) CRD might be a viable addition for some of the alternatives - since some are written from the pipeline-authoring view, and some from the runtime view. Note this is different from a TaskGroup CRD that is targeting authors.
> then I think it should be pointed out that only a few of the alternatives solve the problem statement in TEP-0046 that is about Task parallelism, primarily

I think this is why it probably still makes sense to have 2 TEPs - though I think in the long run even the solution for this TEP will involve supporting Tasks in parallel on the same pod.

> Also a TaskRunGroup (or TaskGroupRun?) CRD might be a viable addition for some of the alternatives - since some are written from the pipeline authoring view, and some from the runtime view. Note this is different from a TaskGroup CRD that is targeting authors.

I've tried to add some details about this under "Create a new Grouping CRD" but I'm not totally clear on what this would look like - would it basically contain an entire pipeline definition? Or link to a pipeline ref?
With the `TaskGroupRun` CRD here, I didn't mean a new CRD for Pipeline or Task authors, more a run-CRD in case the solution should be open to using more than one Pod (here I am assuming an alternative to TaskRun, which I assume is strictly a one-to-one relation between a Task and a Pod, whereas TaskGroupRun is a thought about a many-to-one relation between Tasks and a Pod - and there could be one or multiple TaskGroupRuns related to a PipelineRun).
Just thinking out loud here - no deeper thoughts.
#### Controller level

This option is [TEP-0046](https://github.com/tektoncd/community/pull/318). In this option, the Tekton controller can
Well, yes and no. This is suggested as an initial configuration in TEP-0046 because, as listed in the Cons section here, it addresses a problem that seems to require a more complex solution - Task parallelism - and therefore probably needs to be iterated on multiple times.
@jlpettersson can you elaborate a bit more? It sounds like you're saying a controller-wide solution isn't necessarily the goal of TEP-0046? (Or is TEP-0046 trying to say that a controller-wide flag is a step toward a more complex solution?)
The goal of TEP-0046 is to solve the Task parallelism problem, which has been discussed back and forth since May 2020 and for which we later added the affinity assistant. The TEP links to a design doc for the whole Task parallelism problem including alternative solutions (including the affinity assistant, a custom scheduler and a pod-internal workspace).
As has been noted, solving that problem in the direction TEP-0046 suggests requires a lot of changes, e.g. rethinking the entrypoint and probably also the *Run CRDs (e.g. `PipelineRun`, `TaskRun` and perhaps `TaskGroupRun`) and the use of statuses and so on. This problem most likely needs to be solved in many iterations where we learn more after each, and the initial proposal is to do this behind a feature flag or "controller level flag" at least until we know more.
Cons:
* Only cluster administrators will be able to control this scheduling, there will be no runtime or authoring time
flexibility
* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside
yes, this is not only a Con for this "alternative".
What is written here is probably what is needed to properly address the problem described in TEP-0046: Task parallelism is a complex problem that has been discussed since April/May 2020, and we added the affinity assistant for it, but that solution has shortcomings that could be addressed with a Pod-internal solution. In particular, the affinity assistant cannot mitigate the problems with autoscaling or with properly scheduling to nodes that have enough resources for the pipeline, as noted in tektoncd/pipeline#3049
I've tried to add a "cons" section at the top of the alternatives section mentioning that this applies to multiple solutions.
Cons:
* [@jlpetterson has explored this option](https://docs.google.com/document/d/1lIqFP1c3apFwCPEqO0Bq9j-XCDH5XRmq8EhYV_BPe9Y/edit#heading=h.18c0pv2k7d1a)
and felt it added more complexity without much gain (see also https://github.com/tektoncd/pipeline/issues/3052)
Yes, in the end a custom scheduler does not take us any further with some of the fundamental problems we have with the affinity assistant, e.g. respecting autoscaling and mitigating problems with nodes that have too few resources to host the PipelineRun, e.g. tektoncd/pipeline#3049
* If it's important for a Pipeline to be executed in a certain way, that information will have to be encoded somewhere
other than the Pipeline

#### PipelineRun: flag
This one could also apply to TEP-0046, when the implementation is more mature, to move away from the "controller level config" (aka feature flag?).
flexibility
* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside
the controller and has a lot of gotchas we'll need to iron out (see
[https://hackmd.io/@vdemeester/SkPFtAQXd](https://hackmd.io/@vdemeester/SkPFtAQXd) for some more brainstorming)
Nice post @vdemeester !!
I added a comment :)
be configured to always execute Pipelines inside one pod.

Pros:
* Making it possible to execute a Pipeline in a pod will also pave the way to be able to support use cases such as
I think this might be a pro of multiple possible solutions described here - quite a few of them require that a pipeline will need to be fully runnable as a pod.
Good point, I'll add this at the top of the alternatives section as well.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, sbwsg, vdemeester

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks for all the feedback @afrittoli @jlpettersson @sbwsg @vdemeester ! I've tried to address all of the comments (let me know what i've missed!)
I'm hoping we could merge this with the alternatives listed and then keep discussing and iterate from there! 🙏 (if folks are happy to merge this as-is I can keep addressing any more comments that come in via the next PR)
/lgtm
This TEP describes some difficulty folks have run into, especially when trying to use Tekton Pipelines without PipelineResources: if you want to use the functionality of multiple Tasks, you need to put them together in a Pipeline and share data between them with volumes. It could be a useful abstraction, and more efficient, to make it possible to combine Tasks together without scheduling them on different pods.
In the review @vdemeester pointed out that it feels like we are trying to deal with 2 different things - for me the main reason for that is b/c I have a feeling we WILL solve @mattmoor's use case - since we'll probably have to change TaskRuns to make it happen - but I agree with @vdemeester that that's not the main thing we're targeting here. Now TEP-0044 is more aligned with TEP-0046 than ever!
In the most recent API working group we decided to keep this TEP and [TEP-0046](tektoncd#318) separate because they are coming at a similar problem from different angles. In the WG @jerop suggested that we update the TEPs with some info on what the overlaps + differences are and that's what this TEP is adding!
In today's API working group it was clear that I still haven't been able to articulate what problem this TEP is trying to solve in a way that's clear, so I'm trying this composition vs scheduling approach instead to see if that is more clear. Also @skaegi pointed out that the overhead required in running multiple pods is another consideration in addition to the PVC overhead so I've tried to include this as well. I'm going to follow up with a list of possible solutions so hopefully that will help folks who are confused about what this is trying to address.
This commit adds possible solutions to the problem described in TEP-0044, including references to solutions in other TEPS (46 & 54). I was hoping to merge the problem statement before starting to talk about solutions but it seems like the problem statement is too abstract to get enough traction, and meanwhile folks have been opening more TEPs with related proposals in the meantime, so hopefully starting to list the options here will help us move the discussion forward. I'm hoping we can merge the problem + possible options without needing to decide on which one(s) we want to pursue.
/lgtm
🎉
This TEP describes some difficulty folks have run into, especially when
trying to use Tekton Pipelines without PipelineResources: if you want to
use the functionality of multiple Tasks, you need to put them together
in a Pipeline and share data between them with volumes. It could be a
useful abstraction, and more efficient, to make it possible to combine
Tasks together without scheduling them on different pods.
Related issues: