
Add TEP-0044: Decouple Task Composition from Scheduling ➰ #316

Merged
merged 5 commits into from
Mar 16, 2021

Conversation

bobcatfish
Contributor

This TEP describes some difficulty folks have run into, especially when
trying to use Tekton Pipelines without PipelineResources: if you want to
use the functionality of multiple Tasks, you need to put them together
in a Pipeline and share data between them with volumes. It could be a
useful abstraction, and more efficient, to make it possible to combine
Tasks together without scheduling them on different pods.
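For readers without the TEP open, a minimal sketch of the status quo described above, using the catalog tasks referenced later in this thread (repo URL, image name, and storage size are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: clone-and-build
spec:
  workspaces:
    - name: shared-data                # volume-backed workspace used to pass files between Tasks
  tasks:
    - name: fetch-source
      taskRef:
        name: git-clone                # catalog task; writes the repo into its "output" workspace
      params:
        - name: url
          value: https://github.com/tektoncd/pipeline   # illustrative repo
      workspaces:
        - name: output
          workspace: shared-data
    - name: build-image
      runAfter: ["fetch-source"]
      taskRef:
        name: kaniko                   # catalog task; reads sources from its "source" workspace
      params:
        - name: IMAGE
          value: registry.example.com/my-image          # illustrative destination
      workspaces:
        - name: source
          workspace: shared-data
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: clone-and-build-run
spec:
  pipelineRef:
    name: clone-and-build
  workspaces:
    - name: shared-data
      volumeClaimTemplate:             # the per-run PVC this TEP would like to make avoidable
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
```

Each Task still runs in its own pod, and the PVC exists only to ferry files between those pods, which is the overhead the rest of the thread keeps coming back to.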

Related issues:

@bobcatfish bobcatfish added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jan 22, 2021
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 22, 2021
@jlpettersson
Member

This looks somewhat overlapping with tektoncd/pipeline#3638 even though that specific use case is listed as a non-goal.

@ghost ghost left a comment

I think Tekton's current pain-points around storage / data sharing could be greatly improved by task composition.

👍

/lgtm

that can be shared between them.

Some problems with this:
* PVCs add additional overhead, both in speed and in management (somewhat

A separate but similar problem (and possibly an additional use-case): an organization may have already decided on cloud buckets as their means of data sharing. Then Tekton comes along "requiring" k8s volumes and so the org is unable or unwilling to adopt Tekton.

I've seen this reported in Slack as a downside of Workspaces though unfortunately don't have a Github issue / user report to point at.

Composing a bucket-download Task, a build Task, and a bucket-upload Task would go some way toward solving those users' needs, I think.

Contributor Author

okay interesting! ill try to add a use case for this - let me know if you have any ideas how to expand it

Contributor

re: buckets -- that's interesting. As an operator I can confirm that allocating the sort of short-lived PVCs typically needed by a pipeline is a real pain. A Task that uploads/downloads to a bucket as a pre/post Task might be useful to eliminate the need for a Pipeline PVC.

Member

Related: #290 (by @sbwsg)

Object storage would have a few advantages compared to PVC:

  • zero provisioning time
  • it only uses the space needed, so if the PipelineRun is not deleted right away, there's no unused allocated storage left around
  • it's easier to access. It would be possible to even integrate into the CLI or dashboard the ability to list / view the content of the bucket, which would be a great help for troubleshooting :)

It might be slower performance-wise; it also depends on where the object storage is located.
The Operator could ship a preconfigured MinIO as an optional component, or we could have an "installing with MinIO" guide.
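For reference, the closest existing mechanism is the bucket support used for "linking" PipelineResources (the install doc is referenced further down in this thread). A rough sketch of that ConfigMap, with key names from memory and an illustrative bucket and secret, so exact names may differ between releases:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-artifact-bucket        # assumed name, per the PipelineResource storage install docs
  namespace: tekton-pipelines
data:
  location: gs://my-tekton-artifacts                     # illustrative bucket
  bucket.service.account.secret.name: bucket-creds       # illustrative secret holding the credentials
  bucket.service.account.secret.key: service_account.json
```

Something similar, but workspace-centric rather than PipelineResource-centric, is what the comments above are asking for.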

We could decide to include this if we have some use cases that need it; for now avoiding
this allows us to avoid many layers of nesting (i.e. Task1 uses Task2 uses Task3, etc.)
or even worse, recursion (Task 1 uses Task 2 uses Task 1...)
- One way to support this later could be via

Looks like we're missing a bit here?

Contributor Author

.... i dont know what the rest of this sentence was 🤭 😅 ill just delete it for now

@tekton-robot tekton-robot assigned ghost Jan 25, 2021
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 25, 2021
@bobcatfish
Contributor Author

/assign sbwsg
/assign vdemeester
/assign skaegi

creation-date: '2021-01-22'
last-updated: '2021-01-22'
authors:
- '@bobcatfish'
Contributor Author

@skaegi would like to work on this as well

overhead of using a Pipeline, they just want to make a TaskRun,
e.g. [@mattmoor's feedback on PipelineResources and the Pipeline beta](https://twitter.com/mattomata/status/1251378751515922432))
where he wants to checkout some code and [use the kaniko task](https://github.com/tektoncd/catalog/tree/master/task/kaniko/0.1)
without having to fiddle with volumes
Contributor Author

@skaegi if you have other use case plz add :D (anyone else too!)

Member

This use case is closer to what the knative/build use case was: a simple build composed of a few steps that run in sequence.

Tbh, this is a really valid use case and it would be interesting to see how much it needs to be supported. To draw a parallel with our Prow CI setup, this is basically what we would need to replicate the current behavior (one check == one job, which would equal one TaskRun for simplicity).
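As a concrete point of comparison for this "simple build in one TaskRun" shape, here is a minimal sketch of what it looks like today if you hand-roll the steps into a single Task instead of reusing catalog Tasks (images, repo, and registry are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: clone-and-kaniko
spec:
  params:
    - name: repo-url
    - name: image
  workspaces:
    - name: source                     # can be emptyDir-backed in a standalone TaskRun, no PVC needed
  steps:
    - name: clone
      image: alpine/git                # illustrative image
      script: |
        git clone "$(params.repo-url)" "$(workspaces.source.path)/src"
    - name: build-and-push
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --context=$(workspaces.source.path)/src
        - --dockerfile=$(workspaces.source.path)/src/Dockerfile
        - --destination=$(params.image)
```

The trade-off, discussed further down, is that the cloning logic is copy-pasted into this Task rather than reused from the catalog, which is exactly the composition vs. scheduling tension this TEP is about.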


### Non-Goals

- Composing Tasks within Tasks at [Task authoring time](https://github.com/tektoncd/community/blob/master/design-principles.md#reusability).
Member

+1

this allows us to avoid many layers of nesting (i.e. Task1 uses Task2 uses Task3, etc.)
or even worse, recursion (Task 1 uses Task 2 uses Task 1...)
- One way to support this later could be via
- Supporting this composition at runtime in a PipelineRun (not quite sure what that
Member

I guess it depends on how it is implemented - it may be possible to use pipelineSpec - but I agree this should not be a direct goal.

Contributor Author

after @jlpettersson 's comments I think I'll remove this as a non-goal, I think I was jumping to a conclusion too early

Comment on lines 117 to 152
- All Tasks should run even if one fails
- This is to support use cases such as uploading test results, even if the test
Task failed
Member

I'm not sure about this. I think the use case is fine, but "all tasks should run even if one fails" feels to me like a solution rather than a requirement.

Contributor Author

hmm I wonder how I can word this - I'm trying to express that in the unit test use case, if the test task fails, you want to still upload results. if we used "pipeline in a pod" as the solution to this, we could accomplish that with finally.

i'm gonna reword this as "It should be possible to have Tasks that run even if others fail" (starting to overlap with the exit code tep!! 😅 ). lemme know what you think!

- '@bobcatfish'
---

# TEP-0044: Composing Tasks with Tasks
Member

Thank you for starting this 🙏

I agree that we need a way to compose tasks that is more efficient than what we have today.
I feel that the problem is not on the authoring side though, but on the runtime.

Today we compose Tasks using Pipeline, which gives us a rich model (perhaps a bit verbose) to connect tasks through params, workspaces, results and more. In itself this model allows us to write highly re-usable tasks that can be combined into complex pipelines. The API is beta, and it has a reasonable level of adoption.

On the runtime side, when we execute a pipeline against a k8s cluster, we force each task to create a new Pod, which means that today - if we use catalog tasks - we need two pods and a PVC to do something like cloning a repo and building a container image. Some very common use cases like this were covered by pipeline resources; removing them from the picture highlights the need for alternative runtime models.

To summarise, I think the solution to the problem highlighted in this TEP will not be a new way of composing tasks in YAML, but a new way to execute them on a k8s cluster.

Member

Although I agree that it might be more a runtime problem than an authoring problem, I feel there is room for both.

To summarise, I think the solution to the problem highlighted in this TEP will not be a new way of composing tasks in YAML, but a new way to execute them on a k8s cluster.

This would need to be very flexible, as I think we may want to execute some groups of tasks in the same pod and others in different ones (and on different nodes too).

Contributor Author

I feel like folks might want to express this kind of grouping at a Pipeline level. (In my mind Pipeline authoring straddles both "authoring time" and "runtime" cuz in some ways it's runtime configuration for TaskRuns.)

I.e. you might want to create a Pipeline that groups some Tasks such that they are run on the same pod, and be able to reuse that, without having to express that grouping in every single PipelineRun

Contributor

"create a Pipeline that groups some Tasks such that they are run on the same pod" -- this is exactly what our teams have been asking us for. Their issue is that if we write fine-grained tasks to promote re-use in our task catalogs we incur a roughly 10 or so second overhead when running a sequence of Tasks. @afrittoli's point about doing the task/pod aggregation automatically is definitely interesting but I would be delighted to be able to do this manually as a first step.

volume based workspaces to share data between them in a Pipeline. e.g. specifically
cloning with [git-clone](https://github.com/tektoncd/catalog/tree/master/task/git-clone/0.2),
running tests with [golang-test](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1)
and uploading results with [gcs-upload](https://github.com/tektoncd/catalog/tree/master/task/gcs-upload/0.1).
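To make the quoted example concrete, a sketch of that clone / test / upload combination as a Pipeline today, with the upload in `finally` so results are published even when tests fail (parameter and workspace names are illustrative rather than copied from the catalog):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: test-and-report
spec:
  workspaces:
    - name: shared                     # still a PVC-backed workspace shared across three pods
  tasks:
    - name: clone
      taskRef:
        name: git-clone
      params:
        - name: url
          value: https://github.com/tektoncd/pipeline   # illustrative repo
      workspaces:
        - name: output
          workspace: shared
    - name: test
      runAfter: ["clone"]
      taskRef:
        name: golang-test
      workspaces:
        - name: source
          workspace: shared
  finally:
    - name: upload-results             # finally Tasks run even if "test" fails
      taskRef:
        name: gcs-upload
      params:
        - name: location
          value: gs://my-test-results  # illustrative bucket
      workspaces:
        - name: source
          workspace: shared
```

Even with `finally` covering the "upload results even if the tests fail" requirement discussed earlier, this is still three pods plus a PVC for what is conceptually one unit of work.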
Member

Just to play devil's advocate 😈 , each of these catalog tasks (git-clone, golang-test, and gcs-upload) contains one single step, which is clearly a bottleneck here. A task having more than one step is more efficient (one pod with multiple containers). Why not combine these tasks into one single task and encourage users to design efficient tasks? And TEP-0040 can come to the rescue here with step failure, i.e. all steps should run even if one fails. If needed, we can implement scheduling at the step level.

Member

This is the core of the problem. git-clone, golang-test and gcs-upload may well make sense defined as one "bigger" task. But what if I want to run golang-test and gcs-upload from my local directory (packaged as an image and extracted into a volume somehow before the pipeline/task execution)? Do I need to define yet another task, and thus duplicate work and, most importantly, duplicate maintenance, or do I get to re-use community / team / … work and just refer to those definitions?

Contributor Author

Another detail is that although there is just one step in each of them, in each case that step is running a script which is doing several things. If we wanted to integrate these into one Task, we'd need to copy paste that script around.

We have had the idea of reusable steps come up before: tektoncd/pipeline#1260 I'm not sure if that would help here though b/c you'd probably want the step to be able to declare params/workspaces/results and I think it would end up looking like a Task with a different name (tho you'd be able to combine the steps together?)

Contributor

This is the big one for us... when we create a pod we also have to create a secure container runtime to host it in a new lightweight Kata Containers VM. Grouping Tasks also provides some logical/semantic relief in addition to performance benefits. We are currently running a sequence of 50 Tasks and suspect that even if we could group all 50 into a single pod, there might still be semantic value in running this as 5 coarse-grained tasks instead.

Member

@vdemeester vdemeester left a comment

@bobcatfish thanks for starting this TEP. This is honestly a tricky subject and I feel both this TEP and #318 might be too "opinionated" about the solution — but I feel it might be because of the way we tend to write TEPs.

So, I would like us to define clearly what problem(s) we are trying to solve — and hopefully, we shouldn't try to solve too many problems at once. For me this also means we may not want to think about, or compare our solution to, PipelineResources yet.

So far, we are using Task as the smallest composable piece, and we use Pipeline to compose them. I tend to agree with @afrittoli on the following.

Today we compose Tasks using Pipeline, which gives us a rich model (perhaps a bit verbose) to connect tasks through params, workspaces, results and more. In itself this model allows us to write highly re-usable tasks that can be combined into complex pipelines. The API is beta, and it has a reasonable level of adoption.

What we are trying to solve in this TEP:

  • Running a Pipeline without the overhead of the PVC / cloud-provider volumes / other complex things to set up
  • Running a Task by itself (?) that would be composed of multiple tasks — probably not, this is what we are calling a Pipeline already

If this is the only problem we are trying to solve, then I also tend to agree with @afrittoli that this is probably more about how we can/decide to run a Pipeline than about composability at the Task level.

In addition, if we bring composability to the Task level:

  • It might be confusing for task authors / users: where should I compose things, at the Task level or at the Pipeline level?
  • It brings some authoring overhead:
    • how to connect workspaces from one task to another? Do we connect them implicitly, do we require the same name, do we allow some "mapping" like we already do in Pipeline?
    • how to connect results and params from one task to another? Similar to workspaces.
  • It brings some complexity, such as the one described in the requirements part: when should we (or shouldn't we) execute the composed task?

As @afrittoli said, "The API is beta, and it has a reasonable level of adoption", and I also feel we may not want to offer too many different ways to compose things. I feel focusing on the composability part with Pipeline and Task, and looking into how we can make it easier to run a Pipeline without overhead (by grouping tasks, running the whole pipeline as one, …), might be a better take — it would also, I think, open some possibilities for different implementations of the "runtime" part (aka run a Tekton pipeline as a GitHub Action, …)

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2021
Contributor Author

@bobcatfish bobcatfish left a comment

This looks somewhat overlapping with tektoncd/pipeline#3638 even though that specific use case is listed as a non-goal.

@jlpettersson i think we could still go that route for the solution - I think what I was trying to say in the non-goals is that I thought folks might want to specify their preferences at Pipeline authoring time, vs. PipelineRun authoring time, specifically referring to Tasks to run together at Runtime. I've updated the TEP to try to make this a bit more clear but now I'm wondering if I should just remove the non-goal 🤔

it feels like "PipelineRun in a Pod" is a potential solution for the problem statement here - i haven't reviewed #318 yet but I wonder if it would make sense to combine into one TEP

@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2021
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2021
- name: source-code
workspace: source-code
finally:
# since we merged these concepts, any Foobar can have a finally
Member

at which level(s) would whenexpressions fit in this example? would it guard Foobar, each of the foobars or steps? (recently saw a user asking to guard steps with whenexpressions)

Member

@afrittoli afrittoli left a comment

Great work proposing all the alternatives and gathering a lot of related work together!
Do you envision merging this as is for now, and select specific options in a follow-up, or would you prefer to discuss on this PR and alter it before it's merged?
I vote for the former :)
/approve


## Summary

This TEP addresses how the current concept of a Task embodies both a reusable unit of logic but is also a unit used to
Member

+1 I like the new title / summary for this TEP!! ❤️


### Goals

- Make it possible to combine Tasks together so that you can run multiple
Member

+💯

I think one goal of Tekton should be to provide a good opinionated scheduling default that works for simple cases, and let users override it at pipeline authoring time to satisfy special requirements.

My line of thought here is that I wouldn't want to force authors (and DSLs?) to think about the scheduling issue unless they need to. If I write a clone / build / push pipeline, it should be easy to do that in a way that doesn't create unnecessary overhead, without having to make scheduling-specific decisions in the pipeline definition.

If we don't want to take an opinion in the controller logic, an alternate way to achieve that could be through the catalog. What used to be git resource + build-and-push task + image resource could be a pipeline with three tasks, tailored to be efficient, available in the catalog. We would need then to have a strong composability story to back this, i.e. move the pipelines-in-pipelines experiment into the core API, so that one might consume pipelines from the catalog as if they were tasks.
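Purely as an illustration of that last point, and not the experimental project's actual syntax, consuming a catalog pipeline as if it were a task might look something like this (the kind and name used in taskRef below are hypothetical):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: my-ci
spec:
  workspaces:
    - name: shared
  tasks:
    - name: clone-build-push
      taskRef:
        apiVersion: tekton.dev/v1beta1
        kind: Pipeline                   # hypothetical: a Pipeline referenced where a Task normally goes
        name: catalog-clone-build-push   # hypothetical catalog pipeline
      workspaces:
        - name: shared
          workspace: shared
```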

Member

I think one goal of Tekton should be to provide a good opinionated scheduling default that works for simple cases, and let users override it at pipeline authoring time to satisfy special requirements.

I tend to agree with that statement. The thing I am not sure we all share the same point of view on is: whether the current design works well for simple cases. It has an overhead, that's all. Having a way to reduce this overhead is needed; I just think it's better if it is explicit for the user.

* Only helps us with some scheduling problems (e.g. doesn't help with parallel tasks or finally task execution)
* What if you _don't_ want the last Tasks to run if the previous tasks fail?
* If you want some other Task to run after these, you'll still need a workspace/volume + separate pod
* What if you want more flexibility than just before and after? (e.g. you want to completely control the ordering)
Member

This is targeted at a very specific set of use cases, so to do something else we'd need a different solution - but it might still be valuable anyway? The syntax is nice and compact, and we could decide that this is just the way it is: anything beyond this requires a PVC or a different solution...

Contributor Author

i agree! im thinking that something like this as an experimental custom task might at least be a good step forward

* Uses concepts we already have in Tekton but upgrades them

Cons:
* Not clear what the idea of a PipelineResource is really giving us if it's just a wrapper for Tasks
Member

+1


Cons:
* Will need to update our entrypoint logic to allow for steps running in parallel
* Doesn't give as much flexibility as being explicit
Member

If we had a way to set composition / scheduling rules, this could be a default behaviour when no rules are set.
The rules could even be added by a defaulting webhook, so that it becomes clear what decision the system has made for you. That would combine ease of use (no syntax change in many cases) with control and transparency.

Contributor Author

good point! ive added some notes about this as a possible tweak - in general i think we could decide to combine a few of these together if it works well

claimName: mypvc
```

Running with this PipelineRun would cause all of the Tasks to be run in one pod:
Member

We might need to guard / validate the use of this, as doing this on a large pipeline might cause scheduling issues, unless we limit the amount of resources we request for the pod and force serialise the execution of some of the tasks.

Contributor Author

kk ill add this as a con!

Member

On the other hand, there are similar scheduling problems with the Affinity Assistant solution, e.g. where the placeholder pod is located on a node that does not have enough resources to host the TaskRun pods. Having all containers for a PipelineRun with an emptyDir workspace in a single unit (Pod) "delegates" the scheduling problem to the Kubernetes scheduler and also provides all scheduling info upfront, whereas today it is more in pieces.

* Doesn't change anything about our API

Cons:
* If this custom task is widely adopted, could fork our user community
Member

Indeed. I think this could be a path to explore some of the options before we implement them in the core Tekton.


We could decide if we only support sequential execution, or support an entire DAG. Maybe even finally?

Cons:
Member

No pros at all?

A task group CRD could be used to define a number of different execution strategies:

  • sequential only
  • switch (only one based on input)
  • race (finish as soon as at least one is finished)

This is a different kind of problem, but different strategies may fit better with specific scheduling approaches.
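To make that concrete, a purely hypothetical shape for such a grouping CRD; every field name below is invented for illustration and is not a real Tekton API:

```yaml
apiVersion: tekton.dev/v1alpha1        # hypothetical group/version
kind: TaskGroup                        # hypothetical kind
metadata:
  name: test-and-report
spec:
  strategy: sequential                 # hypothetical field; could also be e.g. "switch" or "race"
  tasks:
    - name: unit-tests
      taskRef:
        name: golang-test
    - name: upload-results
      taskRef:
        name: gcs-upload
```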

Contributor Author

ive added this as a pro - kinda sounds to me like you're describing custom tasks in general tho? (e.g. you can have a switch statement custom task)

and felt it added more complexity without much gain (see also https://github.com/tektoncd/pipeline/issues/3052)
* Doesn't help us with the overhead of multiple pods (each Task would still be a pod)

### Support other ways to share data (e.g. buckets)
Member

Related work: #290


[This is something we support for "linking" PipelineResources.](https://github.com/tektoncd/pipeline/blob/master/docs/install.md#configuring-pipelineresource-storage)

Cons:
Member

No pros 😓
Even if it doesn't help for this specific problem, it would be interesting to have this feature :)

Contributor Author

haha ive tried to add one - what really happened here is that i sort of ran out of steam fleshing these out 😅

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2021
Member

@vdemeester vdemeester left a comment

Few comments but overall I really like how the TEP is written now 😻
I think it can be ready to merge as soon as some typos are fixed 😉

Comment on lines 40 to 41
[result](https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#using-results), it needs to
you'll need to provision a PVC or do some other similar, cloud specific storage,
Member

"it needs to" should be removed, I think

Tasks together and have control over the scheduling overhead (i.e. pods and volumes required) at authoring time
in a way that can be reused (e.g. in a Pipeline)
- Add some of [the features we don't have without PipelineResources](https://docs.google.com/document/d/1KpVyWi-etX00J3hIz_9HlbaNNEyuzP6S986Wjhl3ZnA/edit#)
to Tekton Pipelines (without requiring use of PipelineResources), specifically **Task adapters/specialization**
Member

What is "Task adapters/specialization" in the TEP ? (I feel the term is used there without prior definition or example)

Contributor Author

this is trying to refer to the content of the linked doc:

(embedded screenshot of the linked doc, not reproduced here)

ill reword this and try to make it more clear




## Design details

TBD - currently focusing on enumerating and examining alternatives before selecting one or more ways forward.
Member

❤️


Cons:
* Only helps us with some scheduling problems (e.g. doesn't help with parallel tasks or finally task execution)
* What if you _don't_ want the last Tasks to run if the previous tasks fail?
Member

Also, how would these play with when guards? It kinda fixes the "I want to run this Task, no matter the previous task's status" case, but not fully though.

Cons:
* Only cluster administrators will be able to control this scheduling, there will be no runtime or authoring time
flexibility
* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside
Member

This is kinda true for some other approach 😅

Cons:
* Can create confusing chains of nested Tasks (Task A can contain Task B which can Contain Task C...)
* Requires creating new Tasks to leverage the reuse (maybe embedded specs negate this?)
* Doesn't help with parallel use cases
Member

Related to TEP-0046, right? I feel most of the approaches that run multiple Tasks in a Pod will come with either a big refactoring of what the entrypoint does (if we keep parallel support) or a trade-off, which is "no parallel execution".

Member

Parallel Steps was the very first idea to address the parallelism problem, back in May 2020. That would also require entrypoint work. tektoncd/pipeline#2586

(@ImJasonH has some ideas here that are much more aligned with generics in other langs, he might want to update the
example here.)

Pros:
Member

This brings us closer to GitHub Actions 😝

Contributor Author

im not sure whether to add this as a pro or a con XD


Cons:
* Pretty dramatic API change
* Foobars can contain Foobars can contain Foobars can contain Foobars
Member

Indeed. On the other hand, we could experiment with it almost completely independently of the current CRDs (Pipeline, Task) 👼🏼. A different API name, group and version, maybe even an experimental project to start with and validate the concept.

Pros:
* Solves one specific problem: getting data on and off of workspaces

Cons:
Member

I think another con might be another kind of overhead (push/pull at each task, … might slow the pipeline quite a bit)

Contributor Author

oh interesting! i assumed this would be just once for the entire pipeline, but i see what you mean, it could be per task using the workspace - ill add a note about this


TBD - currently focusing on enumerating and examining alternatives before selecting one or more ways forward.

## Alternatives
Member

@jlpettersson jlpettersson Mar 10, 2021

Great to see many alternatives!

But if this part now is "shared" with TEP-0046(?) then I think it should be pointed out that only a few of the alternatives solve the problem statement in TEP-0046, which is primarily about Task parallelism.

Also a TaskRunGroup (or TaskGroupRun?) CRD might be a viable addition for some of the alternatives - since some are written from the pipeline authoring view, and some from the runtime view. Note this is different from a TaskGroup CRD that is targeting authors.

Contributor Author

then I think it should be pointed out that only a few of the alternatives solve the problem statement in TEP-0046, which is primarily about Task parallelism

I think this is why it probably still makes sense to have 2 TEPs - tho i think in the long run even the solution for this TEP will involve supporting Tasks in parallel on the same pod

Also a TaskRunGroup (or TaskGroupRun?) CRD might be a viable addition for some of the alternatives - since some are written from the pipeline authoring view, and some from the runtime view. Note this is different from a TaskGroup CRD that is targeting authors.

I've tried to add some details about this under "Create a new Grouping CRD" but I'm not totally clear on what this would look like - would it basically contain an entire pipeline definition? or link to a pipeline ref?

Member

With the TaskGroupRun CRD here, I didn't mean a new CRD for Pipeline or Task authors, more a run-CRD in case the solution should be open to using more than one Pod (here I am assuming TaskRun stays strictly a one-to-one relation between a Task and a Pod, whereas TaskGroupRun would be a many-to-one relation between Tasks and a Pod - and there could be one or multiple TaskGroupRuns related to a PipelineRun).

Just thinking out loud here - no deeper thoughts.


#### Controller level

This option is [TEP-0046](https://github.com/tektoncd/community/pull/318). In this option, the Tekton controller can
Member

Well, yes and no. This is suggested as an initial configuration in TEP-0046 because, as listed in the Cons section here, it addresses a problem that seems to require a more complex solution - Task parallelism - and therefore probably needs to be iterated on multiple times.

Contributor Author

@jlpettersson can you elaborate a bit more? it sounds like you're saying a controller wide solution isnt necessarily the goal of TEP-0046? (or TEP-0046 is trying to say that a controller wide flag is a step toward a more complex solution?)

Member

The goal of TEP-0046 is to solve the Task parallelism problem, which has been discussed back and forth since May 2020 and for which we later added the affinity assistant. The TEP links to a design doc for the whole Task parallelism problem including alternative solutions (including the affinity assistant, a custom scheduler and a pod-internal workspace).

As has been noted, solving that problem in the direction TEP-0046 suggests requires a lot of changes, e.g. rethinking the entrypoint and probably also the *Run CRDs (e.g. PipelineRun, TaskRun and perhaps TaskGroupRun) and the use of statuses and so on. This problem most likely needs to be solved in many iterations where we learn more after each, and the initial proposal is to do this behind a feature flag or "Controller level flag" at least until we know more.

Cons:
* Only cluster administrators will be able to control this scheduling, there will be no runtime or authoring time
flexibility
* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside
Member

yes, this is not only a Con for this "alternative".

What is written here is probably what is needed to properly address the problem written up in TEP-0046, about Task parallelism - a complex problem that has been discussed since April/May 2020. We added the affinity assistant for it, but that solution has shortcomings that could be addressed with a Pod-internal solution. In particular, the affinity assistant cannot mitigate the problems with autoscaling or with properly scheduling to nodes that have enough resources for the pipeline, as noted in tektoncd/pipeline#3049

Contributor Author

ive tried to add a "cons" section at the top of the alternative section mentioning that this applies to multiple solutions


Cons:
* [@jlpetterson has explored this option](https://docs.google.com/document/d/1lIqFP1c3apFwCPEqO0Bq9j-XCDH5XRmq8EhYV_BPe9Y/edit#heading=h.18c0pv2k7d1a)
and felt it added more complexity without much gain (see also https://github.com/tektoncd/pipeline/issues/3052)
Member

yes, in the end, a custom scheduler does not take us any further with some of the fundamental problems we have with the affinity assistant, e.g. respecting autoscaling and mitigating problems with nodes that have too few resources to host the PipelineRun, e.g. tektoncd/pipeline#3049

* If it's important for a Pipeline to be executed in a certain way, that information will have to be encoded somewhere
other than the Pipeline

#### PipelineRun: flag
Member

@jlpettersson jlpettersson Mar 10, 2021

this one could also be for TEP-0046, when implementation is more mature, to move from "Controller level config" (aka feature-flag?)

flexibility
* Executing a pipeline in a pod will require significantly re-architecting our graph logic so it can execute outside
the controller and has a lot of gotchas we'll need to iron out (see
[https://hackmd.io/@vdemeester/SkPFtAQXd](https://hackmd.io/@vdemeester/SkPFtAQXd) for some more brainstorming)
Member

Nice post @vdemeester !!
I added a comment :)

be configured to always execute Pipelines inside one pod.

Pros:
* Making it possible to execute a Pipeline in a pod will also pave the way to be able to support use cases such as

I think this might be a pro of multiple possible solutions described here - quite a few of them require that a pipeline will need to be fully runnable as a pod.

Contributor Author

good point, ill add this at the top of the alternatives section as well

@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, sbwsg, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 11, 2021
@bobcatfish
Contributor Author

Thanks for all the feedback @afrittoli @jlpettersson @sbwsg @vdemeester ! I've tried to address all of the comments (let me know what i've missed!)

@afrittoli: Do you envision merging this as is for now, and select specific options in a follow-up, or would you prefer to discuss on this PR and alter it before it's merged?

I'm hoping we could merge this with the alternatives listed and then keep discussing and iterate from there! 🙏

(if folks are happy to merge this as-is I can keep addressing any more comments that come in via the next PR)

@ghost

ghost commented Mar 15, 2021

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 15, 2021
This TEP describes some difficulty folks have run into, especially when
trying to use Tekton Pipelines without PipelineResources: if you want to
use the functionality of multiple Tasks, you need to put them together
in a Pipeline and share data between them with volumes. It could be a
useful abstraction, and more efficient, to make it possible to combine
Tasks together without scheduling them on different pods.
In the review @vdemeester pointed out that it feels like we are trying
to deal with 2 different things - for me the main reason for that is b/c
I have a feeling we WILL solve @mattmoor 's use case - since we'll
probably have to change TaskRuns to make it happen - but I agree with
@vdemeester that that's not the main thing we're targeting here.

Now TEP-0044 is more aligned with TEP-0046 than ever!
In the most recent API working group we decided to keep this TEP and
[TEP-0046](tektoncd#318) separate
because they are coming at a similar problem from different angles.

In the WG @jerop suggested that we update the TEPs with some info on
what the overlaps + differences are and that's what this TEP is adding!
In today's API working group it was clear that I still haven't been able
to articulate what problem this TEP is trying to solve in a way that's
clear, so I'm trying this composition vs scheduling approach instead to
see if that is more clear. Also @skaegi pointed out that the overhead
required in running multiple pods is another consideration in addition
to the PVC overhead so I've tried to include this as well.

I'm going to follow up with a list of possible solutions so hopefully
that will help folks who are confused about what this is trying to
address.
This commit adds possible solutions to the problem described in
TEP-0044, including references to solutions in other TEPS (46 & 54).
I was hoping to merge the problem statement before starting to talk
about solutions but it seems like the problem statement is too abstract
to get enough traction, and meanwhile folks have been opening more TEPs
with related proposals, so hopefully starting to list
the options here will help us move the discussion forward.

I'm hoping we can merge the problem + possible options without needing
to decide on which one(s) we want to pursue.
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2021
@ghost

ghost commented Mar 16, 2021

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2021
@tekton-robot tekton-robot merged commit 0d1fea4 into tektoncd:main Mar 16, 2021
@ghost
Copy link

ghost commented Mar 16, 2021

🎉

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
8 participants