-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TEP-0140] Produce Results from Matrixed PipelineTasks #1047
Conversation
/assign @lbernick |
/assign |
- $(tasks.matrix-emitting-results.results.report-url[0]) # linux-chrome | ||
- $(tasks.matrix-emitting-results.results.report-url[3]) # linux-safari | ||
- $(tasks.matrix-emitting-results.results.report-url[6]) # linux-firefox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for dynamically fanning out combinations based on params, but then only using some of those combinations in later tasks? Would matrix include be better suited to these use cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The syntax here would only work if the param names are static right? If we added a new parameter e.g. android
to the list then the task would produce unexpected results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for dynamically fanning out combinations based on params, but then only using some of those combinations in later tasks?
One of them is documented under Platforms and Browsers section.
In addition to this, there are many common CI/CD use cases with this kind of requirement in the industry. In microservices world that we are in today, most of the applications comprised of multiple repos, e.g. (1) Source Code (2) Configurations. To bless an application deployment in production, all the repos has to be built using the same build task
which might be producing a digest
as a task result. When it comes to deployment task, a digest from building a source repo is needed but not the configuration repo.
There is an additional use case which we discussed this morning of running the same scan
task for a given GitHub repo and retrieving only a scan report from the instance which ran scan with X
to continue execution. The rest of the results might be consumed by the rest of the tasks in a pipeline.
None of these use cases are coming from a single vendor.
Would matrix include be better suited to these use cases?
Matrix include is designed to specify a combination of params
for a given task
. This particular sample pipeline if we re-write using include
, it will result in a very complex and verbose definition. We will have to spell out all possible combinations of params which were expected to be generated by the controller. And in turn, it defeats the purpose of introducing matrix
. Include is a very useful feature for the use cases documented in the TEP.
The syntax here would only work if the param names are static right?
The syntax here is ideal when the param values are static. Its inherent nature of the params
which allows pipeline
and task
definitions to provide static
values. We are taking advantage of such static values in our own CI.
If we added a new parameter e.g.
android
to the list then the task would produce unexpected results?
The pipeline author when updates the static value of a param
, he/she is generally aware of its dependent tasks which are consuming the results. The task would not produce any unexpected results, the pipeline author will have to re-map the indices to match with her/his expectation.
Indexing into an array param/results is not something new being introduced by this TEP. Indexing into an array results is already a beta feature - https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#passing-one-tasks-results-into-the-parameters-or-when-expressions-of-another
All the concerns we have here also applies to the pipeline which consumes array results by indexing into an array result or to the pipeline which consumes array params by indexing into the param from a pipelineRun
(without matrix).
This example will result in the same failure or will question the underlying task definition if the order of environments
changes from [ 'staging', 'qa', 'prod']
to [ 'qa', 'prod', 'staging' ]
in the pipelineRun
.
I would like to continue addressing the rest of the comments if we all align on the use case and reach a consensus on this feature request such that @EmmaMunley can focus on updating the TEP if needed.
@lbernick @dibyom @vdemeester @afrittoli thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of them is documented under Platforms and Browsers section.
To be honest, I'm not sure why someone would want to run tests on different combinations of browsers and platforms, but then only fetch some of the test results. Is there user feedback reflecting this use case?
To bless an application deployment in production, all the repos has to be built using the same
build task
which might be producing adigest
as a task result. When it comes to deployment task, a digest from building a source repo is needed but not the configuration repo.
I'm a little bit confused about why you'd want to matrix a single build task for a source repo and a configuration repo. For example, let's say you're using kaniko build for your source repo. Does your configuration repo also contain a Dockerfile that should be built in parallel with the source repo? It seems more likely to me that a configuration repo would contain e.g. infrastructure as code and not a binary that should be built in parallel in the same way as the source repo.
There is an additional use case which we discussed this morning of running the same
scan
task for a given GitHub repo and retrieving only a scan report from the instance which ran scan withX
to continue execution. The rest of the results might be consumed by the rest of the tasks in a pipeline.
Can you elaborate a bit more on why the rest of the pipeline is gated on only one of the scan tasks in this use case? @EmmaMunley it would be helpful to add this use case to the TEP proposal. It may also be helpful to add an example pipeline based on this use case, so we can show what the example would look like with different syntax options.
Would matrix include be better suited to these use cases?
Matrix include is designed to specify a combination of
params
for a giventask
. This particular sample pipeline if we re-write usinginclude
, it will result in a very complex and verbose definition. We will have to spell out all possible combinations of params which were expected to be generated by the controller.
@EmmaMunley would you be able to write out your example of selecting only certain matrixed results using include instead of indexing so we can compare the syntax in each case?
Indexing into an array param/results is not something new being introduced by this TEP... This example will result in the same failure or will question the underlying task definition if the order of
environments
changes from[ 'staging', 'qa', 'prod']
to[ 'qa', 'prod', 'staging' ]
in thepipelineRun
.
Thanks for this! I thought about it a bit more and this actually makes me more concerned about array indexing rather than making me more convinced we should use it here :(
The TEP that introduced array indexing explained the use case in this way: "Without indexing, arrays will only work with arrays. This means you cannot combine arrays with Tasks that only take string parameters and this limits the interoperability between Tasks...Use case for array indexing: Unwinding loops (i.e. using one Task multiple times for some set of items)". Now that we can produce matrixed taskruns from array results, I wonder if array indexing is still the best way of doing this?
I would like to continue addressing the rest of the comments if we all align on the use case and reach a consensus on this feature request such that @EmmaMunley can focus on updating the TEP if needed.
Thanks very much @pritidesai! I would love to see the use cases you've identified here added to the TEP, with any related user feedback if we have it. It sounds like the main goal they have in common is to be able to use a result that was generated with a particular set of parameters; this goal would be helpful to have in the TEP as well, and then we can explore different solutions for achieving this goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'm not sure why someone would want to run tests on different combinations of browsers and platforms, but then only fetch some of the test results. Is there user feedback reflecting this use case?
It all depends on how they have configured their pipelines.
Its not about only fetch some of the test results. This example could be interpreted as classify and collect the reports for a given platform
before sending it for further processing. There could be a subsequent set of parallel tasks to collect reports based on the platform and send them to their platform specific buckets.
pipelineTask: fetch-and-send-linux-reports
pipelineTask: fetch-and-send-mac-reports
pipelineTask: fetch-and-scan-windows-report
As a platform provider, we provide building blocks and its up to the users how they would like to build their choice of resource.
|
||
### Support String Results produced by a matrix, Aggregated as An Array during Fanning Out - Referencing using indexes | ||
|
||
Since the ordering of a fanned out matrix is predeterministic due to sorting the combinations, we can reference results produced from different TaskRuns using indexing using the syntax `$(tasks.<pipelinetask-name>.results.<result-name>[INDEX])`. This approach will work for any matrix regardless if it has matrix params and/or include params. A con to this approach is that users will need to know in advance how the order of the TaskRuns is determined and figure out what the order will be ahead of time if users need to reference or filter for a specific result. At the pipeline definition time, the parameters are unknown. Another con is that it might become exponentially more challenging as you scale with more combinations, making it less user friendly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about array indexing into matrixed results for a few reasons:
- UX issues: the user needs to run the pipeline to determine the correct indices to use. If they rename or change params, they will need to rewrite their pipeline.
- Pipelines will not work for different input parameters. This conflicts with our reusability design principle (https://github.com/tektoncd/community/blob/main/design-principles.md#reusability) which states that "At run time (i.e. when invoking a Pipeline or Task via PipelineRun or TaskRun), users should be able to control execution as needed by their context without having to modify Tasks and Pipelines".
- We cannot evolve the implementation of matrix. Deterministic ordering of taskruns was not intended to be a behavior users should rely on (see discussion here) and in fact, I think we should allow ordering to be randomized so that users don't come to depend on it. (This is what Go chose to do for the order of iteration over map keys, for a very similar reason: https://go.dev/doc/go1#iteration)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I have descoped this TEP to only support emitting and consuming the entire array of results produced by fanned out TaskRuns as this is what everyone currently agrees on and marked this TEP implementable
so I can continue to make progress, while we continue the discussion in the comments section and in the API working group around handling referencing results for a singular or specific combination(s) of params.
@lbernick @pritidesai @dibyom @Yongxuanzhang @vdemeester @afrittoli
### Support only Matrix.Include Results by Result Name | ||
If the only use case we need to support is for explicit combinations, meaning only `Matrix.Include` without `Matrix Params`, then we can support results using the syntax of $(tasks.<include-name>.results.<result-name>) so that we can have an explicit mapping for each combination. | ||
Pros: This is a straightforward approach to map the combinations to the original params. | ||
Cons: This approach won’t support `Matrix` with `Params` or `Matrix` with both `Matrix` `Include` and `Matrix` `Params`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach. Can we expand it to support using full array results fanned out from matrix params, but not array indexing of matrixed results?
|
||
In the example below, a user wants to run tests on a combination of different platforms and browsers and then fetch the reports for only linux platforms. | ||
```yaml | ||
apiVersion: tekton.dev/v1beta1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the following alternative syntax work? This uses pipelines in pipelines.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
generateName: platforms-with-results
spec:
serviceAccountName: "default"
pipelineSpec:
tasks:
- name: matrix-emitting-results
matrix:
params:
- name: platform
value:
- linux
- mac
- windows
- name: browser
value:
- chrome
- safari
- firefox
pipelineSpec:
tasks:
- name: produce-results
params:
- name: platform
default: ""
- name: browser
default: ""
results:
- name: report-url
type: string
steps:
- name: produce-report-url
image: alpine
script: |
echo "Running tests on $(params.platform)-$(params.browser)"
echo -n "https://api.example/get-report/$(params.platform)-$(params.browser)" | tee $(results.report-url.path)
- name: task-consuming-results
when:
- input: $(params.platform)
operator: in
value: ["linux"]
params:
- name: url
value: $(tasks.produce-results.results.report-url)
taskSpec:
params:
- name: url
type: string
steps:
- name: echo
image: alpine
script: |
for arg in "$@"; do
echo "Arg: $arg"
done
- $(tasks.matrix-emitting-results.results.report-url[0]) # linux-chrome | ||
- $(tasks.matrix-emitting-results.results.report-url[3]) # linux-safari | ||
- $(tasks.matrix-emitting-results.results.report-url[6]) # linux-firefox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The syntax here would only work if the param names are static right? If we added a new parameter e.g. android
to the list then the task would produce unexpected results?
params: | ||
- name: url | ||
value: | ||
- $(tasks.matrix-emitting-results.results.report-url[0]) # linux-chrome |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spitballing here, if the use case is filter all the results where params.platform == linux
, we could add some new syntax similar to JSONPath filters e.g. $(tasks.matrix-emitting-results.results.report-url?(@params.platform.value=='linux')]
4d3cb3b
to
2e75724
Compare
echo -n "https://api.example/get-report/$(params.platform)-$(params.browser)" | tee $(results.report-url.path) | ||
``` | ||
|
||
## Proposal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if one of the matrixed Task
fails (prior to write the result) ? Are we producing incomplete results array or no results at all ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the event that a Matrixed PipelineTask
that produces Results
fans out into multiple TaskRuns
and one or more of the fanned out TaskRuns
fail, the current behavior of allowing a failed TaskRun
to still produce a Result
would be supported tektoncd/pipeline#6510
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Vincent for this question, I think it's interesting/important! There are actually a few related questions here IMO:
- If a matrixed taskrun fails before writing a result, will the array result not exist or be length n - 1 or n (i.e. we provide some "zero value" for the result of a taskrun that doesn't produce a result that should) where n is the number of combinations? I think this is what Vincent was asking. IMO it should be length n - 1, and then once tep 50 is implemented, pipeline authors can choose whether to continue operating on the incomplete result even though the pipeline task is failed.
- If a matrixed taskrun fails after writing a result, will the array result exist? I think feat: support to produce results from a failed task pipeline#6510 allows a taskrun to produce a result even if it failed after writing it, so it seems reasonable to me to have an array result populated for a matrixed pipeline task for consistency.
- If a matrixed taskrun is supposed to produce a result, but doesn't, and still succeeds, should the array result not exist or be length n - 1 or n? Reading through TEP 48 I'm not sure exactly what the right call is here but I'd say an incomplete array result makes sense for consistency; the pipeline task will succeed so it wouldn't make sense to have it not produce a result.
In general I don't think it would be a good idea to produce "zero value" results for tasks that don't write the results to ensure the array result is the length of the number of combinations. (Note that this is another reason indexing might be a confusing UX: we'd have to either produce results for tasks that don't write them, or not produce an array result any time a task didn't write a result, even if the task succeeded.)
77981e4
to
b3e2e4e
Compare
9b58427
to
7f6e51e
Compare
|
||
1. Support a matrixed `pipelineTask` that produces `result` of type `array` or `object` that are then consumed by | ||
another `pipelineTask`. | ||
- A matrixed `pipelineTask` can support `result` of any type as long as they aren’t consumed by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming they also can't be consumed in pipeline results, but you might be able to consume them with pipelines in pipelines or with a script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these can be consumed by chains
and in future will be helpful in consuming results from an individual instances and could be useful while implementing pipelines in pipelines or with a scripts.
#### Access Aggregated Results Length | ||
|
||
The pipeline authors can access the length of the array of aggregated results that were | ||
actually produced using the syntax: `tasks.<pipelineTaskName>.matrix.<resultName>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find this in our docs, but I thought it was possible to reference a full array result with our without the [*]
at the end? If so, could we use a different syntax for this context variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure either, @Yongxuanzhang do you any insight into such syntax? We can always adapt if such syntax is already supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add [*]
for array results reference.
tep: https://github.com/tektoncd/community/blob/main/teps/0076-array-result-types.md#to-provide-values-for-array-params
doc: https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md#substituting-array-parameters
Do we want to support the reference without using [*]
here?
|
||
The pipeline authors can access the length of the array of aggregated results that were | ||
actually produced using the syntax: `tasks.<pipelineTaskName>.matrix.<resultName>` | ||
and `finally.<pipelineTaskName>.matrix.<resultName>`. This will allow users to loop over the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would finally.<pipelineTaskName>.matrix.<resultName>
be used in pipeline results? I don't think it could be used in a subsequent task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes finally
can be used in the pipeline results.
The producer task must have defined a `result` of type `string`, `matrix` aggregates the results from each instance of | ||
the matrixed `pipelineTask` which can be consumed into a `param` or `when` expressions as type `array`. | ||
|
||
#### Missing Task Results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
7f6e51e
to
d0159ed
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lbernick, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d0159ed
to
5460a22
Compare
5460a22
to
21b49ff
Compare
[TEP-0090: Matrix][tep-0090] proposed executing a `PipelineTask` in parallel `TaskRuns` and `Runs` with substitutions from combinations of `Parameters` in a `Matrix`. Specifying `Results` in a `Matrix` was in scope in [TEP-0090][results], and is already supported. However, producing `Results` from `PipelineTasks` with a `Matrix` was out of scope to await [TEP-0075: Object Parameters and Results][tep-0075] and [TEP-0076: Array Results][tep-0076]. This TEP aims to enable producing `Results` from `PipelineTasks` with a `Matrix` so that they can be used in subsequent `PipelineTasks`. [tep-0075]: https://github.com/tektoncd/community/blob/main/teps/0075-object-param-and-result-types.md [tep-0076]: https://github.com/tektoncd/community/blob/main/teps/0076-array-result-types.md [tep-0090]: https://github.com/tektoncd/community/blob/main/teps/0090-matrix.md [tep-0118]: https://github.com/tektoncd/community/blob/main/teps/0118-matrix-with-explicit-combinations-of-parameters.md Signed-off-by: Priti Desai <pdesai@us.ibm.com>
21b49ff
to
8100b7a
Compare
API WG: |
TEP-0090: Matrix proposed executing a
PipelineTask
in parallelTaskRuns
andRuns
with substitutions from combinations ofParameters
in aMatrix
. SpecifyingResults
in aMatrix
was in scope in [TEP-0090][results], and is already supported. However, producingResults
fromPipelineTasks
with aMatrix
was out of scope to awaitTEP-0075: Object Parameters and Results and TEP-0076: Array Results. This TEP aims to enable producing
Results
fromPipelineTasks
with aMatrix
so that they can be used in subsequentPipelineTasks
./kind TEP