TEP-0040 - ignore step error #302

pritidesai · 2021-01-08T18:41:29Z

Proposing a TEP to ignore step error and provide an option to continue after capturing the non zero exit code.

Related issues:

pritidesai · 2021-01-08T18:41:57Z

/kind tep

bobcatfish · 2021-01-11T17:12:24Z

I'd really like to take a look at this before we merge it - I think there's a lot of overlap with the use case listed here and the goals we have for the PipelineResources revamp (specifically around task specialization)

/assign bobcatfish
/hold

afrittoli · 2021-01-11T20:21:50Z

The code of the PoC is at https://github.com/afrittoli/pipeline/commits/tep_0040 (the last two commits)

bobcatfish

I'm a bit confused about the problem(s) we are trying to address here. There are a few related problems and it feels like this is kinda starting to address but it's not totally clear:

Allow steps to fail (Add a field to Step that allows it to ignore failed prior Steps *within the same Task* pipeline#1559) - progress on this one has been stalled since @imjasonh pointed out it's not clear that a single task is the best solution and a pipeline with multiple Tasks might make more sense: Add a field to Step that allows it to ignore failed prior Steps *within the same Task* pipeline#1559 (comment)
"off the shelf" images <-- this seems like an interesting problem on it's own and is the motivation behind TEP 11. id really like to understand some examples of this a bit better
It's interesting to me that we are talking about providing the exit code itself - up until this point, and in proposals like TEP 28 we've discussed failure vs success but never at the level of the error code before. Why is it important to this proposal, or is this just a case of wanting to understand if a step has failed?
I'm a bit confused about what level we want this at: is it important that steps within a Task be allowed to fail and other steps can see their status, or is it important that Tasks within a Pipeline be able to fail and provide their exit code as part of their failure? (some use cases might help clear this up) <-- if the main motivation here is to allow tasks in a pipeline to fail and for something to act on that failure, id like to understand more about why finally Tasks dont do the trick
The use case listed in this TEP (reporting the failure of unit tests) doesn't seem well suited to this solution: a pipeline with a finally task seems like it could do the same job? baking everything into one task interferes with the task's resusability (and is one of the motivations of pipelineresources and task specialization

I think the key thing is I'm trying to understand what this provides that we don't get with finally tasks + TEP 28.

bobcatfish · 2021-01-11T20:35:55Z

teps/0040-capture-step-exit-code.md

+an option to capture non-zero exit code. The pipeline author can choose to continue execution after capturing
+the non-zero exit code and make it available to access it after the execution.
+
+Issue: [tektoncd/pipeline#2800](https://github.com/tektoncd/pipeline/issues/2800)


I think this might also be covering tektoncd/pipeline#1559 ?

Indeed, even though the approach there is slightly different. The focus on that one is to let a step run even though there was a previous failure. Here is on letting the potentially failing step declare that the failure should be captured, so other steps don't need to be aware about the failure at all.

bobcatfish · 2021-01-11T20:38:15Z

teps/0040-capture-step-exit-code.md

+As a pipeline author, I would like to design a task with multiple steps. One of the steps is running an
+enterprise  image to run unit tests, and the next step needs to report test results even after a previous
+step results in failure due to tests failure.
+


Can you go into more detail with the use cases?

One example could be the nightly release pipeline. As part of the release pipeline we might want to run a security scan, unit tests and other checks. None of this checks should stop the release pipeline from executing, however we want to run the tests so that we keep a track of which failures we saw against which version.

This can be sometime achieved by wrapping the step that runs the tests with a script, but in other cases it would require to create a new container image that contains a shell. In cases where wrapping with a script is possible, it may still be undesirable.

This use case is strengthened by the case of users migrating to Tekton. They may have scripts / steps from their previous CI system where it was possible to ignore a failure, and they would have to modify them and/or wrap them to trap the exit code.

From our discussion the other day: the nightly release pipeline example is kind of different b/c that's about allowing an individual Task to fail; this proposal is about allowing steps within Tasks to fail. If we wanted to apply this proposal to our nightly release pipeline for example, we'd have to modify the golang-test task

Having to modify a Task is the main downside of this approach, we should capture it in the TEP.
The use case I'm facing is to migrate users to Tekton, they already have tasks written, and they need to be moved to Tekton, so it is not an issue in this case.

A possible workaround for existing Tasks, they could expose the capture flag through a task parameter, to allow the Task to use in both modes, with and without capture.

"we'd have to modify the golang-test task" ... wow I somehow hadn't considered that. All of our legacy cases are with teams providing an "image" that when run might "fail", but in Tekton where we're promoting a Task Catalog teams might provide a Task that we should equivalently let a Pipeline decide if it should fail. I think maybe we should consider support ignoring Task failure in this TEP or at least immediately create another TEP to look at adding support.

bobcatfish · 2021-01-11T20:38:17Z

teps/0040-capture-step-exit-code.md

+## Motivation
+
+It should be possible to easily use off-the-shelves (OTS) images as steps in Tekton tasks. A task author has no
+control on the image but may desire to capture an error and continue executing the pipeline.


can you provide some examples of images that cannot be used today? or what it would look like to try to use them?

Is the main issue with these off-the-shelf images that they sometimes don't have a shell? this was the motivation for TEP-0011 as well

(if the images have a shell, you can write your own script to do whatever you want with the exit code)

Yes, indeed, the most difficult case is that of images that do not include a shell.
The wrapping script feels to me like a workaround though, I think it would be a nicer user experience if we allowed natively to trap the exit code.

Another detail we could include in the TEP: many projects have to create "debug" versions of their images to support users that want shells; in CD automation especially there are so many scenarios where you do need shells that often folks end up just having to always use the debug version

The wrapping script feels to me like a workaround though

could you explain a bit more? It feels to me like being able to run a script that runs the command that needs to be run gives the user the most flexibility, e.g. they can grab the exit code, they can grab stdout + stderr, they can do simple processing like grep and awk, etc.

Container images are not necessarily written with a wrapping script in mind.
If an image sets an entrypoint, it is usually the recommended way to use the image. To write a script that wraps the container, we need to inspect the container image, find out what the entry point is, and run the command in our script the same way the entry point would have, which is not something we should ask Task author to be doing. If the entrypoint changes, the task will have to be updated too.

We could have a mechanism that fetches the entrypoint from the docker image automatically and runs it... oh, wait, we already do, it's Tekton's entrypoint. This TEP proposes to add a flag in the entrypoint to capture the exit code - I don't think we should do this instead in a shell script that will be executed by entrypoint.

kk - maybe we can discuss this more when we start looking at possible designs

bobcatfish · 2021-01-11T20:38:19Z

teps/0040-capture-step-exit-code.md

+Steps are executed in order in which they are specified. One single step failure results in a task failure
+in turn results in a pipeline failure. Once a step results in failure, rest of the steps are not executed.
+
+Many common pipelines have requirement where a step failure must not halt the entire execution. In order to


im a bit confused about whether this problem exists at the pipeline level or at the step level, is it possible to dig into it a bit more?

The failure originates in a step and propagates to the entire pipeline.
In terms of granularity, the proposal of this TEP is to trap the exit code for specific steps.

I feel like we have two separate scenarios, maybe we want to handle both, but I'm not convinced both apply to the use cases we're discussing:

I want to stop the failure of a Task in a Pipeline from stopping the execution of a Pipeline (and maybe stop it from failing as well)

I want to stop the failure of a step in a Task from failing the Task

The failure originates in a step and propagates to the entire pipeline.

Right, there is a relationship between the two - you could potentially use (2) to give you (1) as well, but (2) doesn't let you have any control over this at the pipeline level, it has to be baked into the Task. I'd argue that to truly satisfy (1) we'd want that control at the pipeline level

I think we might eventually want support for ignoring both "step" and "task" failure.

pritidesai · 2021-01-12T01:40:35Z

Thanks @bobcatfish for thorough review 😻

It's interesting to me that we are talking about providing the exit code itself - up until this point, and in proposals like TEP 28 we've discussed failure vs success but never at the level of the error code before. Why is it important to this proposal, or is this just a case of wanting to understand if a step has failed?

My understanding is taskrun and pipelinerun execution status are of type conditions (implementing duck type) vs step execution states are limited to container states with exitCode and reason.

TaskRun status:

"conditions": [
     {
         "lastTransitionTime": "2021-01-08T22:45:22Z", 
         "message": "All Steps have completed executing",
         "reason": "Succeeded",                                               
         "status": "True",
         "type": "Succeeded"
     }

TaskRun status with a list of steps:

"steps": [
      {
          "container": "execute-this-step",                                                        
          "imageID": "....",
          "terminated": {
              "containerID": "...",
              "exitCode": 0,
              "finishedAt": "2021-01-08T22:45:22Z",
              "message": "[{\"key\":\"sum\",\"value\":\"12\",\"type\":\"TaskRunResult\"}]",
              "reason": "Completed",
              "startedAt": "2021-01-08T22:45:22Z"
          }
      }
  ],

The duck typed conditions are not available at the step level (we could implement them if needed), and therefore exit codes are so important to this proposal. A non zero exit code of a container means a step failed. Failed step with capture exit code set to true allows to ignore that step failure and continue executing rest of the steps and tasks.

pritidesai · 2021-01-12T01:46:20Z

I'm a bit confused about what level we want this at: is it important that steps within a Task be allowed to fail and other steps can see their status, or is it important that Tasks within a Pipeline be able to fail and provide their exit code as part of their failure? (some use cases might help clear this up) <-- if the main motivation here is to allow tasks in a pipeline to fail and for something to act on that failure, id like to understand more about why finally Tasks dont do the trick

It is important that steps within a task be allowed to fail and other steps in the same task can access the exit code. This proposal is limited to step failure and does not propose allowing a task to fail. A step capturing exit code ignores step failure and declares that step as a success (exit with zero) which results in successful task and continues executing rest of the tasks in a pipeline. With this proposal, a task failure cannot be ignored without capturing a step exit code and a single task failure still results in pipeline failure.

bobcatfish · 2021-01-12T21:52:09Z

It is important that steps within a task be allowed to fail and other steps in the same task can access the exit code.

Maybe we can dig into this a bit more. This feels to me like a few related but different features:

Allow a step to fail (Add a field to Step that allows it to ignore failed prior Steps *within the same Task* pipeline#1559) without stopping execution of the next step
In the case where a step is allowed to fail, allow the next step to know if the previous step failed
In addition to letting the next step know if the previous step failed, provide it with the exit code of the previous step

( @afrittoli 's demo seemed to go a bit further than 3 by also exposing the exit code as a result of the Task but it sounds like that might be beyond the scope of this TEP? in the issue tektoncd/pipeline#2800 @afrittoli mentions "It may be desirable for the pipeline to continue even if there was a failure." so I feel like there might be a bit of a mismatch between this TEP and that issue 🤔 )

Before we get into (2) and (3) I'd like to understand a bit better why we need (1). At a glance it makes sense, however once we start to dig into it (e.g. @imjasonh 's comment tektoncd/pipeline#1559 (comment)) it seems like the use cases motivating this are much better suited to a Pipeline with multiple Tasks. If we can identify some use cases that need (1) and why a Pipeline with multiple Tasks doesn't work for them I think that will really help.

afrittoli · 2021-01-13T16:43:09Z

It is important that steps within a task be allowed to fail and other steps in the same task can access the exit code.

Maybe we can dig into this a bit more. This feels to me like a few related but different features:

Allow a step to fail (tektoncd/pipeline#1559) without stopping execution of the next step

...and ultimately, of the whole pipeline, since today a step failure implies failure of the task and of the overall pipeline

In the case where a step is allowed to fail, allow the next step to know if the previous step failed

In fact what this TEP provides is a way to trap or capture the failure, so the step does not actually fail from a Tekton POV.
Let the next step access the previous step success status (or trapped exit code) is not something that I thought would be in scope for this TEP. It is something that we can add later.

In addition to letting the next step know if the previous step failed, provide it with the exit code of the previous step

( @afrittoli 's demo seemed to go a bit further than 3 by also exposing the exit code as a result of the Task but it sounds like that might be beyond the scope of this TEP? in the issue tektoncd/pipeline#2800 @afrittoli mentions "It may be desirable for the pipeline to continue even if there was a failure." so I feel like there might be a bit of a mismatch between this TEP and that issue 🤔 )

The reason I added the original exit code as a result in the PoC is because normally the exit code would be available in the step status. I wanted to provide a way to still make it available even when it's captured. In fact the bit that is strictly in scope for this TEP is to expose whether the step would have failed or not, to allow for external systems to identify this cases and bring them to the attention of users.

Before we get into (2) and (3) I'd like to understand a bit better why we need (1). At a glance it makes sense, however once we start to dig into it (e.g. @imjasonh 's comment tektoncd/pipeline#1559 (comment)) it seems like the use cases motivating this are much better suited to a Pipeline with multiple Tasks. If we can identify some use cases that need (1) and why a Pipeline with multiple Tasks doesn't work for them I think that will really help.

A pipeline with multiple tasks would solve the issue because a task failure would still cause the pipeline to fail.
Capturing the error at step level provides some advantages compared to doing so at task level.

If I want to:

run tests (might fail)
push results and logs

and I can only capture errors at Task level, I need to run the tests in a task and push the logs in a different ones, which means I need to copy logs and results to some pipeline storage, download them from there and push them to the final storage, which means three times as much I/O.

bobcatfish · 2021-01-14T21:33:35Z

Thanks for the extra info @afrittoli !

I need to run the tests in a task and push the logs in a different ones, which means I need to copy logs and results to some pipeline storage, download them from there and push them to the final storage, which means three times as much I/O.

Ah okay interesting! So my questions about this use case are:

I guess we're talking about a scenario where the image that we are using to run the tests doesn't have a shell? If it did have a shell, we'd have control over whether to fail the step or not
Sounds like an important requirement (maybe we can add this to the list) is to be able do all of this in one pod, e.g. something like "run a command that fails / run an image that does not contain a shell and fails, then act on that failure, all within the same pod (without requiring sharing data between pods)"?
How important is it that the overall Pipeline doesn't fail in this case and why is it important?

Another way we could tackle this is via task specialization, i.e. if we provide a way to specify "wrapping" tasks (like in this example where there is a "finally" clause for an individual Task) - this way you could write one Task that runs the test, and another Task that pushes the results and logs, and (theoretically) execute all of that within the same pod. From the requirements you described, the only missing bit is whether the failure of the Task fails the entire Pipeline or not.

(Totally understand if you don't agree with that route to tackle this, but I want to make sure we really understand what problem we're solving exactly - it kinda sounds like the main problem is: once any step in a Task fails, if you want to act on that failure, you have to do it in a separate pod?)

bobcatfish

Thanks for taking the time to discuss this with me yesterday @afrittoli and @pritidesai ! I've added some comments with updates we could make the TEP to flesh it out based on the discussion.

My biggest overall request is that we try to frame this problem clearly without referring to the solution (yet!)

bobcatfish · 2021-01-21T20:25:14Z

teps/0040-capture-step-exit-code.md

+- '@afrittoli'
+---
+
+# TEP-0040: Capture Step Exit Code


This title feels to me like it jumps to an implementation but doesn't capture what's really motivating this.

Some ideas for alternative names: (sorry they are so wordy XD)

Support mulitstep Tasks when some steps have no shell

Allow Tasks to continue to act after the failure of a step

Allow steps to fail without failing or halting Tasks

yup I agree, I think "Allow steps to fail without failing Tasks" fits well as the biggest motivation here is to continue executing subsequent steps and preventing the task from failing.

"treat a step as passing regardless of whether it fails or succeeds" - @skaegi
"ignore the failure of a step" - @imjasonh

bobcatfish · 2021-01-21T20:27:48Z

teps/0040-capture-step-exit-code.md

+As a pipeline author, I would like to design a task with multiple steps. One of the steps is running an
+enterprise  image to run unit tests, and the next step needs to report test results even after a previous
+step results in failure due to tests failure.
+


From our discussion the other day: the nightly release pipeline example is kind of different b/c that's about allowing an individual Task to fail; this proposal is about allowing steps within Tasks to fail. If we wanted to apply this proposal to our nightly release pipeline for example, we'd have to modify the golang-test task

bobcatfish · 2021-01-21T20:53:42Z

teps/0040-capture-step-exit-code.md

+## Motivation
+
+It should be possible to easily use off-the-shelves (OTS) images as steps in Tekton tasks. A task author has no
+control on the image but may desire to capture an error and continue executing the pipeline.


Another detail we could include in the TEP: many projects have to create "debug" versions of their images to support users that want shells; in CD automation especially there are so many scenarios where you do need shells that often folks end up just having to always use the debug version

The wrapping script feels to me like a workaround though

could you explain a bit more? It feels to me like being able to run a script that runs the command that needs to be run gives the user the most flexibility, e.g. they can grab the exit code, they can grab stdout + stderr, they can do simple processing like grep and awk, etc.

bobcatfish · 2021-01-21T20:55:50Z

teps/0040-capture-step-exit-code.md

+enterprise  image to run unit tests, and the next step needs to report test results even after a previous
+step results in failure due to tests failure.
+
+


We could add a requirements section as well, e.g.:

Requirements

Users should be able to use prebuilt images as-is without having to do one or more of the following (see also TEP-0011:

Investigating how they are built to understand if they contain a shell and possibly overriding the entrypoint

Build and maintain their own images (i.e. add in required shell or other binaries) from those images

It should be possible to know that a step was allowed to fail by observing the status of the TaskRun (and PipelineRun if applicable) (e.g. to show a "warning" / display as "yellow" status in a UI)

When a step is allowed to fail, the exit code of the process that failed should not be lost and should at a minimum be available in the status of the TaskRun (and PipelineRun if applicable)

When a step is allowed to fail, the Task should not be marked as failed. <-- im a bit unsure about this one @afrittoli - if we want to do this, I think we need to make it possible for other steps to know that this happened and determine if the overall status should be failed, what do you think? (e.g. in the case of uploading test results, you want to keep running steps so you can do the upload, but still the overall Task has failed)

bumping this

teps/0040-capture-step-exit-code.md

afrittoli · 2021-01-25T16:28:29Z

It is important that steps within a task be allowed to fail and
other steps in the same task can access the exit code.

@pritidesai I slightly disagree on this last point specifically. I agree it could be interesting for steps to have access to other step outcomes, but that would be a different TEP with different use cases. For this TEP specifically, we should not plan on exposing outcome of steps within a running pipeline.

bobcatfish · 2021-01-25T17:09:51Z

/assign vdemeester

skaegi · 2021-01-25T17:18:08Z

/assign

bobcatfish · 2021-01-25T19:53:39Z

@pritidesai I slightly disagree on this last point specifically. I agree it could be interesting for steps to have access to other step outcomes, but that would be a different TEP with different use cases. For this TEP specifically, we should not plan on exposing outcome of steps within a running pipeline.

My sticking point @afrittoli is that if we don't allow other steps to see the status of the step that was allowed to fail, the Task will never be able to fail (or it would require significant workarounds to make it happen).

For example in the use case where I want to run tests, then upload results, I think there is a good chance that a user will still want the overall Task to fail, they just want to be able to upload results before that happens.

afrittoli · 2021-01-25T20:15:15Z

@pritidesai I slightly disagree on this last point specifically. I agree it could be interesting for steps to have access to other step outcomes, but that would be a different TEP with different use cases. For this TEP specifically, we should not plan on exposing outcome of steps within a running pipeline.

My sticking point @afrittoli is that if we don't allow other steps to see the status of the step that was allowed to fail, the Task will never be able to fail (or it would require significant workarounds to make it happen).

For example in the use case where I want to run tests, then upload results, I think there is a good chance that a user will still want the overall Task to fail, they just want to be able to upload results before that happens.

That's a good point.
However I think the main use case for this TEP is not delay the failure to another step but to prevent it.
It is true that we have use cases where we do some failure analysis in the pipeline itself, but that would be based on information extracted from the output of the previous step rather than the exit code.
For instance we could have a list of tests that must pass, and if one of them is in the list of failures we fail.

This TEP will work well in combination with that about capturing stdout, since with both it will be possible for the next step to always run and have access to the output, even when the original docker image did not support sending the output to a file via args.

pritidesai · 2021-01-25T22:59:59Z

thanks @bobcatfish and @afrittoli for all the reviews.

I have addressed most of the comments except two:

Also, I will add drawback section after adding design proposal 🙃

pritidesai · 2021-02-02T22:05:21Z

Right, there is a relationship between the two - you could potentially use (2) to give you (1) as well, but (2) doesn't let you have any control over this at the pipeline level, it has to be baked into the Task. I'd argue that to truly satisfy (1) we'd want that control at the pipeline level

I have expanded the scope of this TEP to include both (1 - task level) and (2 - pipeline level) given that not everything has to be part of the same PR. I can certainly drop (2) from this proposal if you would like. My rationale behind adding (2) is to make the API changes for (1) compatible with the changes we propose for (2).

I have added some examples and renamed the proposal to ignore step error. If we decide to drop support for (2) from this proposal, it can be further renamed to ignore step error in a task.

vdemeester · 2021-02-03T17:55:43Z

I do see value in the proposal and those are valid use case we need to take care of. So I have a probably naive and general question, but given the following proposal (#318 and #316), and the potential of the pipeline-in-pipeline approach (with CustomTask support), what would it mean to support those use case at the Task + Pipeline level only ?

Having an option on both might be a bit confusing to users. When to fail, when to not ? I also wonder if this is a concern at the Task authoring time, at the Pipeline authoring time, at the PipelineRun execution ?

Similar to my comment on #316, I think we need to find where to put the cursor in our composability story and where each option fit.

On this proposal, I feel the use case can be both "at authoring time" and "at runtime". Those might be two different problems to tackle but on the authoring part, the question remain, if we were to have a way to say "don't fail the pipeline if this task fail", would we need the same at the task level (with steps) — and my initial thought is we would not.

Having it at the task/step level, we force the task author to take that decision for any user of the pipeline (related to @bobcatfish comment on modifying the catalog tasks).
Having it at the pipeline/task level, it's on the pipeline author to take that decision, which seems a bit more reusable (and would make the catalog more useful because more reusable).

I wonder thus if we should mention step in this proposal or not — as it is, from my perspective, already a beginning of a solution. And as linked above, we may need to consider our option taking into acount few other TEPs that aim to remove the overhead of running a pipeline.

bobcatfish · 2021-02-04T20:56:22Z

I have expanded the scope of this TEP to include both (1 - task level) and (2 - pipeline level) given that not everything has to be part of the same PR. I can certainly drop (2) from this proposal if you would like. My rationale behind adding (2) is to make the API changes for (1) compatible with the changes we propose for (2).

I think that expanding the scope of this TEP to include (2) is going to make it harder to merge - b/c of the potential overlap with the other TEPs that @vdemeester pointed out - i have been convinced that there is a legitimate gap that would address in isolation that specifically applies to:

Allowing a step to fail without stopping execution of a Task at Task authoring time

e.g. the use case about a platform team which wants to create a reusable Task that runs tests, and transforms their output, whether they fail or not

I suggest we tackle (2) separately

pritidesai · 2021-02-05T05:26:31Z

Sounds good, thanks @bobcatfish @vdemeester 🙏

I have reduced its scope to only address (1):

Allowing a step to fail without stopping execution of a Task at Task authoring time

bobcatfish

I'm pretty much ready to approve this - I do want to add one more requirement, not sure if @afrittoli has strong feelings about it.

Also not sure where you stand @vdemeester , with the reduced scope are you happy to go ahead with this, or you want to see the other TEPs through first?

bobcatfish · 2021-02-05T15:24:13Z

teps/0040-ignore-step-errors.md

+        fi
+```
+
+This workaround does not apply to off-the-shelf container images.


thanks for the detailed overview!

bobcatfish · 2021-02-05T15:27:42Z

teps/0040-ignore-step-errors.md

+  in an image and then altering the entrypoint to allow capturing errors.
+
+* It should be possible to know that a step failed and subsequent steps allowed to continue by observing the status of
+  the `TaskRun`.


I'd also like to include a requirement that other steps within the Task can access this information, I believe @afrittoli disagrees - I'm hoping we can include this as a requirement but also could deliver the feature incrementally such that the initial implementation doesn't need to support this.

I feel strongly that we haven't sufficiently addressed the use cases and problem statement without it

I have added that (other steps within the task can access the process exit code) in the requirements. @afrittoli please let me know if you still disagree or have any other ideas.

Proposing a tep to ignore step errors and provide an option to continue after capturing the non zero exit code. Also document the container termination state to access it after the pipeline exectution finishes.

afrittoli · 2021-02-18T12:40:50Z

This is the PoC: https://github.com/afrittoli/pipeline/tree/tep_0040

bobcatfish · 2021-02-18T19:20:02Z

Thanks for all the updates! I'm happy to go ahead with the problem statement!! Will be great to take a look at our options here.

/approve

We'll need to get @vdemeester to review before we merge tho - he had some thoughts as well

/hold

tekton-robot · 2021-02-18T19:20:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [bobcatfish]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vdemeester

So, I have one question : if we would go ahead with Injecting a shell into steps, then a user could wrap on any image, would this be necessary ?

I feel it could still be useful but on the other hand, the "shell" injection could also bring some "tool" to wrap the exit code and make it available later on. I am mainly asking this to discuss if the use case needs to be handle by an API field or by some tooling.

afrittoli · 2021-02-24T10:18:06Z

I think this would still be useful.

Having a shell still requires writing a script which points to the correct entrypoint for the container image, which is more work, more error prone, and it's more brittle as the entrypoint might change in the image.

Using a script on the other hand gives more flexibility, as it allows doing output redirection, reading the error code and more.

So I think they're both useful features to have.

vdemeester · 2021-02-24T16:33:56Z

Having a shell still requires writing a script which points to the correct entrypoint for the container image, which is more work, more error prone, and it's more brittle as the entrypoint might change in the image.

Using a script on the other hand gives more flexibility, as it allows doing output redirection, reading the error code and more.

So I think they're both useful features to have.

Agreed 😉

/lgtm

pritidesai · 2021-02-24T18:39:48Z

Thanks @vdemeester @afrittoli

@bobcatfish we have review from @vdemeester, can I remove the hold now? Thanks 🙏

vdemeester · 2021-02-25T06:05:01Z

/hold cancel
@pritidesai next step is to add more use case and design detail to pass this as implementable then 😉

tekton-robot requested review from dibbles and imjasonh January 8, 2021 18:41

tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 8, 2021

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jan 8, 2021

pritidesai force-pushed the capture-step-exit-code branch from 7141d27 to 91eafc4 Compare January 8, 2021 22:22

tekton-robot assigned bobcatfish Jan 11, 2021

tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 11, 2021

bobcatfish reviewed Jan 11, 2021

View reviewed changes

vdemeester mentioned this pull request Jan 12, 2021

[proposal] Capturing step exit codes in Tasks tektoncd/pipeline#2800

Closed

afrittoli changed the title ~~TEP - capture step exit code~~ TEP0040 - capture step exit code Jan 13, 2021

afrittoli changed the title ~~TEP0040 - capture step exit code~~ TEP-0040 - capture step exit code Jan 13, 2021

bobcatfish mentioned this pull request Jan 15, 2021

Use the status of step as a parameter in the next step tektoncd/pipeline#3220

Closed

bobcatfish reviewed Jan 21, 2021

View reviewed changes

tekton-robot assigned vdemeester Jan 25, 2021

tekton-robot assigned skaegi Jan 25, 2021

pritidesai force-pushed the capture-step-exit-code branch from 91eafc4 to bc9ad7b Compare January 25, 2021 22:57

tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 2, 2021

pritidesai force-pushed the capture-step-exit-code branch from 025386b to 2729f73 Compare February 2, 2021 22:06

pritidesai changed the title ~~TEP-0040 - capture step exit code~~ TEP-0040 - ignore step error Feb 2, 2021

Base automatically changed from master to main February 3, 2021 16:34

pritidesai force-pushed the capture-step-exit-code branch from 2729f73 to 9c3ae5f Compare February 3, 2021 21:26

pritidesai force-pushed the capture-step-exit-code branch from 9c3ae5f to 86f9490 Compare February 5, 2021 05:24

pritidesai force-pushed the capture-step-exit-code branch from 86f9490 to dbeda27 Compare February 5, 2021 05:27

bobcatfish reviewed Feb 5, 2021

View reviewed changes

pritidesai mentioned this pull request Feb 5, 2021

TEP-0050 - ignore task failures #342

Merged

tep to ignore step errors

0b75788

Proposing a tep to ignore step errors and provide an option to continue after capturing the non zero exit code. Also document the container termination state to access it after the pipeline exectution finishes.

pritidesai force-pushed the capture-step-exit-code branch from dbeda27 to 0b75788 Compare February 5, 2021 19:24

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2021

vdemeester reviewed Feb 24, 2021

View reviewed changes

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 24, 2021

tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2021

tekton-robot merged commit 48e3015 into tektoncd:main Feb 25, 2021

ghost mentioned this pull request Feb 26, 2021

Add a field to Step that allows it to ignore failed prior Steps *within the same Task* tektoncd/pipeline#1559

Closed

		enterprise image to run unit tests, and the next step needs to report test results even after a previous
		step results in failure due to tests failure.

TEP-0040 - ignore step error #302

TEP-0040 - ignore step error #302

Conversation

pritidesai commented Jan 8, 2021 • edited Loading

pritidesai commented Jan 8, 2021

bobcatfish commented Jan 11, 2021

afrittoli commented Jan 11, 2021

bobcatfish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bobcatfish Jan 21, 2021 • edited Loading

Choose a reason for hiding this comment

afrittoli Jan 25, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pritidesai commented Jan 12, 2021

pritidesai commented Jan 12, 2021 • edited Loading

bobcatfish commented Jan 12, 2021

afrittoli commented Jan 13, 2021

bobcatfish commented Jan 14, 2021

bobcatfish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bobcatfish Jan 21, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Requirements

Choose a reason for hiding this comment

afrittoli commented Jan 25, 2021

bobcatfish commented Jan 25, 2021

skaegi commented Jan 25, 2021

bobcatfish commented Jan 25, 2021

afrittoli commented Jan 25, 2021

pritidesai commented Jan 25, 2021 • edited Loading

pritidesai commented Feb 2, 2021

vdemeester commented Feb 3, 2021

bobcatfish commented Feb 4, 2021

pritidesai commented Feb 5, 2021 • edited Loading

bobcatfish left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

afrittoli commented Feb 18, 2021

bobcatfish commented Feb 18, 2021

tekton-robot commented Feb 18, 2021

vdemeester left a comment

Choose a reason for hiding this comment

afrittoli commented Feb 24, 2021

vdemeester commented Feb 24, 2021

pritidesai commented Feb 24, 2021

vdemeester commented Feb 25, 2021

pritidesai commented Jan 8, 2021 •

edited

Loading

bobcatfish Jan 21, 2021 •

edited

Loading

afrittoli Jan 25, 2021 •

edited

Loading

pritidesai commented Jan 12, 2021 •

edited

Loading

bobcatfish Jan 21, 2021 •

edited

Loading

pritidesai commented Jan 25, 2021 •

edited

Loading

pritidesai commented Feb 5, 2021 •

edited

Loading