Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finally tasks should be triggered in case of missing results #4438

Closed
jensh007 opened this issue Dec 21, 2021 · 24 comments
Closed

Finally tasks should be triggered in case of missing results #4438

jensh007 opened this issue Dec 21, 2021 · 24 comments
Labels
area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jensh007
Copy link

Expected Behavior

I have a finally task that consumes results of previous tasks. The documentation states:

finally tasks are guaranteed to be executed in parallel after all PipelineTasks under tasks have completed regardless of success or error.

The task producing the result fails with a nonzero exit code. I expect that the finally task gets triggered.

Actual Behavior

Finally task is not run instead it is waiting forever for results. It is fine if the expected results are missing (and set to null). But the finally task should always be triggered.

Steps to Reproduce the Problem

  1. Apply the following definition:
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: generator-task
spec:
  results:
  - name: my_result
  steps:
  - name: make-result
    image: busybox
    script: |
      echo -n "foo" | tee $(results.my_result.path)
      exit 1
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: finally-task
spec:
  params:
  - description: value of result from previous task
    name: former_result
  steps:
  - name: use-result
    image: busybox
    script: |
        echo -n "$(params.former_result)"

---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: resulttest
spec:
  finally:
  - name: notification
    params:
    - name: former_result
      value: $(tasks.generator.results.my_result)
    taskRef:
      name: finally-task      
  tasks:
  - name: generator
    taskRef:
      name: generator-task
  1. Create a PipelineRun for pipeline resulttest
  2. The finally task for this pipeline never gets executed.

Additional Info

There is a related section in the documentation about results after error but this is not helpful. Setting onError: continue completely hides the error which is not what I want.

Not sure but may be related to TEP-0058. Potentially another use case to be covered?

  • Kubernetes version:

    Output of kubectl version:

    Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", 
    GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6", GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097", GitTreeState:"clean", BuildDate:"2021-10-27T17:44:26Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
    
    
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

    Client version: 0.17.0
    Pipeline version: v0.28.2
    Dashboard version: v0.21.0

@jensh007 jensh007 added the kind/bug Categorizes issue or PR as related to a bug. label Dec 21, 2021
@jerop jerop assigned jerop and unassigned jerop Dec 21, 2021
@pritidesai
Copy link
Member

thank you @jensh007 for this report 👍

It is designed such that the finally task is not executed in case of a missing result:

Q. What happens to Pipeline when the dependent task boskos-acquire either failed or not executed, and the task result leased-resource is not initialized?

A. The finally Task boskos-release is attempted and included in the list of skippedTasks since the task result leased-resource is not initialized. Pipeline controller logs this validation failure including param name leased-resource of the finally task boskos-release with the result reference leased-resource and result producing task boskos-acquire. Pipeline continues executing rest of the final tasks and exits with completion.

We are looking at this proposal to help solve your use case:
https://github.com/tektoncd/community/blob/main/teps/0048-task-results-without-results.md

This proposal was demonstrated at the API WG: https://docs.google.com/document/d/17PodAxG8hV351fBhSu7Y_OIPhGTVgj6OJ2lPphYYRpU/edit#heading=h.qq31rcli4y3f

The demo is available here at the mark of 16:30.

Havent been able to work on it since then.

@jensh007
Copy link
Author

jensh007 commented Jan 7, 2022

Thanks for sharing the proposal. I understand that it may be designed like this but I consider this not to be very practical. I assume in many cases finally tasks are used to e.g. send notifications about build status etc. In case something fails (and thus a result may be missing) you always want to get the notification and not a silent death... So hope to see the enhancements soon :)

@jensh007
Copy link
Author

jensh007 commented Jan 7, 2022

And one more comment: You at least should adapt the documentation (see citation above). The statement that "finally tasks are guaranteed to be executed regardless of of success or error" is wrong if this is by design.

@jerop jerop added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jan 10, 2022
@jerop jerop added the area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) label Feb 17, 2022
@S4lt5
Copy link

S4lt5 commented Apr 28, 2022

Ran into this today, and the documentation is absolutely confusing. I was sitting here scratching my head as to why my finally task kept getting skipped. There was no error or other indicator that there was something wrong with the results.

@pritidesai
Copy link
Member

TEP-0103 is addressing this missing piece where the pipelineRun reports the reason why a task was skipped.

@Yablargo what is your use case? What is the expectation, run the finally task anyways (with missing results)?

@S4lt5
Copy link

S4lt5 commented Apr 28, 2022

@pritidesai I actually post results, but since the task fails ( I fail it intentionally, the QC failed.), I can't read the results I posted, when I get to the finally task.

In a simplified example, I run npm audit, see a critical vuln has been caught, and want to write the bit of JSON on the findings summary to /tekton/results/whatever and pick that up later in my finally task to post it back to the originating merge request, but the results I just pushed don't exist and my task gets skipped.

@pritidesai
Copy link
Member

I see, thanks @Yablargo for the details! The pipelineRun does not publish results when a task fails and that's the reason finally task is not executed.

We have designed finally such that pipelineRun must skip running finally in case of missing results to avoid any kind of accidental execution.

I think, this use case is more suitable for #3749 where pipelineRun must publish results (if they were initialized) in case of a failure.

Do you vote for #3749? wdyt?

@pritidesai
Copy link
Member

pritidesai commented Apr 28, 2022

I noticed you do have vote for it 😄 #3749 (comment)

@S4lt5
Copy link

S4lt5 commented Apr 28, 2022

Yep, Thanks! For right now I upload everything at the end of each TaskRun and pull it back down in the Finally (well, I was already uploading anyways, I just was hoping to use /results to save some calls and task params).

It works, but it would save some complexity to be able to just poll the /results.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2022
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 26, 2022
@pritidesai
Copy link
Member

this is not fixed yet 🙃

/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 26, 2022
@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2022
@ppitonak
Copy link
Contributor

still not fixed and very annoying

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2022
@drriguz
Copy link

drriguz commented Mar 1, 2023

Still exists on latest pipeline release: v0.44.0

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2023
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 29, 2023
@BobDu
Copy link

BobDu commented Jun 29, 2023

it still exists.
/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 29, 2023
@S4lt5
Copy link

S4lt5 commented Jun 29, 2023

This is still incredibly important, and if I am being honest this sort of thing is why I have moved on from tekton.

@pritidesai
Copy link
Member

I am very sorry to hear that! Yup, I can understand, we are working towards making this project more resilient. At the same time, any kind of contribution is highly appreciated.

We have implemented a feature to allow producing a task result from a failed task:

#6510

This is available since https://github.com/tektoncd/pipeline/releases/tag/v0.48.0. Please let me know if this does not work for you or looking for something more. Thanks!

@pritidesai
Copy link
Member

The use case mentioned in the description ⬆️ is now supported: https://github.com/tektoncd/pipeline/blob/main/test/task_results_from_failed_tasks_test.go

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 27, 2023
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 27, 2023
@vdemeester
Copy link
Member

Seems like it is fixed with 0.56.0 release at least. I tried the reproducer today, and it works as expected.

$ kubectl get tr                         
NAME                                SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
resulttest-run-5bzmx-generator      False       Failed      107s        93s
resulttest-run-5bzmx-notification   True        Succeeded   93s         89s
$ tkn tr logs resulttest-run-5bzmx-notification
[use-result] foo

I'll go ahead and close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/roadmap Issues that are part of the project (or organization) roadmap (usually an epic) kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
Status: Done
Status: Done
Development

No branches or pull requests

9 participants