Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LifeCycle hooks phase not updated at workflow level #8529

Closed
Piroddi opened this issue Apr 28, 2022 · 32 comments · Fixed by #8586
Closed

LifeCycle hooks phase not updated at workflow level #8529

Piroddi opened this issue Apr 28, 2022 · 32 comments · Fixed by #8586

Comments

@Piroddi
Copy link

Piroddi commented Apr 28, 2022

Summary

The phase of LifeCycle hooks configured at a workflow level are not updated once completed. They are stuck in pending state forever.

This prevents workflows from being retried, which negatively impacts long running workflows with lots of steps

What happened/what you expected to happen?

Lifecycle hook phase should change to Succeeded once completed

What version are you running?

v3.3.3

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@sarabala1979
Copy link
Member

sarabala1979 commented Apr 29, 2022

it is known bug, in hooks, the workflow will not be reconciled if it is already completed. I will try to come up with fix.

@alexec
Copy link
Contributor

alexec commented May 5, 2022

@sarabala1979 this is not a popular bug. I'd like someone from the community to try and fix this, rather than the core team, as our goal should always to make the greatest positive impact for our customers.

Could I ask you to add some guidance for the engineer trying to fix this?

Specifically, link to the exact line of code where you think changes need making.

@alexec
Copy link
Contributor

alexec commented May 5, 2022

@sarabala1979 please ignore my last comment. You have done this already with the linked PR.

@Piroddi would you be willing to complete @sarabala1979 PoC PR?

@alexec alexec removed the triage label May 5, 2022
@sandeepk8s
Copy link
Contributor

Hi @alexec - when can we expect the changes made in #8586 to get merged. Thanks 😄

@sandeepk8s
Copy link
Contributor

We are affected by this bug I guess! Using HTTP templates with lifecycle hooks to send our CI workflows build status to bitbucket builds. All the hooks are custom.

hooks:
    failed:
      template: notify
      arguments:
        parameters:
          - name: state
            value: FAILED
          - name: tohash
            value: '{{workflow.parameters.tohash}}'
      expression: workflow.status == "Failed"
    running:
      template: notify
      arguments:
        parameters:
          - name: state
            value: INPROGRESS
          - name: tohash
            value: '{{workflow.parameters.tohash}}'
      expression: workflow.status == "Running"
    success:
      template: notify
      arguments:
        parameters:
          - name: state
            value: SUCCESSFUL
          - name: tohash
            value: '{{workflow.parameters.tohash}}'
      expression: workflow.status == "Succeeded"

We have multiple apps and repos - use only CI workflows pipepine for all. When the wfs take more than 2 minutes - it is affecting and showing the wrong status.
Expected Result:
image

Actual Result:
image

@alexec
Copy link
Contributor

alexec commented May 25, 2022

@sandeepitachi someone needs to take over #8586. Would you like to?

@sandeepk8s
Copy link
Contributor

sandeepk8s commented May 25, 2022

@alexec I'd love to but I'm not great at coding. 😄 Also not much familiar with Go.

@alexec
Copy link
Contributor

alexec commented May 26, 2022

Go is VERY easy to learn. It is designed to be. Took me about 3h to learn.

  1. Spend 3h doing http://tour.golang.org
  2. You are now a Go programmer.

@sandeepk8s
Copy link
Contributor

sandeepk8s commented May 26, 2022

Thank you @alexec 😄 Will spend some time this weekend for sure! Would like to contribute!

Also spoke to my architect @brgoode too (he's good at Go) BUT like you - he's super BUSY working with different teams in our org. If he got some time - he's happy to look at it too

@stale
Copy link

stale bot commented Jun 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Jun 12, 2022
@roofurmston
Copy link
Contributor

I believe this is still a known bug, so I don't think it should be closed.

@stale stale bot removed the problem/stale This has not had a response in some time label Jun 13, 2022
@stale
Copy link

stale bot commented Jul 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Jul 10, 2022
@roofurmston
Copy link
Contributor

I believe this is still a known bug, so I don't think it should be closed. Would be good to know if this is not the case. Otherwise, I think it should stay open.

@stale stale bot removed the problem/stale This has not had a response in some time label Jul 11, 2022
@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Jul 31, 2022
@stale
Copy link

stale bot commented Aug 12, 2022

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

@bencompton
Copy link

I'm seeing this issue, and I also notice when bringing up the logs for the hook that the logs UI continuously appends the same logs to the displayed log output every second. Closing the logs window and re-opening it clears the logs and then starts continuously appending the same logs once again.

@stale stale bot removed the problem/stale This has not had a response in some time label Aug 16, 2022
@jackivanov
Copy link

Lifecycle hook phase should change to Succeeded once completed

We're seeing the same behavior for workflow.status == "Failed"

sarabala1979 added a commit that referenced this issue Aug 23, 2022
)

* fix: Workflow level http template hook status update

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>

* fix: added test

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>
@ezk84
Copy link
Contributor

ezk84 commented Aug 25, 2022

This has been affecting our team also. Seems like a fix is in, thanks @sarabala1979. Will it make 3.4?

@ese
Copy link

ese commented Sep 14, 2022

I still have the issue testing in v3.4.0-rc4 where it's supposed to have included the fix. cc: @sarabala1979

hooks with expression workflow.status == "Failed" or workflow.status == "Succeeded" remains with PHASE=Pending despite they are being executed and finished fine.

Should we open a new issue?

example:

spec:
  templates:
    - name: main
      steps:
        - - name: docker-go-test
            templateRef:
              name: docker-go-test
              template: unit-test
  entrypoint: main
  hooks:
    failed:
      templateRef:
        name: notifier
        template: notifier-failed
      expression: workflow.status == "Failed"
    running:
      templateRef:
        name: notifier
        template: notifier-running
      expression: workflow.status == "Running"
    succeeded:
      templateRef:
        name: notifier
        template: notifier-succeeded
      expression: workflow.status == "Succeeded"

image

@ese
Copy link

ese commented Sep 16, 2022

As workaround you can use exit handlers https://argoproj.github.io/argo-workflows/walk-through/exit-handlers/ instead hooks for failed and succeeded which works fine updating phase after workflow ends:
example:

spec:
  onExit: exit-handler
  templates:
    - name: exit-handler
      steps:
        - - name: succeeded
            templateRef:
              name: notifier
              template: notifier-succeed
            when: '{{workflow.status}} == Succeeded'
          - name: failed
            templateRef:
              name: notifier
              template: notifier-failed
            when: '{{workflow.status}} != Succeeded'

@cheery550
Copy link

As workaround you can use exit handlers https://argoproj.github.io/argo-workflows/walk-through/exit-handlers/ instead hooks for failed and succeeded which works fine updating phase after workflow ends: example:

spec:
  onExit: exit-handler
  templates:
    - name: exit-handler
      steps:
        - - name: succeeded
            templateRef:
              name: notifier
              template: notifier-succeed
            when: '{{workflow.status}} == Succeeded'
          - name: failed
            templateRef:
              name: notifier
              template: notifier-failed
            when: '{{workflow.status}} != Succeeded'

it also do not work.

@Piroddi
Copy link
Author

Piroddi commented Oct 31, 2022

@sarabala1979 I have just upgraded to 3.4.2. However this bug is not resolved. Experiencing the same behaviour

juchaosong pushed a commit to juchaosong/argo-workflows that referenced this issue Nov 3, 2022
…8529 (argoproj#8586)

* fix: Workflow level http template hook status update

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>

* fix: added test

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>
Signed-off-by: juchao <juchao@coscene.io>
@Piroddi
Copy link
Author

Piroddi commented Dec 4, 2022

Still an issue. Simple example:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
      failure:
        expression: workflow.status == "Failed"
        template: heads
  templates:
    - name: main
      steps:
      - - name: step1
          template: fail
    
    - name: fail
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["exit 1"]

    - name: heads
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"running hook\""]

@Piroddi
Copy link
Author

Piroddi commented Dec 4, 2022

Hi @alexec @sarabala1979 I have time to look into this issue. It's really effecting us, as we cant retry steps in long running workflows.

Should I create a new issue or can you reopen this one?

@elim19
Copy link

elim19 commented Feb 15, 2023

I believe this is still a known bug, so I don't think it should be closed.

@ese
Copy link

ese commented Feb 21, 2023

I can confirm is still an issue in v3.4.5

@capacman
Copy link

I can confirm as well. v3.4.5

@Mastergalen
Copy link
Contributor

This is still an issue in v3.4.8

@Piroddi
Copy link
Author

Piroddi commented Jun 26, 2023

The issue got resolved for me in v3.4.7, from the following fix:

#10307

@Mastergalen
Copy link
Contributor

Strange, using the workflow level example I am still seeing the hooks stuck in the Pending state.

image

Workflow YAML

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
    exit: # Exit handler
      template: http
    running:
      expression: workflow.status == "Running"
      template: http
  templates:
    - name: main
      steps:
      - - name: step1
          template: heads
    
    - name: heads
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"it was heads\""]
    
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

@Piroddi
Copy link
Author

Piroddi commented Jun 26, 2023

So looks like the above issue is related to http templates. As it works for non http templates:

metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
    exit: # Exit handler
      template: tails
    running:
      expression: workflow.status == "Running"
      template: tails
  templates:
    - name: main
      steps:
      - - name: step1
          template: heads
    
    - name: heads
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"it was heads\""]
    
    - name: tails
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"it was tails\""]

image

I think the issue is related to:

https://github.com/GeunSam2/argo-workflows/blob/master/workflow/controller/hooks.go#L34

Which needs to be investigated.

Since this issue is already closed. I would recommend opening a new one, specifically for http so it can get more eyes

@Mastergalen
Copy link
Contributor

@Piroddi Thanks for replying!

Ah, you are right. After swapping out the HTTP template it works.

I have a separate issue with workflow.status not being accessible in template level hooks then. I've opened a new issue here: #11269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.