Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break sequential loop if an iteration fails #3644

Closed
laymain opened this issue Jul 30, 2020 · 13 comments · Fixed by #5315
Closed

Break sequential loop if an iteration fails #3644

laymain opened this issue Jul 30, 2020 · 13 comments · Fixed by #5315
Assignees
Labels
area/controller Controller issues, panics area/looping `withParams`, `withItems`, and `withSequence` area/spec Changes to the workflow specification. solution/workaround There's a workaround, might not be great, but exists type/feature Feature request

Comments

@laymain
Copy link

laymain commented Jul 30, 2020

Summary

Break sequential loop if an iteration fails

Motivation

I have a sequential loop using parallelism: 1 and I need to stop the loop execution when an iteration fails,
in order to not process next iteration if previous one has failed.

Given this template:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: seq-loop-
spec:
  arguments:
    parameters:
    - name: items
      value: |
        ["a", "b", "c"]

  entrypoint: seq-loop
  templates:
  - name: seq-loop
    parallelism: 1
    inputs:
      parameters:
      - name: items
    steps:
    - - name: iteration
        template: iteration
        withParam: "{{inputs.parameters.items}}"

  - name: iteration
    steps:
    - - name: step1
        template: succeed-step
    - - name: step2
        template: failed-step

  - name: succeed-step
    container:
      image: alpine
      command: ['/bin/sh', '-c']
      args: ["exit 0"]

  - name: failed-step
    container:
      image: alpine
      command: ['/bin/sh', '-c']
      args: ["exit 1"]

Currently, executing this workflow gives:

STEP                   TEMPLATE      PODNAME                    DURATION  MESSAGE
 ✖ seq-loop-b68gn      seq-loop                                           child 'seq-loop-b68gn-342633600' failed
 └-·-✖ iteration(0:a)  iteration                                          child 'seq-loop-b68gn-3536970033' failed
   | ├---✔ step1       succeed-step  seq-loop-b68gn-2592104941  9s
   | └---✖ step2       failed-step   seq-loop-b68gn-3536970033  4s        failed with exit code 1-✖ iteration(1:b)  iteration                                          child 'seq-loop-b68gn-1741679919' failed
   | ├---✔ step1       succeed-step  seq-loop-b68gn-3000376863  4s
   | └---✖ step2       failed-step   seq-loop-b68gn-1741679919  4s        failed with exit code 1-✖ iteration(2:c)  iteration                                          child 'seq-loop-b68gn-221656217' failed
     ├---✔ step1       succeed-step  seq-loop-b68gn-1906215637  4s
     └---✖ step2       failed-step   seq-loop-b68gn-221656217   4s        failed with exit code 1

I would like to be able to break the loop if an iteration fails:

STEP                   TEMPLATE      PODNAME                    DURATION  MESSAGE
 ✖ seq-loop-b68gn      seq-loop                                           child 'seq-loop-b68gn-342633600' failed
 └-·-✖ iteration(0:a)  iteration                                          child 'seq-loop-b68gn-3536970033' failed
     ├---✔ step1       succeed-step  seq-loop-b68gn-2592104941  9s
     └---✖ step2       failed-step   seq-loop-b68gn-3536970033  4s        failed with exit code 1

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@laymain laymain added the type/feature Feature request label Jul 30, 2020
@simster7
Copy link
Member

When you use withParam Argo creates and schedules all resulting tasks in parallel. Since you have parallelism: 1, they do in practice run sequentially, but at that point they have already been scheduled to run, regardless of failures.

Perhaps using this recursive for loop example might work better for this use case?

@simster7 simster7 added the solution/workaround There's a workaround, might not be great, but exists label Jul 30, 2020
@jessesuen
Copy link
Member

I believe we used to fail fast when parallelism was reduced. This might be a regression.

@laymain
Copy link
Author

laymain commented Jul 30, 2020

@simster7 I could indeed use recursive for loop, but I have hundreds of iteration which I feel will end
with an unnecessary over-complicated workflow result.
I hoped that there was another option than using recursive for loop.

@simster7 simster7 self-assigned this Jul 30, 2020
@simster7
Copy link
Member

I'll investigate @jessesuen's comment that this is in fact a regression and it should actually be possible to have the behavior you desired using withParam and parallelism

@laymain
Copy link
Author

laymain commented Jul 30, 2020

Thank you for digging into that issue

@alexec alexec added type/bug type/regression Regression from previous behavior (a specific type of bug) and removed type/feature Feature request labels Jul 30, 2020
@simster7
Copy link
Member

simster7 commented Aug 4, 2020

I took a look at the code and I don't get any indication that this behavior should actually be possible. I will discuss with @jessesuen and report back

@simster7
Copy link
Member

simster7 commented Aug 5, 2020

I discussed this with Jesse and tried this in 2.5 (https://github.com/argoproj/argo/blob/release-2.5/workflow/controller/steps.go) and we determined that this was never possible, so it is in fact an enhancement.

We've determined that this is something we would consider, we would just need to come up with a design that would allow the user to specify that they would like this behavior when using with* and parallelism.

@simster7 simster7 added type/feature Feature request and removed type/bug type/regression Regression from previous behavior (a specific type of bug) more-information-needed labels Aug 5, 2020
@laymain
Copy link
Author

laymain commented Aug 6, 2020

I would really appreciate this feature.
I am actually using the recursive loop workaround, but it gives a way more complicated workflow.

@simster7
Copy link
Member

simster7 commented Mar 5, 2021

@laymain Please add any feedback you may have at #5315

@simster7
Copy link
Member

This is now supported by setting failFast: true in the template that invokes the items/has a parallelism: 1

  - name: seq-loop
    failFast: true
    inputs:
      parameters:
      - name: items
    parallelism: 1
    steps:
    - - name: iteration
        template: iteration
        withParam: '{{inputs.parameters.items}}'

This will be out on v3.1

@simster7
Copy link
Member

Tagging upvoters: @tanenbaum @ppapasani1 @jbehling @oliverwm1 @shbalaku @trim777 @bybybybyb @dinever @rajasudhan @raph-af @camilocot

@laymain
Copy link
Author

laymain commented Mar 17, 2021

Thank you for the update, that seems nice!

Can you confirm that it will work with a parallelism that have any value?
Let's say that we have a parallelism: 3 and failFast: true, if one iteration fails every next iteration (that aren't running yet) will fail?

Also, is there a way to test the v3.1 at its current stage?

@dinever
Copy link
Member

dinever commented Mar 30, 2021

Thank you for the update, that seems nice!

Can you confirm that it will work with a parallelism that have any value?
Let's say that we have a parallelism: 3 and failFast: true, if one iteration fails every next iteration (that aren't running yet) will fail?

Also, is there a way to test the v3.1 at its current stage?

Yes. If we have parallelism: 3 and failFast: true, if one iteration fails, the entire sequential loop will fail fast, and the iterations that are not running yet will not be triggered.

Try this. We have 5 iterations and parallelism: 3. In this case, the first 3 iterations will run together. The next 2 iterations will not run.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: seq-loop-
spec:
  arguments:
    parameters:
    - name: items
      value: |
        ["a", "b", "c", "d", "e"]

  entrypoint: seq-loop
  templates:
  - name: seq-loop
    failFast: true
    parallelism: 3
    inputs:
      parameters:
      - name: items
    steps:
    - - name: iteration
        template: iteration
        withParam: "{{inputs.parameters.items}}"

  - name: iteration
    steps:
    - - name: step1
        template: succeed-step
    - - name: step2
        template: failed-step

  - name: succeed-step
    container:
      image: alpine
      command: ['/bin/sh', '-c']
      args: ["exit 0"]

  - name: failed-step
    container:
      image: alpine
      command: ['/bin/sh', '-c']
      args: ["exit 1"]

Screen Shot 2021-03-29 at 9 14 56 PM

@agilgur5 agilgur5 added area/controller Controller issues, panics area/spec Changes to the workflow specification. labels Sep 2, 2023
@agilgur5 agilgur5 added the area/looping `withParams`, `withItems`, and `withSequence` label Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/looping `withParams`, `withItems`, and `withSequence` area/spec Changes to the workflow specification. solution/workaround There's a workaround, might not be great, but exists type/feature Feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants