
feat(controller): Add failFast flag to DAG and Step templates #5315

Merged: 12 commits merged on Mar 12, 2021

Conversation

simster7
Member

@simster7 simster7 commented Mar 5, 2021

Closes #3644

@simster7 simster7 requested a review from jessesuen as a code owner March 5, 2021 18:13
Member

@terrytangyuan terrytangyuan left a comment

I feel like this should be more generalized instead of focusing on pods and their failures. Perhaps something similar to failure/success conditions for resource templates would be more useful?

@alexec alexec changed the title from "feat: Allow to specify maxFailed pods under template" to "feat(controller: Allow to specify maxFailed pods under template" on Mar 8, 2021
@alexec alexec changed the title from "feat(controller: Allow to specify maxFailed pods under template" to "feat(controller): Allow to specify maxFailed pods under template" on Mar 8, 2021
Contributor

@alexec alexec left a comment

I like how frugal this solution to a popular issue is. We need to get @jessesuen to sign-off on the proto changes.

workflow/controller/operator.go (outdated, resolved)
@simster7 simster7 marked this pull request as draft March 10, 2021 15:19
@simster7 simster7 changed the title from "feat(controller): Allow to specify maxFailed pods under template" to "feat(controller): Add failFast flag to DAG and Step templates" on Mar 10, 2021
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
@simster7 simster7 marked this pull request as ready for review March 12, 2021 15:43
Comment on lines +17 to +18
GREP_LOGS := ""

Member Author

Adds a grep string for goreman output

@@ -6005,3 +6005,200 @@ func TestHasOutputResultRef(t *testing.T) {
assert.True(t, hasOutputResultRef("generate-random", &wf.Spec.Templates[0]))
assert.True(t, hasOutputResultRef("generate-random-1", &wf.Spec.Templates[0]))
}

const stepsFailFast = `
Member Author

I've already ensured this is minimal :)

This one is longer than the others because it tests nested boundary behavior: failFast should affect only a single depth of boundary and should not be affected by child boundaries (such as a retryStrategy).

Signed-off-by: Simon Behar <simbeh7@gmail.com>
}
}

func (c count) count(key counterType) {
Contributor

Maybe call this “inc”


type count map[counterType]int

func (c count) addKeyIfNotPresent(key counterType) {
Contributor

I’d merge this logic into “count”
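
The two suggestions above (rename to "inc" and fold in the presence check) fit together because reading a missing key from a Go map yields the zero value. A minimal sketch, assuming the `count` and `counterType` shapes shown in the diff; the underlying string type and the constant values are guesses, since only the names appear in this PR:

```go
package main

import "fmt"

// counterType identifies what is being counted. The underlying string type is
// an assumption; the diff only shows the name.
type counterType string

const (
	counterTypeActiveChildren          counterType = "activeChildren"
	counterTypeFailedOrErroredChildren counterType = "failedOrErroredChildren"
)

// count maps each counter key to the number of matching nodes.
type count map[counterType]int

// inc increments the count for key. A missing key reads as 0, so no separate
// addKeyIfNotPresent step is required before incrementing.
func (c count) inc(key counterType) {
	c[key]++
}

func main() {
	c := count{}
	c.inc(counterTypeActiveChildren)
	c.inc(counterTypeActiveChildren)
	fmt.Println(c[counterTypeActiveChildren])          // 2
	fmt.Println(c[counterTypeFailedOrErroredChildren]) // 0
}
```

If callers need to tell "counted zero" apart from "never requested", the explicit key insertion would still be useful; this sketch assumes they do not.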


type counter struct {
key counterType
ifNode func(wfv1.NodeStatus) bool
Contributor

Can you comment these fields?
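
For illustration, the fields might be documented along these lines. This is a sketch only: `nodeStatus` is a hypothetical stand-in for `wfv1.NodeStatus`, and `countNodes` here is a free function written for the example, not the controller's method.

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for wfv1.NodeStatus that carries only
// the fields this sketch needs.
type nodeStatus struct {
	BoundaryID string
	Phase      string
}

type counterType string

// counter pairs a result key with the predicate that decides whether a node
// contributes to that key.
type counter struct {
	// key is the bucket that is incremented when ifNode returns true.
	key counterType
	// ifNode reports whether the given node should be counted under key.
	ifNode func(nodeStatus) bool
}

// countNodes walks the nodes once and evaluates every counter against each
// node, so adding a counter does not add another pass over the nodes.
func countNodes(nodes []nodeStatus, counters ...counter) map[counterType]int {
	counts := make(map[counterType]int, len(counters))
	for _, n := range nodes {
		for _, c := range counters {
			if c.ifNode(n) {
				counts[c.key]++
			}
		}
	}
	return counts
}

func main() {
	nodes := []nodeStatus{
		{BoundaryID: "a", Phase: "Running"},
		{BoundaryID: "a", Phase: "Failed"},
		{BoundaryID: "b", Phase: "Failed"},
	}
	// Only direct children of boundary "a" are counted; the failure under
	// boundary "b" is ignored.
	failedInA := counter{
		key: "failedInA",
		ifNode: func(n nodeStatus) bool {
			return n.BoundaryID == "a" && (n.Phase == "Failed" || n.Phase == "Error")
		},
	}
	fmt.Println(countNodes(nodes, failedInA)["failedInA"]) // 1
}
```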

}
}

func TestCounters(t *testing.T) {
Contributor

Good stuff

Contributor

I’m assuming you’ve checked your code coverage.

wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
)

type counter struct {
Contributor

This refactoring is a good idea

@@ -2175,54 +2122,70 @@ func (woc *wfOperationCtx) markNodeWaitingForLock(nodeName string, lockName stri
}

// checkParallelism checks if the given template is able to be executed, considering the current active pods and workflow/template parallelism
-func (woc *wfOperationCtx) checkParallelism(tmpl *wfv1.Template, node *wfv1.NodeStatus, boundaryID string) error {
+func (woc *wfOperationCtx) checkParallelism(tmpl *wfv1.Template, node *wfv1.NodeStatus, boundaryID string) (*wfv1.NodeStatus, error) {
Contributor

I don’t see the new mode being used anywhere?

-activeSiblings := woc.countActiveChildren(boundaryID)
+counts := woc.countNodes(getActiveChildrenCounter(boundaryID), getFailedOrErroredChildrenCounter(boundaryID))
+activeSiblings := int64(counts.getCountType(counterTypeActiveChildren))
+templateFailedOrErroredChildren := int64(counts.getCountType(counterTypeFailedOrErroredChildren))
Contributor

“FailedOrErrored” is wordy, how about “unsuccessfully”?

Contributor

These could be expensive (O(N x 3) ?) operations. But only needed when you have FailFast? Maybe guard them with that flag.
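
One way to read the guard suggestion, as a rough sketch rather than the controller's code: run the failed-children scan only when the flag is set. The `template` type and `shouldFailFast` helper below are hypothetical, and the active-sibling count is left out because the parallelism check presumably needs it regardless of the flag.

```go
package main

import "fmt"

// template is a hypothetical stand-in for the boundary template; only the
// optional FailFast flag from the diff is modeled.
type template struct {
	FailFast *bool
}

// shouldFailFast runs the potentially expensive failed-children scan only
// when the flag is set, and reports whether scheduling should stop.
func shouldFailFast(tmpl template, countFailed func() int64) bool {
	if tmpl.FailFast == nil || !*tmpl.FailFast {
		return false // flag unset or false: the scan is never run
	}
	return countFailed() > 0
}

func main() {
	scans := 0
	countFailed := func() int64 {
		scans++
		return 1 // pretend one child has failed
	}
	on := true

	stop := shouldFailFast(template{}, countFailed)
	fmt.Println(stop, scans) // false 0

	stop = shouldFailFast(template{FailFast: &on}, countFailed)
	fmt.Println(stop, scans) // true 1
}
```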

activeSiblings := int64(counts.getCountType(counterTypeActiveChildren))
templateFailedOrErroredChildren := int64(counts.getCountType(counterTypeFailedOrErroredChildren))

if boundaryTemplate.FailFast != nil && *boundaryTemplate.FailFast && templateFailedOrErroredChildren > 0 {
Contributor

Maybe add “tmpl.IsFailFast()”?
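
Such a helper could look roughly like the following. The `Template` type here is a cut-down stand-in with only the `FailFast` field from the diff, and whether the merged code defines `IsFailFast` this way is not shown in this conversation.

```go
package main

import "fmt"

// Template is a cut-down, hypothetical stand-in for wfv1.Template.
type Template struct {
	FailFast *bool
}

// IsFailFast reports whether failFast is explicitly set to true. A nil
// receiver or an unset flag both count as false.
func (t *Template) IsFailFast() bool {
	return t != nil && t.FailFast != nil && *t.FailFast
}

func main() {
	on := true
	var unset *Template
	fmt.Println((&Template{FailFast: &on}).IsFailFast()) // true
	fmt.Println((&Template{}).IsFailFast())              // false
	fmt.Println(unset.IsFailFast())                      // false
}
```

With a helper like this, the condition in the diff would shorten to `boundaryTemplate.IsFailFast() && templateFailedOrErroredChildren > 0`.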

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: Simon Behar <simbeh7@gmail.com>
@simster7 simster7 merged commit 57c05df into argoproj:master Mar 12, 2021
@simster7 simster7 mentioned this pull request Mar 15, 2021
27 tasks
@simster7 simster7 mentioned this pull request Mar 29, 2021
77 tasks
@book987 book987 mentioned this pull request Mar 30, 2021
19 tasks
Development

Successfully merging this pull request may close these issues.

Break sequential loop if an iteration fails
4 participants