-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(controller): Add failFast flag to DAG and Step templates #5315
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be more generalized instead of focusing on pods and their failures. Perhaps something similar to failure/success conditions for resource templates would be more useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how frugal this solution to a popular issue is. We need to get @jessesuen to sign-off on the proto changes.
GREP_LOGS := "" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds a grep string for goreman output
@@ -6005,3 +6005,200 @@ func TestHasOutputResultRef(t *testing.T) { | |||
assert.True(t, hasOutputResultRef("generate-random", &wf.Spec.Templates[0])) | |||
assert.True(t, hasOutputResultRef("generate-random-1", &wf.Spec.Templates[0])) | |||
} | |||
|
|||
const stepsFailFast = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already ensured this is minimal :)
This one is particularly longer because it tests nested boundary behavior: failFast
should only affect a single depth of boundary and not be affected by children boundaries (such as a retryStrategy
)
workflow/controller/node_counters.go
Outdated
} | ||
} | ||
|
||
func (c count) count(key counterType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this “inc”
workflow/controller/node_counters.go
Outdated
|
||
type count map[counterType]int | ||
|
||
func (c count) addKeyIfNotPresent(key counterType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d merge this logic into “count”
workflow/controller/node_counters.go
Outdated
|
||
type counter struct { | ||
key counterType | ||
ifNode func(wfv1.NodeStatus) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment these fields?
} | ||
} | ||
|
||
func TestCounters(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m assuming you’ve checked your code coverage.
workflow/controller/node_counters.go
Outdated
wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1" | ||
) | ||
|
||
type counter struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring is a good idea
workflow/controller/operator.go
Outdated
@@ -2175,54 +2122,70 @@ func (woc *wfOperationCtx) markNodeWaitingForLock(nodeName string, lockName stri | |||
} | |||
|
|||
// checkParallelism checks if the given template is able to be executed, considering the current active pods and workflow/template parallelism | |||
func (woc *wfOperationCtx) checkParallelism(tmpl *wfv1.Template, node *wfv1.NodeStatus, boundaryID string) error { | |||
func (woc *wfOperationCtx) checkParallelism(tmpl *wfv1.Template, node *wfv1.NodeStatus, boundaryID string) (*wfv1.NodeStatus, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t see the new mode being used anywhere?
workflow/controller/operator.go
Outdated
activeSiblings := woc.countActiveChildren(boundaryID) | ||
counts := woc.countNodes(getActiveChildrenCounter(boundaryID), getFailedOrErroredChildrenCounter(boundaryID)) | ||
activeSiblings := int64(counts.getCountType(counterTypeActiveChildren)) | ||
templateFailedOrErroredChildren := int64(counts.getCountType(counterTypeFailedOrErroredChildren)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“FailedOrErrored” is wordy, how about “unsuccessfully”?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be expensive (O(N x 3) ?) operations. But only needed when you have FailFast? Maybe guard them this that flag.
workflow/controller/operator.go
Outdated
activeSiblings := int64(counts.getCountType(counterTypeActiveChildren)) | ||
templateFailedOrErroredChildren := int64(counts.getCountType(counterTypeFailedOrErroredChildren)) | ||
|
||
if boundaryTemplate.FailFast != nil && *boundaryTemplate.FailFast && templateFailedOrErroredChildren > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add add “tmpl.IsFailFast()”?
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Closes #3644