
fix: Create new boundary to run onExit nodes #5478

Closed · 1 commit

Conversation

simster7 (Member):

Fixes: #5463

codecov bot commented Mar 22, 2021

Codecov Report

Merging #5478 (1ca5f6a) into master (5bd7ce8) will decrease coverage by 0.02%.
The diff coverage is 44.33%.

❗ Current head 1ca5f6a differs from pull request most recent head 6cb9fac. Consider uploading reports for the commit 6cb9fac to get more accurate results

@@            Coverage Diff             @@
##           master    #5478      +/-   ##
==========================================
- Coverage   16.50%   16.48%   -0.03%     
==========================================
  Files         243      243              
  Lines       43774    43779       +5     
==========================================
- Hits         7225     7215      -10     
- Misses      35568    35583      +15     
  Partials      981      981              
Impacted Files Coverage Δ
cmd/argo/commands/clustertemplate/lint.go 0.00% <0.00%> (ø)
cmd/argo/commands/cron/lint.go 0.00% <0.00%> (ø)
cmd/argo/commands/lint.go 0.00% <0.00%> (ø)
cmd/argo/lint/lint.go 53.12% <40.00%> (-7.71%) ⬇️
cmd/argo/lint/formatter_pretty.go 68.75% <66.66%> (-20.14%) ⬇️
cmd/argo/lint/formatter_simple.go 83.33% <81.81%> (+1.51%) ⬆️
cmd/argo/commands/get.go 56.95% <100.00%> (-1.05%) ⬇️
workflow/controller/dag.go 72.36% <100.00%> (ø)
workflow/controller/operator.go 70.25% <100.00%> (-0.26%) ⬇️
workflow/controller/steps.go 70.50% <100.00%> (ø)
... and 6 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bd7ce8...6cb9fac.

if templateRef != "" && woc.GetShutdownStrategy().ShouldExecute(true) {
woc.log.Infof("Running OnExit handler: %s", templateRef)
onExitNodeName := common.GenerateOnExitNodeName(parentDisplayName)
simster7 (Member Author):
parentDisplayName is non-unique, so nodes with the same display name (e.g. when using withParam) create onExit nodes with the same name. After the first onExit node is created, subsequent onExit nodes created by different parent nodes with the same display name fail creation because a node with that name already exists. Use parentNodeName, which is always unique, instead.

simster7 (Member Author):

This fix has been split to #5486

@@ -260,7 +260,7 @@ func (woc *wfOperationCtx) executeStepGroup(ctx context.Context, stepGroup []wfv
if !childNode.Fulfilled() {
completed = false
} else if childNode.Completed() {
-	hasOnExitNode, onExitNode, err := woc.runOnExitNode(ctx, step.OnExit, step.Name, childNode.Name, stepsCtx.boundaryID, stepsCtx.tmplCtx)
+	hasOnExitNode, onExitNode, err := woc.runOnExitNode(ctx, step.OnExit, childNode.Name, childNodeID, stepsCtx.tmplCtx)
simster7 (Member Author):

onExit nodes should run under the boundary of the node that called them (the "exited" node).

simster7 (Member Author):

Before:

STEP                  TEMPLATE  PODNAME                       DURATION  MESSAGE
 ✔ dag-diamond-wzqnc  diamond
 ├─✔ A                echo      dag-diamond-wzqnc-2374547985  6s
 ├─✔ A.onExit         oe        dag-diamond-wzqnc-1294891656  6s
 ├─✔ B                echo      dag-diamond-wzqnc-2324215128  6s
 ├─✔ C                echo      dag-diamond-wzqnc-2340992747  6s
 ├─✔ B.onExit         oe        dag-diamond-wzqnc-250094443   6s
 ├─✔ C.onExit         oe        dag-diamond-wzqnc-728915506   6s
 ├─✔ D                echo      dag-diamond-wzqnc-2424880842  6s
 └─✔ D.onExit         oe        dag-diamond-wzqnc-2186811989  6s

After:

STEP                  TEMPLATE  PODNAME                       DURATION  MESSAGE
 ✔ dag-diamond-ztvr2  diamond
 ├─✔ A                echo      dag-diamond-ztvr2-3230871262  2s
 │ └─✔ onExit         oe        dag-diamond-ztvr2-1416350393  6s
 ├─✔ B                echo      dag-diamond-ztvr2-3214093643  6s
 │ └─✔ onExit         oe        dag-diamond-ztvr2-2959263954  6s
 ├─✔ C                echo      dag-diamond-ztvr2-3197316024  6s
 │ └─✔ onExit         oe        dag-diamond-ztvr2-3577844235  6s
 └─✔ D                echo      dag-diamond-ztvr2-3314759357  6s
   └─✔ onExit         oe        dag-diamond-ztvr2-2103494724  6s

simster7 (Member Author):

Note this is not only for cosmetic reasons: boundary parallelism is the main reason for making this change.

alexec (Contributor) left a comment:

what will happen to in-flight workflows at time of upgrade?

simster7 (Member Author):

> what will happen to in-flight workflows at time of upgrade?

They will not fail, but there is a small chance they end up redoing work (e.g. an onExit node executed twice) if an onExit node with the old name has started but not completed. If it has started and completed, then its parent node will have succeeded and Argo won't try to schedule a new onExit node.

Regardless, I don't believe we guarantee that upgrading while workflows are running will be seamless.


simster7 commented Mar 22, 2021

This PR currently fixes two issues in one:

Should I split this into two PRs so that we can merge the first fix independently? Done

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Successfully merging this pull request may close these issues.

Weird behaviours of exit handlers in loop