-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: retry only proper node (#11589) #11839
Conversation
42dd44a
to
693a956
Compare
eb47abd
to
cacf70b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing.
So, in your example from your original issue, are you saying the hook node is actually a child of the argosay node? That seems kind of surprising, doesn't it? Are both nodes of type "Pod"?
workflow/controller/operator.go
Outdated
childrenIds = append(childrenIds, n.ID) | ||
} | ||
} | ||
return childrenIds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments to this function? I'm trying to understand why you're not just looking for all child nodes of "Type"="Retry"
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you! I forgot the parent was "Retry" and the children were "Pod"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, starting to understand the logic now looking at what you posted. Thanks for doing that.
So, I can assume that the spec for this is a running hook which has expression: workflow.status == "Running"
, right? Does it seem a little odd that the hook node is a child of the Retry node? Do you figure that it would be the child of whatever the first running Node is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also can't really remember what is a child of what. If there are 3 Steps A-> B-> C, do you know if B would be a child of A because it follows it, or are A, B, and C siblings (which is what I would assume)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am kind of thinking we shouldn't rely on somebody not changing the name in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So, this PR resolves both cases? And is this particular method - getChildNodeIdsRetried() - the logic that runs in both cases?
Yes. This PR resolves both cases.
It does make me wonder if perhaps the NodeStatus could potentially have some new key/value pair in it like isHook - what do you think?
Ideally, completely agree with you.
I'm going to write a code that changes NodeStatus go struct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! Once you do I think you'll need to run make codegen
to auto-generate the CRD and other files from that struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I've added NodeStatus.Hooked
and some node initialization. Could you give me feedback :)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a thorough comment to the top of this function to explain what it's doing, what the various cases are, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to post in this PR an example WorkflowStatus
corresponding to a WorkflowSpec: mainly just want to see the "tree" of Nodes to understand the relationship between the nodes, where both Retry and Hook are involved. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliev0
Thanks! Is it clear enough?
- Node type of retried nodes
type: Pod
- parent node of hooked node
type: Rety
Full nodeStatus
is following:
nodeStatus
status:
phase: Failed
startedAt: '2023-09-30T13:49:20Z'
finishedAt: '2023-09-30T13:50:05Z'
progress: 2/6
message: No more retries left
nodes:
lovely-bear:
id: lovely-bear
name: lovely-bear
displayName: lovely-bear
type: Retry
templateName: argosay
templateScope: local/lovely-bear
phase: Failed
message: No more retries left
startedAt: '2023-09-30T13:49:20Z'
finishedAt: '2023-09-30T13:49:56Z'
progress: 2/6
resourcesDuration:
cpu: 42
memory: 30
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-argosay-74551197/main.log
exitCode: '1'
children:
- lovely-bear-477066958
- lovely-bear-794339884
- lovely-bear-4167892875
- lovely-bear-410250672
- lovely-bear-74551197
- lovely-bear-2163413524
lovely-bear-2163413524:
id: lovely-bear-2163413524
name: lovely-bear.hooks.failed
displayName: lovely-bear.hooks.failed
type: Pod
templateName: hook
templateScope: local/lovely-bear
phase: Succeeded
startedAt: '2023-09-30T13:49:56Z'
finishedAt: '2023-09-30T13:50:01Z'
progress: 1/1
resourcesDuration:
cpu: 7
memory: 5
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-hook-2163413524/main.log
exitCode: '0'
hostNodeName: k3d-k3s-default-server-0
lovely-bear-410250672:
id: lovely-bear-410250672
name: lovely-bear(2)
displayName: lovely-bear(2)
type: Pod
templateName: argosay
templateScope: local/lovely-bear
phase: Failed
message: Error (exit code 1)
startedAt: '2023-09-30T13:49:38Z'
finishedAt: '2023-09-30T13:49:43Z'
progress: 0/1
resourcesDuration:
cpu: 7
memory: 5
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-argosay-410250672/main.log
exitCode: '1'
hostNodeName: k3d-k3s-default-server-0
lovely-bear-4167892875:
id: lovely-bear-4167892875
name: lovely-bear(1)
displayName: lovely-bear(1)
type: Pod
templateName: argosay
templateScope: local/lovely-bear
phase: Failed
message: Error (exit code 1)
startedAt: '2023-09-30T13:49:29Z'
finishedAt: '2023-09-30T13:49:34Z'
progress: 0/1
resourcesDuration:
cpu: 7
memory: 5
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-argosay-4167892875/main.log
exitCode: '1'
hostNodeName: k3d-k3s-default-server-0
lovely-bear-477066958:
id: lovely-bear-477066958
name: lovely-bear(0)
displayName: lovely-bear(0)
type: Pod
templateName: argosay
templateScope: local/lovely-bear
phase: Failed
message: Error (exit code 1)
startedAt: '2023-09-30T13:49:20Z'
finishedAt: '2023-09-30T13:49:25Z'
progress: 0/1
resourcesDuration:
cpu: 7
memory: 5
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-argosay-477066958/main.log
exitCode: '1'
hostNodeName: k3d-k3s-default-server-0
lovely-bear-74551197:
id: lovely-bear-74551197
name: lovely-bear(3)
displayName: lovely-bear(3)
type: Pod
templateName: argosay
templateScope: local/lovely-bear
phase: Failed
message: Error (exit code 1)
startedAt: '2023-09-30T13:49:47Z'
finishedAt: '2023-09-30T13:49:52Z'
progress: 0/1
resourcesDuration:
cpu: 7
memory: 5
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-argosay-74551197/main.log
exitCode: '1'
hostNodeName: k3d-k3s-default-server-0
lovely-bear-794339884:
id: lovely-bear-794339884
name: lovely-bear.hooks.running
displayName: lovely-bear.hooks.running
type: Pod
templateName: hook
templateScope: local/lovely-bear
phase: Succeeded
startedAt: '2023-09-30T13:49:20Z'
finishedAt: '2023-09-30T13:49:25Z'
progress: 1/1
resourcesDuration:
cpu: 7
memory: 5
outputs:
artifacts:
- name: main-logs
s3:
key: lovely-bear/lovely-bear-hook-794339884/main.log
exitCode: '0'
hostNodeName: k3d-k3s-default-server-0
conditions:
- type: PodRunning
status: 'False'
- type: Completed
status: 'True'
resourcesDuration:
cpu: 42
memory: 30
artifactRepositoryRef:
configMap: artifact-repositories
key: default-v1
namespace: argo
artifactRepository:
archiveLogs: true
s3:
endpoint: minio:9000
bucket: my-bucket
insecure: true
accessKeySecret:
name: my-minio-cred
key: accesskey
secretKeySecret:
name: my-minio-cred
key: secretkey
artifactGCStatus:
notSpecified: true
workflow/controller/operator.go
Outdated
childrenIds = append(childrenIds, n.ID) | ||
} | ||
} | ||
return childrenIds |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
2863319
to
a3b9b25
Compare
baf47dc
to
bc88279
Compare
Thanks for updating all of the logic to add in the new I'm wondering if I can request that you add even more to the description above as far as what you did? I'm short on time, so the more thorough you are there, the more it will help me to understand. And then we can get this merged. |
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
64af803
to
9b8ebc5
Compare
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 <toyamagu2021@gmail.com>
Signed-off-by: toyamagu-2021 toyamagu2021@gmail.com
Fixes #11589
Motivation
Modifications
Verification
manifest