Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add finalizer to workflow pod to prevent pod deleted. Fixes #8783. Continuing Work of #9058 #12413

Merged
merged 40 commits into from
Jan 29, 2024

Conversation

sakai-ast
Copy link
Contributor

@sakai-ast sakai-ast commented Dec 26, 2023

Fixes #8783 Continuing Work of #9058

Motivation

Building upon the foundational work of #9058, I am advancing the implementation to address our needs.
see: #9058 (comment)

Modifications

Verification

ut and e2e tests

…rgoproj#8783 Continuing Work of argoproj#9058

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
@sakai-ast sakai-ast marked this pull request as ready for review December 26, 2023 09:53
@agilgur5 agilgur5 requested a review from juliev0 December 29, 2023 04:30
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Dec 29, 2023
@juliev0
Copy link
Contributor

juliev0 commented Jan 10, 2024

I'm surprised but I've just become aware that the Argo Server deletes Pods as part of RetryWorkflow() (here). I figured it was something that would only be the role of the Controller to do, but in any case, I guess this is an example where pod deletion is happening outside of these calls. I suppose the labelPodCompleted code would happen in any case, but I'm not sure what happens if you try to invoke this on an incomplete Workflow.

sakai-ast and others added 3 commits January 10, 2024 15:46
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
…-ast/argo-workflows into feat/add-finalizer-to-workflow-pod

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
@sakai-ast
Copy link
Contributor Author

I suppose the labelPodCompleted code would happen in any case

I agree. In my local tests, retried pods always have the labelPodCompleted and the finalizer is removed.

but I'm not sure what happens if you try to invoke this on an incomplete Workflow.

I believe the feature flag is intended for situations like the one described above.

@juliev0
Copy link
Contributor

juliev0 commented Jan 12, 2024

Would you mind adding to the Description more information on how you decided on your approach and your investigation of corner cases?

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
…erHandlers

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
…-ast/argo-workflows into feat/add-finalizer-to-workflow-pod

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
sakai-ast and others added 2 commits January 26, 2024 16:35
Copy link
Member

@isubasinghe isubasinghe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a couple more comments, could you please address them? Just so that we are on the same page.

Copy link
Contributor

@juliev0 juliev0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved before waiting for e2e tests....will approve once they all do

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
@juliev0 juliev0 merged commit 6abe8a9 into argoproj:main Jan 29, 2024
27 checks passed
@sakai-ast sakai-ast deleted the feat/add-finalizer-to-workflow-pod branch January 29, 2024 06:55
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request Feb 4, 2024
…rgoproj#8783 Continuing Work of argoproj#9058 (argoproj#12413)

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
@alexec
Copy link
Contributor

alexec commented Feb 5, 2024

🍾

isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request Feb 27, 2024
…rgoproj#8783 Continuing Work of argoproj#9058 (argoproj#12413)

Signed-off-by: Atsushi Sakai <sakai.at24@gmail.com>
@imliuda
Copy link
Contributor

imliuda commented Mar 7, 2024

if workflow-controller stopped for a while, eg, maintaining or upgrading, and then workflows deleted by somebody, if this will still work properly? during the gap, some pods completed, but the finalizer have no chance to get removed, how can we ensure that pods can be deleted.

@Joibel
Copy link
Member

Joibel commented Mar 7, 2024

You can't ensure the pods will be deleted. The deletion will hang until either the controller is started or someone through another means deletes the finalizer words on the pod. The point of this change is to do this, and prevent inconsistency of state between the workflow and pods.

If you are happy with the alternative that a workflow may have unexpectedly missing pods you may delete the finalizer manually from the pods. This is scriptable.

@imliuda
Copy link
Contributor

imliuda commented Mar 14, 2024

It is unnecessary for keeping consistency between workflow and pods, once workflow-controller has finished processing pods (that is node.Fullfilled()), then the pod can be deleted safely. Or, that may induce a memory leak of kube-apiserver due to large amount pods (kube-controller-manager will do pod gc of Failed and Succeed pods). Since finalizer is added by workflow-controller, then it should have to be responsible for removing it when the corresponding workflow deleted. Removing finalizer by a script is not appropriate.

I can make some optimizations for situation when pod's owner workflow has been deleted, and open a new PR for that.

@agilgur5 agilgur5 added this to the v3.6.0 milestone Apr 12, 2024
@agilgur5 agilgur5 changed the title feat: Add finalizer to workflow pod to prevent 'pod deleted'. Fixes #8783 Continuing Work of #9058 feat: Add finalizer to workflow pod to prevent pod deleted. Fixes #8783 Continuing Work of #9058 Jun 12, 2024
@agilgur5 agilgur5 changed the title feat: Add finalizer to workflow pod to prevent pod deleted. Fixes #8783 Continuing Work of #9058 feat: Add finalizer to workflow pod to prevent pod deleted. Fixes #8783. Continuing Work of #9058 Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use finalizer to prevent pod deleted
7 participants