Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: mark PodGroup completed when pod fails #3807

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bood
Copy link

@bood bood commented Nov 8, 2024

Similar to api.Succeeded, when pods are in api.Failed state, PodGroup should also be marked as Completed

@volcano-sh-bot
Copy link
Contributor

Welcome @bood!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. 😃

@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 8, 2024
@bood bood force-pushed the pg-completed-pod-error-case branch 2 times, most recently from d2d3f0a to 527fa13 Compare November 8, 2024 08:06
@bood
Copy link
Author

bood commented Nov 8, 2024

/assign @lowang-bh

@lowang-bh
Copy link
Member

Thanks for your contributions. How about add a ut for changes?

@bood bood force-pushed the pg-completed-pod-error-case branch from 527fa13 to ab6e03b Compare November 9, 2024 22:55
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign lowang-bh
You can assign the PR to them by writing /assign @lowang-bh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 9, 2024
@bood bood force-pushed the pg-completed-pod-error-case branch from ab6e03b to 1521c91 Compare November 9, 2024 22:58
@bood
Copy link
Author

bood commented Nov 9, 2024

Thanks for your contributions. How about add a ut for changes?

one e2e test is added

Copy link
Member

@hwdef hwdef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

@volcano-sh-bot volcano-sh-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Nov 11, 2024
@bood
Copy link
Author

bood commented Nov 11, 2024

Is there any problem with CI? schedulingbase, schedulingaction, jobp all run ok on my laptop.

@bood bood force-pushed the pg-completed-pod-error-case branch 3 times, most recently from 0f51a17 to 9cc95d8 Compare November 15, 2024 02:44
Signed-off-by: bood <boodweb@gmail.com>
@bood bood force-pushed the pg-completed-pod-error-case branch from 9cc95d8 to e149f45 Compare November 15, 2024 05:52
@hwdef
Copy link
Member

hwdef commented Nov 15, 2024

Should we add a Failed Status for podgroup?

@bood
Copy link
Author

bood commented Nov 15, 2024

Should we add a Failed Status for podgroup?

For now, I don't see any logic in volcano that cares about how the job completes. So perhaps we can just keep it simple.

However, I'm not sure whether the state should revert to PodGroupPending when it's already in PodGroupCompleted, say the pods are deleted after job failure. I saw similar case in issue #2208

		} else if jobInfo.PodGroup.Status.Phase != scheduling.PodGroupInqueue {
			status.Phase = scheduling.PodGroupPending
		}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants