Add podGroup completed phase #2667

Merged
merged 2 commits into from
Feb 13, 2023

Conversation

waiterQ
Contributor

@waiterQ waiterQ commented Feb 7, 2023

Modification Motivation

Volcano can schedule normal pods, but under some conditions the podGroup ends up in the wrong phase. When scheduling a k8s Job,
the podGroup stays in the Running phase after its pods have reached the Completed phase; changing a Deployment's request leaves the old podGroup in the Inqueue phase.
So I picked up and fixed #2589 (Feature/add replicaset gc pg), and added a Completed phase for the podGroups of normal pods to enhance podGroup.

Test Result

While the Deployment is rolling-updating, two podgroups exist.
image

When the Deployment rolling update is over, one podgroup is left.
image

When the k8s Job is completed:
image
the podgroup is Completed:
image

CI e2e result:
image

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 7, 2023
@waiterQ waiterQ force-pushed the add-pg-completed branch 2 times, most recently from 9d31199 to 3a6c9e4 Compare February 7, 2023 09:19
@waiterQ waiterQ changed the title Add pg completed Add podGroup completed phase Feb 7, 2023
Gaizhi and others added 2 commits February 9, 2023 11:18
Signed-off-by: Gaizhi <donghouze@minimac.com>
Signed-off-by: shaoqiu <516595344@qq.com>
@@ -61,6 +63,26 @@ func (pg *pgcontroller) addPod(obj interface{}) {
pg.queue.Add(req)
}

func (pg *pgcontroller) addReplicaSet(obj interface{}) {
Contributor

@waiterQ As far as I know, a ReplicaSet always has 0 replicas when it is created, and only afterwards is it scaled up to the defined number of replicas. So why delete the podgroup in both addReplicaSet and updateReplicaSet, rather than only in updateReplicaSet?

Contributor Author

Yes, you're right. In normal operation on a single version, Volcano only needs updateReplicaSet. But when upgrading from one version to another, without addReplicaSet nothing would clean up the podgroups already left in the cluster. addReplicaSet works on already-existing podgroups; updateReplicaSet works on upcoming podgroups.
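The split described in this reply can be sketched as two handlers that share one cleanup routine. This is a minimal, self-contained illustration, not the PR's actual code: the `ReplicaSet` and `controller` types, the `cleanupIfScaledToZero` helper, and the map keyed by ReplicaSet name are all hypothetical stand-ins. It assumes that the add handler is what sees ReplicaSets that already existed before the controller started (shared informers in client-go deliver existing objects as add events on initial sync), while the update handler sees later scale-downs.

```go
package main

import "fmt"

// ReplicaSet is a hypothetical, stripped-down stand-in for appsv1.ReplicaSet.
type ReplicaSet struct {
	Name     string
	Replicas *int32 // pointer-typed, like Spec.Replicas in the real API type
}

// controller is a hypothetical stand-in for the podgroup controller.
type controller struct {
	podGroups map[string]bool // PodGroups keyed by owning ReplicaSet name (assumption)
}

// cleanupIfScaledToZero deletes the PodGroup owned by rs when rs has zero
// replicas. It reports whether a PodGroup was actually removed.
func (c *controller) cleanupIfScaledToZero(rs *ReplicaSet) bool {
	// Defensive nil checks; the sketch never dereferences a nil pointer.
	if rs == nil || rs.Replicas == nil || *rs.Replicas != 0 {
		return false
	}
	if !c.podGroups[rs.Name] {
		return false
	}
	delete(c.podGroups, rs.Name)
	return true
}

// addReplicaSet covers ReplicaSets that already exist when the controller
// starts, for example stale ones left behind before an upgrade.
func (c *controller) addReplicaSet(rs *ReplicaSet) bool {
	return c.cleanupIfScaledToZero(rs)
}

// updateReplicaSet covers ReplicaSets that are scaled to zero later on.
func (c *controller) updateReplicaSet(_, newRS *ReplicaSet) bool {
	return c.cleanupIfScaledToZero(newRS)
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	c := &controller{podGroups: map[string]bool{"rs-old": true, "rs-new": true}}

	// Initial sync after an upgrade: rs-old already sits at 0 replicas.
	fmt.Println(c.addReplicaSet(&ReplicaSet{Name: "rs-old", Replicas: int32Ptr(0)})) // true
	// Steady state: rs-new still has replicas, so its PodGroup stays.
	fmt.Println(c.updateReplicaSet(nil, &ReplicaSet{Name: "rs-new", Replicas: int32Ptr(3)})) // false
	fmt.Println(len(c.podGroups)) // 1
}
```

Routing both handlers through one function keeps the deletion idempotent: a ReplicaSet whose PodGroup is already gone is simply a no-op.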

return
}

if *rs.Spec.Replicas == 0 {

During a rolling upgrade there may be two ReplicaSets with non-zero replicas, which means two podgroups exist. Does this matter?

Contributor Author

I think this comes from the Deployment's rollingUpdate strategy: while pods are being rolled, two kinds of pods definitely exist. I think that's normal, not a problem.

@william-wang
Member

Please add the test results on the PR, thanks.

@waiterQ
Contributor Author

waiterQ commented Feb 13, 2023

> Please add the test results on the PR, thanks.

ok, done.

Member

@william-wang william-wang left a comment

/lgtm

@@ -0,0 +1,189 @@
/*
Copyright 2021 The Volcano Authors.
Member

the Copyright is not correct.

Expect(len(pgs.Items)).To(Equal(1), "only one podGroup should be exists")
})

It("k8s Job", func() {
Member

Please use a formal and complete description for func

@@ -650,4 +650,62 @@ var _ = Describe("Job E2E Test", func() {
Expect(q2ScheduledPod).Should(BeNumerically("<=", expectPod/2+1),
fmt.Sprintf("expectPod %d, q1ScheduledPod %d, q2ScheduledPod %d", expectPod, q1ScheduledPod, q2ScheduledPod))
})

It("changeable Deployment's PodGroup", func() {
Member

Please give a formal and complete description for func

@@ -0,0 +1,189 @@
/*
Member

It is a util package; why is the file named deployment.go?

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2023
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2023
@volcano-sh-bot volcano-sh-bot merged commit 0254501 into volcano-sh:master Feb 13, 2023
@waiterQ waiterQ deleted the add-pg-completed branch March 24, 2023 07:40
@@ -272,6 +272,10 @@ func jobStatus(ssn *Session, jobInfo *api.JobInfo) scheduling.PodGroupStatus {
// If there're enough allocated resource, it's running
if int32(allocated) >= jobInfo.PodGroup.Spec.MinMember {
status.Phase = scheduling.PodGroupRunning
// If all allocated tasks is succeeded, it's completed
if len(jobInfo.TaskStatusIndex[api.Succeeded]) == allocated {
@zhoushuke zhoushuke Dec 5, 2023

For a native batchv1 Job that uses .spec.completions and .spec.parallelism: suppose 10 pods have succeeded and, at the same time, the queue is full, so the other 10 pods are pending. Then len(jobInfo.TaskStatusIndex[api.Succeeded]) == allocated will be true, and the pg status becomes Completed even though the Job has not finished. Could this happen?
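The scenario in this question can be made concrete with a small sketch of the phase decision. This is a simplified model, not Volcano's `jobStatus`: the `Phase` type and the `phaseFor` function are hypothetical, and it assumes that `allocated` counts only tasks that hold resources or have already succeeded, so pending pods are not included. The fallback to `Pending` is likewise a simplification of whatever the real `else` branch does.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for scheduling.PodGroupPhase.
type Phase string

const (
	Pending   Phase = "Pending"
	Running   Phase = "Running"
	Completed Phase = "Completed"
)

// phaseFor mirrors the shape of the check in the diff above.
//   allocated: tasks that hold resources or have succeeded (assumption;
//              pending tasks are not counted)
//   succeeded: tasks in the Succeeded state
//   minMember: the PodGroup's Spec.MinMember
func phaseFor(allocated, succeeded int, minMember int32) Phase {
	if int32(allocated) >= minMember {
		// All counted tasks have succeeded: reported as Completed,
		// regardless of how many tasks are still pending.
		if succeeded == allocated {
			return Completed
		}
		return Running
	}
	return Pending // simplified fallback, not the real else branch
}

func main() {
	// 10 pods succeeded, 10 more pending and therefore not counted.
	fmt.Println(phaseFor(10, 10, 1)) // Completed
	// 4 succeeded, 6 still running.
	fmt.Println(phaseFor(10, 4, 1)) // Running
	// Nothing allocated yet.
	fmt.Println(phaseFor(0, 0, 1)) // Pending
}
```

Under these assumptions the first call shows the concern raised here: the check has no view of pods that are still pending, so it cannot distinguish "every pod of the Job has finished" from "every pod scheduled so far has finished". Whether the real scheduler reaches this state depends on how `allocated` is computed, which this page does not show.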
