try get get old pg when new pg not exist #2400
Signed-off-by: Qiuyu Wu <qiuyu.wu@shopee.com>
c6172f3 to d621c91
Thanks for your contribution. I think this is important for users who upgrade Volcano from a version below v1.6.0 to v1.6.0.
/lgtm
Overall LGTM, just one comment: I think the PG construction code can be reused, maybe by abstracting it into a buildPodGroup function.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: william-wang
try get get old pg when new pg not exist Signed-off-by: william-wang <wang.platform@gmail.com>
Signed-off-by: Qiuyu Wu qiuyu.wu@shopee.com
fix:#2389
What happened:
After the feature in #2140 was applied, an error occurs when upgrading Volcano from 1.4 to 1.6; details are in #2389.
This happens because volcano-job-controller creates extra PodGroups (PGs) with a job-UID suffix for existing jobs that are already running, so two PGs exist in the cluster for one vj (Volcano Job). When existing jobs already use more than half of the cluster's resources, the new UID-suffixed PGs make inqueue grow in the proportion plugin's AddJobEnqueueableFn (line 334 in pkg/scheduler/plugins/proportion/proportion.go). Once inqueue exceeds realCapability, newly submitted jobs stay pending forever.