Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: pdgetrf. If they are not already assigned, you can assign the PR to them by writing. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Hi @pdgetrf. Thanks for your PR. I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
signed CLA
Force-pushed from cf61bce to bb6056e
btw, please also open a PR on the design doc.
and please also append the review comments from last time and highlight how those comments were addressed.
Force-pushed from d18c098 to 457a2d3
defer wg.Done()

annotation := map[string]string{v1alpha1.BackfillAnnotationKey: "true"}
err := ssn.PatchAnnotation(pendingTask, annotation)
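The snippet above patches each pending task inside a goroutine guarded by a WaitGroup. A minimal, self-contained sketch of that dispatch pattern follows; `patchAll`, `patchAnnotation`, and the annotation key are illustrative stand-ins, not the PR's actual helpers, and the real implementation patches the Pod through the apiserver rather than a stub.

```go
package main

import (
	"fmt"
	"sync"
)

// patchAll sketches the pattern in this PR: each pending task is patched
// in its own goroutine, a WaitGroup tracks completion, and errors are
// funneled through a channel so the first one can be returned.
func patchAll(tasks []string) error {
	var wg sync.WaitGroup
	errChannel := make(chan error, len(tasks)) // buffered: writers never block

	for _, task := range tasks {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			annotation := map[string]string{"backfill": "true"}
			if err := patchAnnotation(name, annotation); err != nil {
				errChannel <- err
			}
		}(task)
	}

	wg.Wait()
	close(errChannel)

	// Return the first error, mirroring the `case err := <-errChannel`
	// handling quoted later in this review.
	for err := range errChannel {
		if err != nil {
			fmt.Println("error ", err)
			return err
		}
	}
	return nil
}

// patchAnnotation is a stub; the real one issues a patch to the apiserver.
func patchAnnotation(name string, annotation map[string]string) error {
	if name == "" {
		return fmt.Errorf("empty task name")
	}
	return nil
}

func main() {
	fmt.Println(patchAll([]string{"task-0", "task-1"}))
}
```

Buffering the channel to `len(tasks)` matters: an unbuffered channel would deadlock if several goroutines fail, since nothing reads until after `wg.Wait()`.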
The annotation should also be cached; otherwise the scheduler cache and the apiserver may get out of sync.
WIP
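The reviewer's point is that patching only the apiserver leaves the scheduler cache stale until the next sync. A minimal sketch of the suggested fix, mirroring the change into the cache in the same step, is below; the `Session`, `TaskInfo`, and map-backed "apiserver" here are simplified stand-ins for kube-batch's real types, which carry a full `*v1.Pod` and a clientset.

```go
package main

import "fmt"

// TaskInfo is a minimal stand-in for kube-batch's api.TaskInfo.
type TaskInfo struct {
	Name        string
	Annotations map[string]string
}

// Session holds a local scheduler cache alongside a simulated apiserver.
type Session struct {
	cache     map[string]*TaskInfo         // scheduler cache
	apiserver map[string]map[string]string // simulated apiserver annotations
}

// PatchAnnotation writes the annotation to the apiserver *and* to the
// cached task in the same step, so the cache cannot drift from the
// apiserver between scheduling rounds.
func (ssn *Session) PatchAnnotation(task *TaskInfo, annotation map[string]string) error {
	// 1. Patch the apiserver (a real implementation issues a merge
	//    patch through the Kubernetes clientset).
	remote, ok := ssn.apiserver[task.Name]
	if !ok {
		return fmt.Errorf("task %s not found", task.Name)
	}
	for k, v := range annotation {
		remote[k] = v
	}
	// 2. Mirror the change into the scheduler cache immediately.
	cached := ssn.cache[task.Name]
	if cached.Annotations == nil {
		cached.Annotations = map[string]string{}
	}
	for k, v := range annotation {
		cached.Annotations[k] = v
	}
	return nil
}

func main() {
	task := &TaskInfo{Name: "job-1-task-0"}
	ssn := &Session{
		cache:     map[string]*TaskInfo{task.Name: task},
		apiserver: map[string]map[string]string{task.Name: {}},
	}
	if err := ssn.PatchAnnotation(task, map[string]string{"backfill": "true"}); err != nil {
		panic(err)
	}
	fmt.Println(ssn.cache[task.Name].Annotations["backfill"])
}
```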
func (ssn *Session) BackFillEligible(obj interface{}) bool {
	for _, tier := range ssn.Tiers {
		for _, plugin := range tier.Plugins {
			jrf, found := ssn.backFillEligibleFns[plugin.Name]
should also be configurable
"configurable" as in "can be configured to turn on and off"?
yes.
The backfill action can be enabled/disabled from configuration. BackfillEligible is a function provided by each plugin to decide which jobs are eligible to run as backfill jobs. What is the use case for conditionally turning the BackfillEligible function on/off?
What is the use case to conditionally turn on/off BackfillEligible function?
Different plugins may provide this callback; we need a configuration to control it.
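To illustrate what a per-plugin switch could look like, here is a minimal sketch of the tier/plugin iteration with a hypothetical `EnabledBackFillEligible` flag read from configuration. The flag name and the simplified `Tier`/`PluginOption` types are assumptions for this example, not kube-batch's actual conf fields.

```go
package main

import "fmt"

// PluginOption is a simplified stand-in for kube-batch's plugin config.
type PluginOption struct {
	Name                    string
	EnabledBackFillEligible bool // hypothetical per-plugin switch
}

type Tier struct {
	Plugins []PluginOption
}

type Session struct {
	Tiers               []Tier
	backFillEligibleFns map[string]func(obj interface{}) bool
}

// BackFillEligible consults each enabled plugin's registered callback;
// the per-plugin flag lets operators turn an individual plugin's
// eligibility check on or off from configuration.
func (ssn *Session) BackFillEligible(obj interface{}) bool {
	for _, tier := range ssn.Tiers {
		for _, plugin := range tier.Plugins {
			if !plugin.EnabledBackFillEligible {
				continue // disabled in configuration, skip
			}
			fn, found := ssn.backFillEligibleFns[plugin.Name]
			if !found {
				continue // plugin did not register a callback
			}
			if !fn(obj) {
				return false
			}
		}
	}
	return true
}

func main() {
	ssn := &Session{
		Tiers: []Tier{{Plugins: []PluginOption{
			{Name: "gang", EnabledBackFillEligible: true},
			{Name: "drf", EnabledBackFillEligible: false},
		}}},
		backFillEligibleFns: map[string]func(interface{}) bool{
			"gang": func(interface{}) bool { return true },
			"drf":  func(interface{}) bool { return false }, // never consulted: disabled
		},
	}
	fmt.Println(ssn.BackFillEligible(nil))
}
```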
pkg/scheduler/util.go (Outdated)
"github.com/kubernetes-sigs/kube-batch/pkg/scheduler/conf"
"github.com/kubernetes-sigs/kube-batch/pkg/scheduler/framework"
"github.com/kubernetes-sigs/kube-batch/pkg/scheduler/plugins"
"gopkg.in/yaml.v2"
keep the original format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the original format?
"io/ioutil"

"gopkg.in/yaml.v2"

"github.com/kubernetes-sigs/kube-batch/pkg/scheduler/conf"
don't we have tools to enforce this? oh right, it's reverted!
Will wait until the tool is brought back to fix this; it shouldn't be fixed by hand.
Force-pushed from 0bddba3 to 21fedf1
Still working on a few issues. The e2e tests from the previous rollback in the master branch (not related to this PR) made a mess; attempting to fix it now, at least for backfill.
case err := <-errChannel:
	if err != nil {
		fmt.Println("error ", err)
		return err
What happens if the Pod is patched but is not dispatched because of some other error?
If dispatch fails, the pod remains unbound in the current scheduling round and therefore will not be started. In the next scheduling round, when we create the TaskInfo for a Pod in NewTaskInfo, we will notice that the Pod is unbound but has the backfill annotation. In that case, we will remove the backfill annotation from the Pod.

@pdgetrf It looks to me that we do not yet have the logic to remove the backfill annotation in NewTaskInfo.
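The missing logic described above could be sketched as follows. This is a hypothetical illustration of the proposed fix, not code from the PR: the annotation key, the simplified `Pod` stand-in for `*v1.Pod`, and the check on `NodeName` as the "unbound" signal are all assumptions, and a real implementation would also patch the removal back to the apiserver.

```go
package main

import "fmt"

const BackfillAnnotationKey = "backfill" // assumed key name

// Pod is a minimal stand-in for *v1.Pod.
type Pod struct {
	Name        string
	NodeName    string // empty means the pod is not yet bound
	Annotations map[string]string
}

type TaskInfo struct {
	Pod      *Pod
	Backfill bool
}

// NewTaskInfo sketches the proposed fix: if a pod carries the backfill
// annotation but was never bound (its dispatch failed in an earlier
// round), drop the stale annotation instead of trusting it.
func NewTaskInfo(pod *Pod) *TaskInfo {
	ti := &TaskInfo{Pod: pod}
	if pod.Annotations[BackfillAnnotationKey] == "true" {
		if pod.NodeName == "" {
			// Patched but never dispatched: remove the stale annotation
			// (a real implementation would also patch the apiserver).
			delete(pod.Annotations, BackfillAnnotationKey)
		} else {
			ti.Backfill = true
		}
	}
	return ti
}

func main() {
	stale := &Pod{Name: "p1", Annotations: map[string]string{BackfillAnnotationKey: "true"}}
	ti := NewTaskInfo(stale)
	fmt.Println(ti.Backfill, stale.Annotations[BackfillAnnotationKey])
}
```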
In this case, we will remove the backfill annotation from the Pod.

It seems that is not implemented in this PR, right?
This can be fixed in a follow-up PR which, as you already suggested, would be smaller and easier to review. I do not see how this should block the current PR; it will only make this one even bigger. We are going to address this in the starvation PR anyway, and backfill will be disabled in this PR anyway. If your purpose is to further delay the starvation PR, including all the necessary testing, then yes, I agree we should fix it here in this PR.
What happens if the Pod is patched but is not dispatched because of some other error?
code has the answer.
Force-pushed from 61815db to 0f1900d
…iority jobs cannot run but still hold resources
@pdgetrf: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Added a backfill action that allows lower priority jobs to run when higher priority jobs cannot run but still hold resources.
Tested with added unit tests, e2e tests, and local manual tests.