-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use podGroup instead of PDB in v1beta2 #954
Conversation
Hi @thandayuthapani. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -21,6 +21,10 @@ required = [ | |||
name = "github.com/stretchr/testify" | |||
version = "1.2.2" | |||
|
|||
[[constraint]] | |||
name = "github.com/kubernetes-sigs/kube-batch" | |||
version = "v0.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use 0.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using using v0.4 of kube-batch, there is dependency conflict for k8s.io/api, which in kubeflow, v1.11.2 is used but in kube-batch 1.13.2 is used, that is why v0.3 is used for kube-batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardsliu , any plan to upgrade k8s version? 1.14 is going to be release, and upstream only support three latest versions (1.12, 1.13, 1.14) until LTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With v1beta2, the subresource status requires 1.13.
/ok-to-test |
pkg/util/kbutil/kbutil.go
Outdated
var kubeBatchClientSet kubebatchclient.Interface | ||
|
||
//SetKubeBatchClientSet sets kube-batch client set. | ||
func SetKubeBatchClientInterface(clientset kubebatchclient.Interface) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those helper func seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need kube-batch client interface to create and delete podGroup, but if we add it in TFController, then APIs will change, that is why, in util kube-batch util is created and we are setting and getting kube-batch client interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which APIs are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APIs for creation and deletion of PodGroup which has been implemented in kube-batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is a common feature across operators, you may keep client interface in https://github.com/kubeflow/tf-operator/blob/master/pkg/common/jobcontroller/jobcontroller.go#L80
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this change might affect v1beta1 also, since APIs are same for both v1beta1 and v1beta2. Since we change JobController API used for creating new JobController will change, so to accomadate this change we have to change to change function calls in v1beta1 also. If that is okay, I can make those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thandayuthapani Let's keep the client interface in jobcontroller and make the necessary changes.
// Check whether podGroup exists or not | ||
podGroup, err := kubeBatchClientInterface.SchedulingV1alpha1().PodGroups(job.GetNamespace()).Get(job.GetName(), metav1.GetOptions{}) | ||
if err == nil || !k8serrors.IsNotFound(err) { | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing error check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have used same error check done for SyncPdb method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find this confusing to read. Why not just split into 2 if-blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if err == nil { | ||
err = errors.New(string(metav1.StatusReasonAlreadyExists)) | ||
} | ||
return podGroup, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to return error if it already exists? Isn't this will be called for every reconcile ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In SyncPdb method it has been handled like this, so I have tried to mimic what SyncPdb was doing, but instead of PDB, have handled PodGroup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we missed this earlier. I don't see any need to return error. @richardsliu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Will change that.
c87d687
to
b411cd9
Compare
Travis tests have failedHey @thandayuthapani, 1st Buildgometalinter --config=linter_config.json --vendor ./...
goveralls -service=travis-ci -v -package ./pkg/... -ignore "pkg/client/*/*.go,pkg/client/*/*/*.go,pkg/client/*/*/*/*.go,pkg/client/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*/*.go,pkg/util/testutil/*.go,pkg/apis/tensorflow/*/zz_generated.*.go,pkg/apis/tensorflow/*/*_generated.go,pkg/apis/common/*/zz_generated.*.go,pkg/apis/common/*/*_generated.go"
2nd Buildgometalinter --config=linter_config.json --vendor ./...
goveralls -service=travis-ci -v -package ./pkg/... -ignore "pkg/client/*/*.go,pkg/client/*/*/*.go,pkg/client/*/*/*/*.go,pkg/client/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*/*.go,pkg/util/testutil/*.go,pkg/apis/tensorflow/*/zz_generated.*.go,pkg/apis/tensorflow/*/*_generated.go,pkg/apis/common/*/zz_generated.*.go,pkg/apis/common/*/*_generated.go"
TravisBuddy Request Identifier: 9ad29970-416b-11e9-b4f6-c1bea236eaae |
log.Infof("Deleting PodGroup %s", job.GetName()) | ||
|
||
//delete podGroup | ||
if err := kubeBatchClientInterface.SchedulingV1alpha1().PodGroups(job.GetNamespace()).Delete(job.GetName(), &metav1.DeleteOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: easier to read if this is split into 2 lines at the "if"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Check whether podGroup exists or not | ||
podGroup, err := kubeBatchClientInterface.SchedulingV1alpha1().PodGroups(job.GetNamespace()).Get(job.GetName(), metav1.GetOptions{}) | ||
if err == nil || !k8serrors.IsNotFound(err) { | ||
if err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find this confusing to read. Why not just split into 2 if-blocks?
e87a4bd
to
7163918
Compare
/retest |
/retest |
Travis tests have failedHey @thandayuthapani, 1st Buildgometalinter --config=linter_config.json --vendor ./...
goveralls -service=travis-ci -v -package ./pkg/... -ignore "pkg/client/*/*.go,pkg/client/*/*/*.go,pkg/client/*/*/*/*.go,pkg/client/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*/*.go,pkg/util/testutil/*.go,pkg/apis/tensorflow/*/zz_generated.*.go,pkg/apis/tensorflow/*/*_generated.go,pkg/apis/common/*/zz_generated.*.go,pkg/apis/common/*/*_generated.go"
2nd Buildgometalinter --config=linter_config.json --vendor ./...
goveralls -service=travis-ci -v -package ./pkg/... -ignore "pkg/client/*/*.go,pkg/client/*/*/*.go,pkg/client/*/*/*/*.go,pkg/client/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*.go,pkg/client/*/*/*/*/*/*/*.go,pkg/util/testutil/*.go,pkg/apis/tensorflow/*/zz_generated.*.go,pkg/apis/tensorflow/*/*_generated.go,pkg/apis/common/*/zz_generated.*.go,pkg/apis/common/*/*_generated.go"
TravisBuddy Request Identifier: 54e83b60-4939-11e9-ab4c-bfa38af62fb3 |
/lgtm |
|
||
kubeBatchClientInterface := jc.KubeBatchClientSet | ||
// Check whether podGroup exists or not | ||
podGroup, err := kubeBatchClientInterface.SchedulingV1alpha1().PodGroups(job.GetNamespace()).Get(job.GetName(), metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: use jc.KubeBatchClientSet directly?
kubeBatchClientInterface := jc.KubeBatchClientSet | ||
|
||
//check whether podGroup exists or not | ||
_, err := kubeBatchClientInterface.SchedulingV1alpha1().PodGroups(job.GetNamespace()).Get(job.GetName(), metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: use jc.KubeBatchClientSet directly?
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
TFJob
yaml should contain annotation intfReplicaSpecs
with key asscheduling.k8s.io/group-name
and value as TFJob name.Which issue(s) this PR fixes
Partial Fix #916
This change is