Remove tf-operator from the codebase #1378

Merged


thunderboltsid
Contributor

Remove tf-operator and update the docs as per #1367.

With training-operator.v1 in place, we no longer need tf-operator.v1.
The tests were effectively testing the contract of the
testutil package, so it makes sense for the tests to live
within the package itself.
With TFReconciler in place, we can remove the TFController and
associated tests.
With the tf_operator code deleted, we no longer need the Dockerfile
to build an image to run tf_operator.
Replace the tf-operator binary with training-operator.
- Substitute references to tf-operator with training-operator
- Add instructions to apply all job CRDs instead of just TFJob as the
  operator expects them all to be present.
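As a hedged illustration of the "substitute references to tf-operator" step, the sketch below walks a source checkout and lists every line that still mentions `tf-operator`, so leftovers can be reviewed by hand. The helper name `find_references` and the set of file suffixes it scans are assumptions made for illustration; this is not tooling from the repository.

```python
from pathlib import Path


def find_references(root, needle="tf-operator",
                    suffixes=(".md", ".py", ".go", ".yaml", "Makefile")):
    """Hypothetical helper: return (path, line_no, line) for each line containing needle.

    The default suffixes are an assumption; adjust them to match the repository layout.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only scan regular files whose name ends with one of the chosen suffixes.
        if not path.is_file() or not path.name.endswith(suffixes):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, line_no, line.strip()))
    return hits
```

Calling `find_references(".")` from the repository root returns the matches in path order; note that a plain substring search also flags names such as `kubeflow-tf-operator-presubmit`, which may legitimately stay.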
@aws-kf-ci-bot
Contributor

Hi @thunderboltsid. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@andreyvelich andreyvelich left a comment

Thank you for taking this @thunderboltsid!
I left a few comments.

docs/development/developer_guide.md (outdated)
pkg/controller.v1/tensorflow/tfjob_controller.go (outdated)
@@ -169,7 +169,7 @@ def build_operator_image(root_dir,
 # List of paths to copy relative to root.
 sources = [
 "build/images/tf_operator/Dockerfile", "examples/tf_sample/tf_smoke.py",
-os.path.join(go_path, bin_path, "tf-operator.v1"),
+os.path.join(go_path, bin_path, "training-operator.v1"),
Member

Do we still need the release.py file? We probably need to check which files we are using from py/kubeflow/tf_operator.
cc @kubeflow/wg-training-leads

Contributor Author

The only release.py I can see referenced from the docs is https://github.com/kubeflow/tf-operator/blob/master/docs/release/release.py

Member

This file doesn't use py/kubeflow/..., right?

Contributor Author

It does not.

Member

We no longer use this script. I think it is OK to remove the tf-operator reference first; a separate story is needed to clean up the test/release tools.

@andreyvelich
Member

/ok-to-test

Replace kubectl apply commands with Makefile targets.
Metric counters have been refactored into pkg/common/metrics.go. This
diff removes the stale counters present in tfjobcontroller.
@johnugeorge
Member

/cc @Jeffwan @gaocegege

@thunderboltsid
Contributor Author

/retest

Member

@gaocegege gaocegege left a comment

LGTM

Thanks for the contribution!

/cc @Jeffwan

@Jeffwan
Member

Jeffwan commented Aug 24, 2021

/hold

I will take some time to review.

docs/development/developer_guide.md
pkg/common/util/v1/testutil/util_test.go (outdated)
pkg/controller.v1/tensorflow/util.go
pkg/controller.v1/tensorflow/tfjob_controller.go (outdated)
As per review comments, we don't need to move
util_tests.go to pkg/common/util/testutil/util_test.go.
Add remark about the origin file for the helper methods that
were moved during the refactor in pkg/tensorflow.
@Jeffwan
Member

Jeffwan commented Aug 25, 2021

Thanks for addressing the feedback. Overall, this looks good to me. Other reviewers? @andreyvelich @johnugeorge

/lgtm

Delete variables that are no longer in use.
tfjob-operator is more consistent with the naming of other operators
such as pytorchjob-operator or xgboostjob-operator.
@Jeffwan
Member

Jeffwan commented Aug 26, 2021

/test kubeflow-tf-operator-presubmit

@andreyvelich
Member

Thank you for this update!
/lgtm
/cc @Jeffwan @johnugeorge

@thunderboltsid
Contributor Author

/retest

@johnugeorge
Member

/test kubeflow-tf-operator-presubmit

- xgboostjob-operator -> xgboostjob-controller
- mxnet-operator -> mxjob-controller
- pytorchjob-operator -> pytorchjob-controller
- tfjob-operator -> tfjob-controller
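For anyone updating downstream manifests or scripts, the renames in this commit can be expressed as a simple lookup table. The sketch below is a hypothetical helper, not part of the training-operator codebase; it assumes plain substring replacement is acceptable, which holds here because none of the new names contains an old one.

```python
# Old component names mapped to the new controller names listed in this commit message.
RENAMES = {
    "xgboostjob-operator": "xgboostjob-controller",
    "mxnet-operator": "mxjob-controller",
    "pytorchjob-operator": "pytorchjob-controller",
    "tfjob-operator": "tfjob-controller",
}


def rename_components(text):
    """Hypothetical helper: apply every rename to a manifest or config string."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text
```

Names that are not in the table, such as `training-operator`, pass through unchanged.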
Member

@jasonliu747 jasonliu747 left a comment

/lgtm
Thanks for your awesome contribution!

@Jeffwan
Member

Jeffwan commented Aug 27, 2021

/approve

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jasonliu747, Jeffwan

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jasonliu747
Member

/hold cancel

@google-oss-robot google-oss-robot merged commit ff5aaf1 into kubeflow:master Aug 27, 2021

9 participants