pulling latest changes from kubeflow/tf-operator to deepak-muley/tf-operator #1

Merged 3 commits on Aug 13, 2021

deepak-muley
Owner

No description provided.

Jeffwan and others added 3 commits August 12, 2021 20:28
1. Change release process
2. Checking python tool to generate changelog
3. Fix v1.2.0 changelog
* 1322: Modified manifests to use all-in-one training-operator WIP

Actions taken:
    - replaced tf-job-operator => training-operator
    - replaced kubeflow-tfjobs- => kubeflow-training-
    - moved crds for mxjobs, tgjobs, pytorchjobs and xgboostjobs from
      config/crd/bases to manifests/base/ and prefixed them with crd_
Ref: #1322
Testing steps: To be added
Work in Progress
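
The string replacements listed under "Actions taken" could be applied in bulk with a small script. The following is only a minimal sketch and not the method used in this commit: the `rename_in_manifests` helper and the sample file content are hypothetical, and only the two old/new name pairs come from the commit message.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Old -> new name pairs taken from the commit message.
RENAMES = {
    "tf-job-operator": "training-operator",
    "kubeflow-tfjobs-": "kubeflow-training-",
}


def rename_in_manifests(root: Path, renames: dict[str, str]) -> list[Path]:
    """Rewrite every *.yaml file under `root`; return the files that changed."""
    changed = []
    for path in sorted(root.rglob("*.yaml")):
        original = path.read_text()
        updated = original
        for old, new in renames.items():
            updated = updated.replace(old, new)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with hypothetical content.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "deployment.yaml"
        sample.write_text("name: tf-job-operator\nrole: kubeflow-tfjobs-admin\n")
        print(rename_in_manifests(Path(tmp), RENAMES))
        print(sample.read_text())
```

Returning the list of changed files makes it easy to review the result with `git diff` before committing.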

* 1322: synced up config/manager with manifests

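The training operator started successfully; its log shows all four controllers (tf-operator, mxnet-operator, pytorchjob-operator, xgboostjob-operator) starting their workers: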
<pre>
k -n kubeflow logs -f training-operator-694766989-pp2j4
I0812 21:43:24.739862       1 request.go:645] Throttling request took 1.048945631s, request: GET:https://172.19.0.1:443/apis/networking.k8s.io/v1?timeout=32s
2021-08-12T21:43:25.694Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
2021-08-12T21:43:25.790Z	INFO	setup	starting manager
2021-08-12T21:43:25.790Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
2021-08-12T21:43:25.790Z	INFO	controller-runtime.manager.controller.tf-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:25.790Z	INFO	controller-runtime.manager.controller.mxnet-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:25.791Z	INFO	controller-runtime.manager.controller.pytorchjob-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:25.791Z	INFO	controller-runtime.manager.controller.xgboostjob-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.289Z	INFO	controller-runtime.manager.controller.xgboostjob-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.294Z	INFO	controller-runtime.manager.controller.pytorchjob-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.589Z	INFO	controller-runtime.manager.controller.mxnet-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.688Z	INFO	controller-runtime.manager.controller.tf-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.889Z	INFO	controller-runtime.manager.controller.tf-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.889Z	INFO	controller-runtime.manager.controller.pytorchjob-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.890Z	INFO	controller-runtime.manager.controller.xgboostjob-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.890Z	INFO	controller-runtime.manager.controller.mxnet-operator	Starting EventSource	{"source": "kind source: /, Kind="}
2021-08-12T21:43:26.990Z	INFO	controller-runtime.manager.controller.xgboostjob-operator	Starting Controller
2021-08-12T21:43:26.990Z	INFO	controller-runtime.manager.controller.tf-operator	Starting Controller
2021-08-12T21:43:26.990Z	INFO	controller-runtime.manager.controller.tf-operator	Starting workers	{"worker count": 1}
2021-08-12T21:43:26.990Z	INFO	controller-runtime.manager.controller.pytorchjob-operator	Starting Controller
2021-08-12T21:43:26.991Z	INFO	controller-runtime.manager.controller.xgboostjob-operator	Starting workers	{"worker count": 1}
2021-08-12T21:43:26.991Z	INFO	controller-runtime.manager.controller.pytorchjob-operator	Starting workers	{"worker count": 1}
2021-08-12T21:43:26.991Z	INFO	controller-runtime.manager.controller.mxnet-operator	Starting Controller
2021-08-12T21:43:26.991Z	INFO	controller-runtime.manager.controller.mxnet-operator	Starting workers	{"worker count": 1}
</pre>
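
One way to back up the observation from the log is to check that each of the four controllers named in it reached a `Starting workers` line. This is a minimal sketch, assuming the log has been saved as text; the helper names `controllers_with_workers` and `missing_controllers` are hypothetical, and the line layout is inferred from the excerpt above.

```python
import re

# Controller names as they appear in the log excerpt.
EXPECTED = {
    "tf-operator",
    "mxnet-operator",
    "pytorchjob-operator",
    "xgboostjob-operator",
}

# Matches e.g. "...manager.controller.tf-operator<TAB>Starting workers".
WORKERS_RE = re.compile(
    r"controller-runtime\.manager\.controller\.(\S+)\s+Starting workers"
)


def controllers_with_workers(log_text: str) -> set:
    """Return the controller names that logged a 'Starting workers' line."""
    return {m.group(1) for m in WORKERS_RE.finditer(log_text)}


def missing_controllers(log_text: str) -> set:
    """Return the expected controllers that never started workers."""
    return EXPECTED - controllers_with_workers(log_text)


if __name__ == "__main__":
    # Two lines copied from the log; the other two controllers are left out
    # on purpose so that the check reports them as missing.
    sample = "\n".join([
        '2021-08-12T21:43:26.990Z\tINFO\tcontroller-runtime.manager.controller.tf-operator\tStarting workers\t{"worker count": 1}',
        '2021-08-12T21:43:26.991Z\tINFO\tcontroller-runtime.manager.controller.xgboostjob-operator\tStarting workers\t{"worker count": 1}',
    ])
    print(sorted(missing_controllers(sample)))
```

An empty result from `missing_controllers` on the full log would indicate that every controller started its workers. It does not prove that jobs reconcile correctly.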

* 1322: incorporated review comments - added all resources in ClusterRole

* 1322: incorporated review comments

- now controller-gen generates the crds directly in manifests/base
  instead of config/crd/bases
- updated setup-training-operator.sh to use manifests/overlays/standalone

* 1322: removed config/crd/bases as it is now generated in manifests

* 1322: incorporated review comments related to using separate role files

* 1322: removed image name replacement
* Clean up manifests

* Remove prometheus and manager folder

* Update override image tag for kubeflow manifest

* Delete leader election role

* Remove non-existent tag in deployment
@deepak-muley deepak-muley merged commit a9ad979 into deepak-muley:master Aug 13, 2021
2 participants