-
Notifications
You must be signed in to change notification settings - Fork 71
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just some nitpicks.
/lgtm
job_controller/job_controller.go
Outdated
// e.g. 15s, 30s, 60s, 120s... | ||
ReconcilerSyncLoopPeriod metav1.Duration | ||
|
||
// Enable gang scheduling by kube-arbitrator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kube-arbitrator -> kube-batch
job_controller/pod.go
Outdated
|
||
} | ||
|
||
// When a pod is updated, figure out what tfjob/s manage it and wake them up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tfjob can be changed to be more generic name here
job_controller/pod.go
Outdated
logger := commonutil.LoggerForPod(pod, jc.Controller.GetAPIGroupVersionKind().Kind) | ||
|
||
if job == nil { | ||
// If this is a TFJob pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFJob -> more generic name
test_util/v1/util.go
Outdated
} | ||
|
||
// ConvertTFJobToUnstructured uses JSON to convert TFJob to Unstructured. | ||
func ConvertTFJobToUnstructured(testJob *testjobv1.TestJob) (*unstructured.Unstructured, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function required?
test_job/v1/types.go
Outdated
|
||
// +genclient | ||
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object | ||
// +resource:path=tfjob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TfJOb -> job in this file
test_util/v1/const.go
Outdated
|
||
// DefaultPortName is name of the port used to communicate between PS and | ||
// workers. | ||
DefaultPortName = "tfjob-port" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFjob -> job in this file
Fixed the comments, please take a look again. |
/unassign @gaocegege |
/assign @gaocegege |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: terrytangyuan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Co-authored-by: Paul Angerer <dabauxi@users.noreply.github.com>
Part 2 of kubeflow/training-operator#960
This is the common library functions copied from tf-operator. Mapping is as follows:
Also added a test_job to represent generic training jobs. Originally all the tests were written using tfjob, but this repo cannot take a dependency on tf-operator.
The other files are auto-generated for the testjob custom resource.
This change is