Skip to content
This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

Difference between kubeflow/common and kubeflow/tf-operator/pkg/common #71

Closed
Jeffwan opened this issue Apr 14, 2020 · 6 comments · Fixed by #85
Closed

Difference between kubeflow/common and kubeflow/tf-operator/pkg/common #71

Jeffwan opened this issue Apr 14, 2020 · 6 comments · Fixed by #85
Assignees

Comments

@Jeffwan
Copy link
Member

Jeffwan commented Apr 14, 2020

Comparing ControllerInterface between two different projects, tf-operator is pretty clean. kubeflow/common adds extra methods to create Job/Pod/Services.

GetPodsForJob(job interface{}) ([]*v1.Pod, error)
// GetServicesForJob returns the services managed by the job. This can be achieved by selecting services using label key "job-name"
// i.e. all services created by the job will come with label "job-name" = <this_job_name>
GetServicesForJob(job interface{}) ([]*v1.Service, error)
// DeleteJob deletes the job
DeleteJob(job interface{}) error
// UpdateJobStatus updates the job status and job conditions
UpdateJobStatus(job interface{}, replicas map[ReplicaType]*ReplicaSpec, jobStatus *JobStatus) error
// UpdateJobStatusInApiServer updates the job status in API server
UpdateJobStatusInApiServer(job interface{}, jobStatus *JobStatus) error
// CreateService creates the service
CreateService(job interface{}, service *v1.Service) error
// DeleteService deletes the service
DeleteService(job interface{}, name string, namespace string) error
// CreatePod creates the pod
CreatePod(job interface{}, pod *v1.Pod) error
// DeletePod deletes the pod
DeletePod(job interface{}, pod *v1.Pod) error

Besides that, we extracted a few interfaces for high level abstraction which makes sense.

SetClusterSpec(job interface{}, podTemplate *v1.PodTemplateSpec, rtype, index string) error
// Returns the default container name in pod
GetDefaultContainerName() string
// Get the default container port name
GetDefaultContainerPortName() string
// Get the default container port number
GetDefaultContainerPortNumber() int32
// Returns if this replica type with index specified is a master role.
// MasterRole pod will have "job-role=master" set in its label
IsMasterRole(replicas map[ReplicaType]*ReplicaSpec, rtype ReplicaType, index int) bool

JobController side, tf-operator has PodControl and ServiceControl which are used to operate k8s objects. However, in #36, since create/delete pods/services interface have been exposed, community determine to remove PodControl and ServiceControl because them are kubernetes controller internal codes, and there's also concern on the size of the dependency (need kubernetes code base).

Currently, it's a little bit weird.

  1. We can not fully get rid of kubernetes/pkg/controller, Expectation is still using them here. https://github.com/kubeflow/common/blob/master/pkg/controller.v1/common/job_controller.go#L100 . This is not part of client-go yet. See issue Adding expectations and controller ref manager to client-go kubernetes/client-go#332

I think we used dep to manage dependencies in the past and I've migrated common to go module compatible. Ideally, we will move all operator to go modules pretty soon. Size of the dependencies is not that painful.

  1. Files https://github.com/kubeflow/common/blob/master/pkg/controller.v1/common/pod_control.go and https://github.com/kubeflow/common/blob/master/pkg/controller.v1/common/service_control.go are not being used in kubeflow/common.

I think we only one of following three. The status in kubeflow/common is

  • Create Pods/Service Interface - exist and being used
  • controller. ServiceControlInterface - deleted
  • copied ServiceControlInterface - still there and but not being used. Only tests are using it.
  1. Implementation from different operators for following interface are very close. I doubt we need different controller to have similar logic there. Currently, only xgboost-operator is built on top of this operator, we have plan to migrate tf, pytorch, mxnet to follow kubeflow/common fashion. trying to see any other operators are built on top of common, if there's a need to customize logic of CreateService, etc. If that's all same, it's better to implement it inside JobController.

GetPodsForJob(job interface{}) ([]*v1.Pod, error)
// GetServicesForJob returns the services managed by the job. This can be achieved by selecting services using label key "job-name"
// i.e. all services created by the job will come with label "job-name" = <this_job_name>
GetServicesForJob(job interface{}) ([]*v1.Service, error)
// DeleteJob deletes the job
DeleteJob(job interface{}) error

CreateService(job interface{}, service *v1.Service) error
// DeleteService deletes the service
DeleteService(job interface{}, name string, namespace string) error
// CreatePod creates the pod
CreatePod(job interface{}, pod *v1.Pod) error
// DeletePod deletes the pod
DeletePod(job interface{}, pod *v1.Pod) error

@issue-label-bot
Copy link

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@Jeffwan
Copy link
Member Author

Jeffwan commented Apr 14, 2020

@gaocegege
Copy link
Member

Thanks for the issue. It is helpful. Will we have plan to migrate tf-operator to common?

@terrytangyuan
Copy link
Member

@Jeffwan Thanks for tracking down the differences!

@gaocegege Yes, check out the current roadmap and we are tracking it in #64.

@Jeffwan
Copy link
Member Author

Jeffwan commented Apr 15, 2020

I think my plan is to leave the interface for now.. I will have default implementation for following methods in JobController which is the our base controller in common. I will consider to extrac https://github.com/kubeflow/common/tree/master/pkg/control and try to delete kubernetes dependencies.

Then other controllers can reuse the implementation of JobController and they can still override following methods if they need a different implementation.

  • CreatePod
  • DeletePod
  • GetPodsForJob
  • CreateService
  • DeleteService
  • GetServicesForJob

@Jeffwan
Copy link
Member Author

Jeffwan commented May 16, 2020

Some conclusion

  1. We create a package in common for PodControl and ServiceControl. Pod/Services CRUD should use control package instead of exposing to end users
  2. Still expose GetPodsForService and GetServicesForJob as interfaces and common provides common implementation. User can override implementation based on the needs. I learnt this in XGBoost operator migration, the problem is projects built by Kubebuilder (like XGBoost Operator) doesn't use low level informer or lister. In order to leverage common implementation, user has to pass in PodLister and ServiceLister which is not elegant in kubebuilder implementation.

@Jeffwan Jeffwan self-assigned this May 16, 2020
@Jeffwan Jeffwan closed this as completed May 16, 2020
georgkaleido pushed a commit to georgkaleido/common that referenced this issue Jun 9, 2022
* Try and fix CircleCI build

* Add missing url to gemfury
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants