Respect SchedulingPolicy #518

Closed
Tracked by #507
tenzen-y opened this issue Feb 5, 2023 · 1 comment · Fixed by #520

tenzen-y commented Feb 5, 2023

/kind feature

Currently, the mpi-operator does not support RunPolicy.SchedulingPolicy. For example, we don't respect SchedulingPolicy.MinAvailable when we create the PodGroup in the following function:

// newPodGroup creates a new PodGroup for an MPIJob
// resource. It also sets the appropriate OwnerReferences on the resource so
// handleObject can discover the MPIJob resource that 'owns' it.
func newPodGroup(mpiJob *kubeflow.MPIJob, minAvailableReplicas int32) *podgroupv1beta1.PodGroup {
	var pName string
	if l := mpiJob.Spec.MPIReplicaSpecs[kubeflow.MPIReplicaTypeLauncher]; l != nil {
		pName = l.Template.Spec.PriorityClassName
		if w := mpiJob.Spec.MPIReplicaSpecs[kubeflow.MPIReplicaTypeWorker]; pName == "" && w != nil {
			pName = w.Template.Spec.PriorityClassName
		}
	}
	return &podgroupv1beta1.PodGroup{
		ObjectMeta: metav1.ObjectMeta{
			Name:      mpiJob.Name,
			Namespace: mpiJob.Namespace,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(mpiJob, kubeflow.SchemeGroupVersionKind),
			},
		},
		Spec: podgroupv1beta1.PodGroupSpec{
			MinMember:         minAvailableReplicas,
			Queue:             mpiJob.Annotations[podgroupv1beta1.QueueNameAnnotationKey],
			PriorityClassName: pName,
		},
	}
}
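
To illustrate the requested behavior, here is a minimal sketch of how the PodGroup spec could prefer values from the SchedulingPolicy and fall back to what newPodGroup computes today. The types below are simplified, hypothetical stand-ins rather than the real kubeflow or Volcano types; the helper name `podGroupSpecFor` and the rule that the policy wins over the annotation and pod-template values are assumptions, not the merged implementation.

```go
package main

// SchedulingPolicy is a simplified, hypothetical stand-in for
// RunPolicy.SchedulingPolicy. The field names are assumptions.
type SchedulingPolicy struct {
	MinAvailable  *int32 // optional override for the gang size
	Queue         string // optional queue name
	PriorityClass string // optional priority class name
}

// PodGroupSpec is a simplified stand-in that mirrors only the three
// fields newPodGroup sets in the snippet above.
type PodGroupSpec struct {
	MinMember         int32
	Queue             string
	PriorityClassName string
}

// podGroupSpecFor (hypothetical helper) starts from the current defaults
// and overrides each field only when the policy sets it explicitly.
func podGroupSpecFor(policy *SchedulingPolicy, defaultMinMember int32, annotationQueue, podPriorityClass string) PodGroupSpec {
	spec := PodGroupSpec{
		MinMember:         defaultMinMember,
		Queue:             annotationQueue,
		PriorityClassName: podPriorityClass,
	}
	if policy == nil {
		// No policy given: keep today's behavior unchanged.
		return spec
	}
	if policy.MinAvailable != nil {
		spec.MinMember = *policy.MinAvailable
	}
	if policy.Queue != "" {
		spec.Queue = policy.Queue
	}
	if policy.PriorityClass != "" {
		spec.PriorityClassName = policy.PriorityClass
	}
	return spec
}
```

With a policy whose MinAvailable points to 2 and a default of 5, this helper sets MinMember to 2; with a nil policy it keeps 5, so existing jobs would not change behavior.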

Also, by supporting the SchedulingPolicy, we could set the various PodGroup parameters used by the coscheduling plugin:

// PodGroupSpec represents the template of a pod group.
type PodGroupSpec struct {
	// MinMember defines the minimal number of members/tasks to run the pod group;
	// if there's not enough resources to start all tasks, the scheduler
	// will not start anyone.
	MinMember int32 `json:"minMember,omitempty"`

	// MinResources defines the minimal resource of members/tasks to run the pod group;
	// if there's not enough resources to start all tasks, the scheduler
	// will not start anyone.
	MinResources v1.ResourceList `json:"minResources,omitempty"`

	// ScheduleTimeoutSeconds defines the maximal time of members/tasks to wait before run the pod group;
	ScheduleTimeoutSeconds *int32 `json:"scheduleTimeoutSeconds,omitempty"`
}

https://github.com/kubernetes-sigs/scheduler-plugins/blob/f996e5caf6c77d521d574186dca793e351c45413/apis/scheduling/v1alpha1/types.go#L139-L153


tenzen-y commented Feb 5, 2023

/assign
Blocking #500.
