Fix the success condition of the job in PyTorchJob's Elastic mode. (#1752)

Signed-off-by: Syulin7 <735122171@qq.com>
Syulin7 authored Feb 8, 2023
1 parent aae672f commit c85040a
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion pkg/controller.v1/pytorch/pytorchjob_controller.go
@@ -434,7 +434,10 @@ func (r *PyTorchJobReconciler) UpdateJobStatus(job interface{},
         } else {
             if rtype == kubeflowv1.PyTorchJobReplicaTypeWorker {
                 // TODO(gaocegege): Support SuccessPolicy
-                if expected == 0 {
+                // Leave a succeeded condition for the following two cases:
+                // 1. If all workers are succeeded.
+                // 2. If `ElasticPolicy` is not nil and any worker has completed.
+                if expected == 0 || (pytorchjob.Spec.ElasticPolicy != nil && succeeded > 0) {
                     msg := fmt.Sprintf("PyTorchJob %s/%s successfully completed.",
                         pytorchjob.Namespace, pytorchjob.Name)
                     r.recorder.Event(pytorchjob, corev1.EventTypeNormal, commonutil.JobSucceededReason, msg)
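
The new condition can be read as a standalone predicate. The sketch below is illustrative only and is not part of the commit: `workerJobSucceeded` is a hypothetical helper, the `elastic` flag stands in for `pytorchjob.Spec.ElasticPolicy != nil`, and it assumes `expected` counts the workers that have not yet succeeded, which the diff itself does not show.

```go
package main

import "fmt"

// workerJobSucceeded is a hypothetical helper mirroring the condition in the diff.
// Assumptions (not confirmed by the commit):
//   - expected is the number of workers that still have to succeed,
//   - succeeded is the number of workers that have completed successfully,
//   - elastic is true when the job's ElasticPolicy is set (non-nil).
func workerJobSucceeded(expected, succeeded int32, elastic bool) bool {
	// Case 1: all workers have succeeded.
	// Case 2: elastic mode, and at least one worker has completed.
	return expected == 0 || (elastic && succeeded > 0)
}

func main() {
	// Non-elastic job with two workers still outstanding.
	fmt.Println(workerJobSucceeded(2, 1, false))
	// Same worker counts, but with an elastic policy set.
	fmt.Println(workerJobSucceeded(2, 1, true))
}
```

Under these assumptions, the two calls differ only in the `elastic` flag, which is the behavior change the commit introduces: before it, only the `expected == 0` case marked the job as succeeded.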