Add slots to hostfile #523

Merged: 1 commit, Feb 10, 2023

tenzen-y (Member) commented Feb 9, 2023

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

I added slots to the hostfile, which Horovod requires.

ref: https://horovod.readthedocs.io/en/stable/running_include.html#run-horovod

Fixes: #445

@google-oss-prow google-oss-prow bot requested review from carmark and zw0610 February 9, 2023 16:24
@tenzen-y tenzen-y changed the title Add slots to hostfile [WIP] Add slots to hostfile Feb 9, 2023
@tenzen-y tenzen-y changed the title [WIP] Add slots to hostfile Add slots to hostfile Feb 9, 2023
@tenzen-y tenzen-y changed the title Add slots to hostfile WIP: Add slots to hostfile Feb 9, 2023
Comment on lines 1190 to 1195
	if mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationOpenMPI {
		buffer.WriteString(fmt.Sprintf("%s%s-%d.%s.%s.svc slots=%d\n", mpiJob.Name, workerSuffix, i, workersService, mpiJob.Namespace, slots))
	} else if mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationIntel {
		buffer.WriteString(fmt.Sprintf("%s%s-%d.%s.%s.svc:%d\n", mpiJob.Name, workerSuffix, i, workersService, mpiJob.Namespace, slots))
	}

@tenzen-y tenzen-y changed the title WIP: Add slots to hostfile Add slots to hostfile Feb 9, 2023
tenzen-y (Member Author) commented Feb 9, 2023

/assign @alculquicondor

alculquicondor (Collaborator) left a comment

/approve

	for i := 0; i < int(workerReplicas); i++ {
		buffer.WriteString(fmt.Sprintf("%s%s-%d.%s.%s.svc\n", mpiJob.Name, workerSuffix, i, workersService, mpiJob.Namespace))
		if mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationOpenMPI {
Collaborator:

nit: use a switch statement for behavior that varies per enum value.

Member Author:

Makes sense.

Member Author:

Done.
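
For readers following the review thread, the switch-based shape suggested above might look roughly like the sketch below. This is not the merged code: `buildHostfile`, its parameters, the local `MPIImplementation` type, and the sample values in `main` are hypothetical stand-ins for the `mpiJob` fields and the `kubeflow.MPIImplementation*` constants quoted in the diff. Only the two line formats (`host slots=N` for Open MPI, `host:N` for Intel MPI) are taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// MPIImplementation is a local stand-in for the kubeflow.MPIImplementation
// enum referenced in the diff; the string values are assumptions.
type MPIImplementation string

const (
	MPIImplementationOpenMPI MPIImplementation = "OpenMPI"
	MPIImplementationIntel   MPIImplementation = "Intel"
)

// buildHostfile is a hypothetical helper that writes one hostfile line per
// worker replica, choosing the slots syntax with a switch on the enum.
func buildHostfile(impl MPIImplementation, jobName, workerSuffix, service, namespace string, replicas, slots int) string {
	var b strings.Builder
	for i := 0; i < replicas; i++ {
		host := fmt.Sprintf("%s%s-%d.%s.%s.svc", jobName, workerSuffix, i, service, namespace)
		switch impl {
		case MPIImplementationOpenMPI:
			// Format from the diff: "<host> slots=<n>".
			fmt.Fprintf(&b, "%s slots=%d\n", host, slots)
		case MPIImplementationIntel:
			// Format from the diff: "<host>:<n>".
			fmt.Fprintf(&b, "%s:%d\n", host, slots)
		}
		// Other implementations write nothing, mirroring the if/else-if in the diff.
	}
	return b.String()
}

func main() {
	// Sample values are illustrative only.
	fmt.Print(buildHostfile(MPIImplementationOpenMPI, "pi", "-worker", "pi-worker", "default", 2, 4))
}
```

A switch keeps each implementation's format in its own case, so supporting another enum value later means adding one case rather than extending an else-if chain.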

google-oss-prow bot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor


Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
tenzen-y (Member Author) commented

@alculquicondor I have addressed your comments and squashed the commits into one. PTAL.

alculquicondor (Collaborator) commented

/lgtm

@sheevy sheevy mentioned this pull request Jun 8, 2023
Successfully merging this pull request may close these issues.

Elastic resize broken in v2 operator: Horovod requires the slots parameter