After setting hostNetwork to true, mpi does not work #1657

Closed
varinic opened this issue Sep 5, 2022 · 2 comments
Comments

varinic commented Sep 5, 2022

I set hostNetwork to true for the MPI job, but the new (host) IP is not used when MPI actually runs, so the worker pods cannot communicate with each other.
I found a mapping between the physical machine IPs and the worker pod names in /opt/kube/hosts on the launcher pod, but the worker pods have no such mapping. Could this be why MPI is not using the new IP?
Can anybody help me?
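
A quick way to test this suspicion is to check whether a given worker name appears in a hosts file on each pod. The following is a minimal sketch, assuming /opt/kube/hosts uses the usual /etc/hosts layout (an IP address followed by one or more names). `has_host_entry` is a hypothetical helper written for illustration, not something the operator provides.

```shell
# Hypothetical helper: succeed (exit 0) if HOSTNAME appears as a name
# in HOSTS_FILE, assuming an /etc/hosts-style layout: "IP name [alias...]".
# Usage: has_host_entry HOSTS_FILE HOSTNAME
has_host_entry() {
  awk -v h="$2" '
    $1 ~ /^#/ { next }                      # skip comment lines
    { for (i = 2; i <= NF; i++)             # fields after the IP are names
        if ($i == h) found = 1 }
    END { exit !found }                     # exit 0 only when a match was seen
  ' "$1"
}
```

Running it inside the launcher and inside a worker with the same worker name (for example against /opt/kube/hosts and /etc/hosts) would show whether the mapping really is missing on one side.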

zw0610 (Member) commented Sep 5, 2022 via email

varinic (Author) commented Sep 5, 2022

The problem was solved by removing the "-mca pml ob1 -mca btl ^openib" parameters from the mpirun command.
Thanks!
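
For reference, `-mca pml ob1` forces Open MPI's ob1 point-to-point layer and `-mca btl ^openib` excludes the openib transport; without them, Open MPI falls back to its default component selection. The sketch below strips both options from a command line. The surrounding command (`--allow-run-as-root`, `-np 2`, `python train.py`) is a made-up placeholder, not the command from this job.

```shell
# Placeholder command line; only the two -mca options come from this issue.
orig_cmd='mpirun --allow-run-as-root -np 2 -mca pml ob1 -mca btl ^openib python train.py'

# Drop the two options. "\^" matches a literal caret in the sed pattern.
fixed_cmd=$(printf '%s\n' "$orig_cmd" \
  | sed -e 's/ -mca pml ob1//' -e 's/ -mca btl \^openib//')

# Print the command that would be run without the forced MCA settings.
echo "$fixed_cmd"
```

In practice the same change is just a matter of deleting the two options from the launcher's command in the job spec.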

varinic closed this as completed Sep 5, 2022