-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add default Intel MPI env variables to MPIJob #1804
Add default Intel MPI env variables to MPIJob #1804
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
/assign @tenzen-y |
Pull Request Test Coverage Report for Build 5002839939
💛 - Coveralls |
@tkatila Can you sign a CLA? Thanks for your contributions. |
Yes, I will. I just have to get confirmation from corp level on it first. |
@@ -1152,6 +1152,20 @@ func (jc *MPIJobReconciler) newLauncher(mpiJob *kubeflowv1.MPIJob, kubectlDelive | |||
}) | |||
} | |||
|
|||
// Add default Intel MPI bootstrap variables if none is provided by the user | |||
if !hasIntelMPIBootstrap(container.Env) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this implementation, if only I_MPI_HYDRA_BOOTSTRAP
is set (I_MPI_HYDRA_BOOTSTRAP_EXEC
is unset), it will skip this section.
Can we check both I_MPI_HYDRA_BOOTSTRAP
and I_MPI_HYDRA_BOOTSTRAP_EXEC
?
And then, should we set env vars if either is unset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this implementation, if only I_MPI_HYDRA_BOOTSTRAP is set (I_MPI_HYDRA_BOOTSTRAP_EXEC is unset), it will skip this section.
Can we check both I_MPI_HYDRA_BOOTSTRAP and I_MPI_HYDRA_BOOTSTRAP_EXEC?
hasIntelMPIBootstrap
checks for I_MPI_HYDRA_BOOTSTRAP prefix so it includes both ..BOOTSTRAP and ..BOOSTRAP_EXEC. Actually, it also applies to two other variables that I didn't take into account: https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-9/hydra-environment-variables.html#GUID-CF3DA5C7-8B4D-468F-B463-42DCA7E0E55E and https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-9/hydra-environment-variables.html#GUID-F0F7163C-3674-4AEC-A945-9D62BFA1CE68
And then, should we set env vars if either is unset?
Maybe it's better that way. I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasIntelMPIBootstrap checks for I_MPI_HYDRA_BOOTSTRAP prefix so it includes both ..BOOTSTRAP and ..BOOSTRAP_EXEC. Actually, it also applies to two other variables that I didn't take into account: https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-9/hydra-environment-variables.html#GUID-CF3DA5C7-8B4D-468F-B463-42DCA7E0E55E and https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-9/hydra-environment-variables.html#GUID-F0F7163C-3674-4AEC-A945-9D62BFA1CE68
Oh, it is great information :)
Mentioning which env vars (I_MPI_HYDRA_BOOTSTRAP
, I_MPI_HYDRA_BOOTSTRAP_EXEC
, I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS
, and I_MPI_HYDRA_BOOTSTRAP_AUTOFORK
) are considered in the comment would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a mention of the extra two variables to the comments. I also modified the code to set either of the missing variables.
fyi, still waiting on the CLA, should get it sorted during this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a mention of the extra two variables to the comments. I also modified the code to set either of the missing variables.
Thank you for the updates! I left a few comments.
fyi, still waiting on the CLA, should get it sorted during this week.
Thanks for letting me know!
f37df17
to
1ae623a
Compare
1ae623a
to
b5948f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my comments.
I left the last review comments for some nit.
b08dee8
to
8d43989
Compare
There were several mpi-operator IntelMPI support PRs in 2021, that this can be compared against: https://github.com/kubeflow/mpi-operator/pulls?q=is%3Apr+intel+mpi+is%3Aclosed Most important ones are:
Without them, user needs to manually specify slot counts on the MPIJob command lines. These could also relevant, but do not belong to this PR:
|
@eero-t Thank you for the great suggestion. However, we can discuss your proposals in another issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkatila Thank you for the great contribution!
This PR looks good to me.
Please let me know once you signed the CLA.
Done (see above). Most likely I won't be able to verify it when it eventually gets fixed as mpi-operator project seems to be as active as this project and already support all 3 major MPI variants, so I'll go with it. I just noticed those issues when reviewing k8s MPI implementation alternatives... |
Yes, the mpi-operator is an active project with some performance improvement and supports the queueing feature by Kueue. |
@tkatila ping |
pong. I'm really sorry for the delay. I hit some corporate scheninagans, but most of the things sorted themselves out this Friday. I'm hoping to sign the CLA next week. |
Thank you for letting me know! |
The CLA is now done. |
@tkatila Thanks! Welcome to Kubeflow Community 🎉 |
ping @johnugeorge |
Sorry. I missed this. @tkatila Can you do a rebase? |
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com> Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
8d43989
to
55b13b6
Compare
Sure. Rebased. |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: terrytangyuan, tkatila The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
When the container is equipped with Intel MPI, the end user has had to manually add Intel MPI specific env variables for the MPIJob. This modifies the operator to add the variables with default values, if there are no user defined variables.