External PMIx server support is broken #1425
Comments
P.S. If I negate the 0x8000 bit, everything works fine:
@artpol84 that is the right fix for me.
@ggouaillardet Thank you,
I think that the right solution would be to hash to
right,
I don't think it really matters, to be honest - SLURM doesn't support cross-job connections anyway, and so the jobid is totally arbitrary (we could just set it to '1', I suppose). One alternative we could use would be to have pmix_init check for an envar setting PMIX_JOBID, and then we could atoi that string. Would that help?
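A minimal sketch of the PMIX_JOBID idea above, not the actual pmix_init code. The helper name `jobid_from_env` and the choice of `uint32_t` are assumptions made for illustration; it uses `strtoul` rather than `atoi` so that a malformed value can be rejected instead of silently becoming 0.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: read the jobid from the PMIX_JOBID environment
 * variable. Returns 0 and stores the value on success, or -1 if the
 * variable is unset, not a plain decimal number, or out of range. */
static int jobid_from_env(uint32_t *jobid)
{
    const char *s = getenv("PMIX_JOBID");
    char *end = NULL;
    unsigned long v;

    /* Require a leading digit so that signs and whitespace are rejected. */
    if (s == NULL || s[0] < '0' || s[0] > '9') {
        return -1;
    }

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX) {
        return -1;
    }

    *jobid = (uint32_t)v;
    return 0;
}
```

A caller could fall back to the hashed value whenever this returns -1, so launchers that do not export the variable would keep working.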
SLURM may support multi-job in the future, so it would be great to consider that.
We would need to add that to the list of things required to be provided in the environment, but that might be okay - basically, we want an int32_t that corresponds to the nspace string? Also has to be unique across the potential connection space, which could be tricky to guarantee...which is why we went to the string nspace anyway. I wonder if we should just worry about a temporary fix here? The plan is for orte to move to using the nspace instead of an int jobid anyway, so this may not be a long-term problem.
Per the 2016-03-08 call, @rhc54 will fix this. He thinks it's a trivial fix (even if it's a short-term fix).
@ggouaillardet @rhc54 @jladd-mlnx
The following commit:
b55b9e6
breaks launch through the SLURM plugin. The assert:
https://github.com/open-mpi/ompi/blob/master/ompi/proc/proc.h#L399
is triggered because the jobid was created from a hash function:
https://github.com/open-mpi/ompi/blob/master/opal/mca/pmix/pmix112/pmix1_client.c#L122