Per the discussion on the telecon, change the -host behavior yet again #1353
Conversation
@rhc54 I'm not sure the right things are happening. I have 2 hosts, mpi009,mpi010 -- with 16 and 24 cores, respectively.
The more I think about the ??? case from my prior comment, the more I think it should be GOOD. Specifically: it makes the … That being said, it does make it weird that …
@jsquyres and I discussed this on the phone and agreed that fixing the oversubscribed issue is the only change that is required.
@jsquyres Okay, I believe this now fits the agreed-upon behavior. Please verify and commit.
Hmm. It's mostly what I expected:

```
$ mpirun -np --host mpi009 hostname
mpi009

# This is a 16 core machine
$ mpirun -np 16 --host mpi009 hostname
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009
mpi009

# There are 16 cores, so this should error
$ mpirun -np 17 --host mpi009 hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 17 slots
that were requested by the application:

hostname

Either request fewer slots for your application, or make more slots available
for use.
$ echo $status
1
```

However, I thought we agreed that …

```
$ mpirun --host mpi009,mpi010 hostname
mpi009
mpi010
$ mpirun --host mpi009,mpi010,mpi011 hostname
mpi009
mpi010
mpi011
```

That's what it did in my test above, at least (run N*num_cores copies, not N copies). It's subjective, I suppose -- either way could be "correct". What did we do in previous versions?
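To make the two interpretations being debated here concrete, the sketch below uses the mpi009 (16 cores) and mpi010 (24 cores) hosts from earlier in this thread; the outputs are illustrative only, not captured from an actual run.

```
# Interpretation A: --host with no -np launches one copy per listed host ("N copies")
$ mpirun --host mpi009,mpi010 hostname
mpi009
mpi010

# Interpretation B: --host with no -np fills every detected core ("N*num_cores copies")
$ mpirun --host mpi009,mpi010 hostname
# ...would print mpi009 16 times and mpi010 24 times (40 lines in total)
```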
I honestly don't remember - how about we declare it "good enough"? All these corner case conditions are already making the code pretty complex and I'd hate to keep adding to it.
@jsquyres okay, I surrender - updated to address your last comment.
Be aware that I made that last comment based on our last phone conversation, where we discussed what OMPI did in prior versions. ...but I think both our memories of prior OMPI versions were faulty (which should probably be no surprise!). I ran tests this morning going back to v1.6.0 to see what we've been doing with -host; the results are in the spreadsheet. It turns out that this PR is currently changing a lot of existing behavior -- probably much more than we want it to. @rhc54 I know you're totally sick of this 😦 but would it be ok if we talk through the spreadsheet and come up with some minimal change compared to v1.10.2? Sorry!!
Oops -- the google spreadsheet is now publicly accessible. |
thx - just to be clear: I am in no way proposing that this come over to the 1.10 series. This is a PR for master only. For 1.10, I'll just backport the change that sets the exit status. I see no reason to change the behavior of -host in that series.
@jsquyres Okay, I looked at the spreadsheet. I honestly don't have time to address all those proposed changes yet again. The logic is getting impenetrable trying to handle all the edge cases, and I'm out of time. So this isn't happening in the near future.
@rhc54 Ok, fair enough. Just to be clear, though, I was trying to say that it might be easier to ditch the PR, because it seems to have gone too far. The actual desired changes (compared to master and v2.x) are actually much smaller. I added 2 columns at the end to show current git master and current v2.x behavior. These hopefully make it clearer that we're actually quite close to the desired behavior. That being said, I'm mostly concerned about the fact that we're basically introducing new flip-flopping of the -host behavior between releases.
which is why I was sighing at the very beginning of this mess... frankly, this entire -host/-hostfile behavior thing has been a nightmare from the beginning, as the community cannot seem to consistently agree on what it should do.
We're going to discuss this at the Dallas developer meeting next week.
Per discussion in Dallas, for 2.0.0: …
@jsquyres I think this follows what we decided - please check.
@jsquyres NOTE: a custom patch will be required for v2.x - this change will not cleanly apply.
It's very, very close. I only found one discrepancy: …
So I assume you have less than 1000 entries in that hostfile, none of which have specified slots, yes? And the totality of the number of cores across all entries is less than 1000? |
Correct: I have 2 nodes listed in my hostfile, and the total core count across them is well under 1000.
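For reference, a minimal sketch of the kind of hostfile being discussed -- entries with no slot counts specified, so each host's slot count falls back to its detected cores. The file name and contents are hypothetical; the host names are the ones used earlier in this thread, and the commented `slots=N` lines show what the entries would look like if slots *were* specified.

```
# myhostfile -- no slots specified, so slots default to the detected core counts
mpi009
mpi010

# for contrast, entries with explicit slot counts would look like:
# mpi009 slots=16
# mpi010 slots=24
```

Such a file would be passed as `mpirun -hostfile myhostfile ...`.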
bot:retest
@rhc54 I'm sorry, but the behavior with these latest commits changed quite a bit compared to my tests yesterday afternoon. 😦
...I didn't test other combinations, because I think these changes mean that some unintended changes crept in...? FYI: I'm going off column J in the google spreadsheet as what we decided in Dallas. Please correct me if that's wrong...
I quit - this will have to wait. I would suggest unblocking 2.0 for it.
Okay. We should document differences from 1.10. I'll open an issue to track.
I revised the PR description to more accurately reflect where we wound up:

- Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the number of processing elements found.
- Ensure the returned exit status is non-zero if we fail to map.
- If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.
- If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.
- If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row 7 on Jeff's spreadsheet is incorrect.

With that one correction, this now passes all the given use-cases on that spreadsheet.
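To illustrate the two ways of setting slots on the -host line that the revised description refers to, here is a small hypothetical sketch; the host name is reused from earlier in the thread and the outputs are illustrative, not from an actual run.

```
# Listing a host multiple times sets its slot count to the number of occurrences,
# so mpi009 gets 3 slots here and -np 3 maps without oversubscription
$ mpirun -np 3 --host mpi009,mpi009,mpi009 hostname
mpi009
mpi009
mpi009

# Equivalently, the slot count can be given explicitly with the host:N syntax
$ mpirun -np 3 --host mpi009:3 hostname
mpi009
mpi009
mpi009
```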
Per conversation on the call today (29 Mar 2016), we updated column J a bit in the spreadsheet. The goal is to make …
Let's bring this in so folks can decide if they like it or not.
Per the discussion on the telecon, change the -host behavior so we only run one instance if no slots were provided and the user didn't specify #procs to run. However, if no slots are given and the user does specify #procs, then let the number of slots default to the number of processing elements found.
Ensure the returned exit status is non-zero if we fail to map.
If no -np is given, but either -host and/or -hostfile was given, then error out with a message telling the user that this combination is not supported.
If -np is given, and -host is given with only one instance of each host, then default the #slots to the detected #pe's and enforce oversubscription rules.
If -np is given, and -host is given with more than one instance of a given host, then set the #slots for that host to the number of times it was given and enforce oversubscription rules. Alternatively, the #slots can be specified via "-host foo:N". I therefore believe that row #7 on Jeff's spreadsheet is incorrect.
With that one correction, this now passes all the given use-cases on that spreadsheet.
Make things behave under unmanaged allocations more like their managed cousins - if the #slots is given, then running without -np shall fill all of the slots.
Fixes #1344
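To make that last point (unmanaged allocations behaving like their managed cousins) concrete, here is a small hypothetical sketch using the host:N slot syntax described above; the slot count and output are illustrative, not from an actual run.

```
# Slots are explicitly given (2 on mpi009) and no -np is specified,
# so the job fills all of the given slots, as it would under a managed allocation
$ mpirun --host mpi009:2 hostname
mpi009
mpi009
```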