Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip motorBike tutorial case in OpenFOAM sanity check if available cores < 6 #2615

Closed
wants to merge 3 commits into from

Conversation

LHurst-JM
Copy link

@LHurst-JM LHurst-JM commented Nov 5, 2021

The tutorial is hardcoded to use exactly 6 cores, so the sanity check will always fail if there are fewer cores than this available locally (even if there is no problem with the build). Problem occurs with both foss and intel toolchains (I've not tested any others).

This change only attempts to run that case as a test if there are 6 or more cores locally.

Closes easybuilders/easybuild-easyconfigs#14291

The tutorial is hardcoded to use exactly 6 cores, so the sanity check will always fail if there are fewer cores than this available locally (even if there is no problem with the build).

This change only attempts to run that case as a test if there are 6 or more cores locally.
@LHurst-JM LHurst-JM marked this pull request as ready for review November 5, 2021 12:50
@LHurst-JM LHurst-JM changed the title Skip the motorBike tutorial case if cores < 6 OpenFOAM - Skip the motorBike tutorial case if cores < 6 Nov 5, 2021
@boegel boegel added this to the next release (4.5.1?) milestone Nov 9, 2021
@boegel
Copy link
Member

boegel commented Nov 9, 2021

An alternative here is to allow oversubscription of MPI ranks on cores, right @branfosj?

@LHurst-JM
Copy link
Author

LHurst-JM commented Nov 10, 2021

@boegel yes, and that would be preferable however the actual mpirun command is burried inside a script within the OpenFOAM install and I didn't find an MPI-agnostic way to do it via enivironment variables (since this is in the easyblock, it needs to work for all the MPIs the easyconfig might be built with). Basically I couldn't make it work after a few hours struggling with it.

Edit: I think there's only a foss OpenFOAM 8 for 2020b but I have a local intel version (I haven't contributed it as it's a trivial foss -> intel change to 2 easyconfigs), so using openmpi-specific environment variables here will not fix this for me. There are existing intel configs for other versions.

@boegel boegel changed the title OpenFOAM - Skip the motorBike tutorial case if cores < 6 skip motorBike tutorial case in OpenFOAM sanity check if available cores < 6 Nov 10, 2021
@Mormacill
Copy link

Mormacill commented Nov 17, 2021

The runParallel function is defined in the RunFunctions script which is sourced at the beginning by

. $WM_PROJECT_DIR/bin/tools/RunFunctions

The function has some predefined options for mpirun but not for the oversubscription flag.

The key section says:

[...]
        echo "Running $APP_RUN in parallel on $PWD using $nProcs processes"
        if [ "$LOG_APPEND" = "true" ]; then
            ( mpirun -np $nProcs $APP_RUN -parallel "$@" < /dev/null >> log.$LOG_SUFFIX 2>&1 )
        else
            ( mpirun -np $nProcs $APP_RUN -parallel "$@" < /dev/null > log.$LOG_SUFFIX 2>&1 )
        fi
[...]

I think it is not recommended to patch these lines as people might use these functions later in production when oversubscription is not desired.

It might be possible to replace the line
runParallel simpleFoam
in the easyblock to the linked mpirun command, e.g.:
mpirun -np $(getNumberOfProcessors) $(getApplication) -parallel --oversubscribe < /dev/null >> log.$LOG_SUFFIX 2>&1

This is not the nice way from a style view but it keeps the sanity check and avoids a skip.

It could also be something possible by changing the entry for numberOfSubdomains in the decomposeParDict by the foamDictionary function to change the tutorial only using the available resources. As an advantage the runParallel function can persist because it obtains the number of cores used from the numberOfSubdomains keyword in decomposeParDict.
This could look like:
foamDictionary -entry "numberOfSubdomains" -set $(nproc) system/decomposeParDict
Problem here just is that motorBike uses hierarchical decomposition, the coefficients need to be adjusted too.
Or to change to scotch with the same command to avoid coefficients at all.

Although I am not sure about this method changing the dicts because it's an intervention to the original tutorial files which might be inappropriate.

EDIT:
I just noticed you already ruminated on that topic in the corresponding issue @LHurst-JM, sorry, didn't see that.

@LHurst-JM
Copy link
Author

LHurst-JM commented Nov 18, 2021

@Mormacill yes, there doesn't seem to be a clear easy solution to using this tutorial as a sanitycheck. I'm not familiar with using OpenFOAM, just building it - I don't suppose you are aware of a bundled tutorial case that would do as an adequate test case but needs 2 or 4 cores instead of 6? (Doesn't completely solve the problem but 4-cores is far more common at the "low end" of HPC systems, in my experience.)

@Mormacill
Copy link

@LHurst-JM There are a few cases which are using 4 cores, just one I found with 2 by default (heatTransfer/buoyantSimpleFoam/iglooWithFridges). The advantage of motorBike tutorial sanity check is that it covers a wide range of functions being tested.
It might be a solution to implement a fallback tutorial if host cores are lower than 6 instead of only skipping motorbike.

@LHurst-JM
Copy link
Author

@Mormacill thanks, I agree that sounds like a better plan. Not knowing OpenFOAM I find it hard to judge which cases are suitable.

@boegel
Copy link
Member

boegel commented Dec 7, 2021

@LHurst-JM Does the sanity check also fail when OpenFOAM is built on top of Intel MPI?

I know OpenMPI by default gives up when there are less cores available than "slots" (cores), but I don't remember running into that with Intel MPI?

@LHurst-JM
Copy link
Author

LHurst-JM commented Dec 8, 2021

Hi @boegel, I'm sure I tried intel and had the same problem but testing again now....

If it is just OpenMPI, I think we can tell it to oversubscribe with an environment variable which will make this a lot simpler.

@LHurst-JM
Copy link
Author

@boegel my bad, you are entirely correct - it does seem to build with the Intel toolchain, I must have assumed the issue would be common to all mpirun implementations. That does explain why I was struggling to find how to make Intel's mpirun oversubscribe - it must do it by default!

@boegel
Copy link
Member

boegel commented Dec 8, 2021

This change should no longer be needed if easybuilders/easybuild-framework#3909 gets merged...

@LHurst-JM
Copy link
Author

Looks good to me, I'll close this one. I think the general solution is much better than playing whack-a-mole with individual easyblocks.

@LHurst-JM LHurst-JM closed this Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenFOAM-8-2020b sanity check fails on nodes with <6 cores
3 participants