I did search for an existing issue for this problem and didn't find one, apologies if I overlooked it.
During the sanity check step, following a successful build, on a 4-core (HTC) node, eb -Tr OpenFOAM-8-{foss,intel}-2020b.eb dies with:
== 2021-11-04 22:57:07,360 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/sources/easybuild-framework/easybuild/base/exceptions.py:124 in __init__): Sanity check failed: sanity check command cp -a /mnt/apps/easybuild/2020b-openfoam/EL-7-has/software/OpenFOAM/8-foss-2020b/OpenFOAM-8/tutorials/incompressible/simpleFoam/motorBike /tmp/eb-vo57ox3l/tmpbazzqoko && cd /tmp/eb-vo57ox3l/tmpbazzqoko/motorBike && source $FOAM_BASH && . $WM_PROJECT_DIR/bin/tools/RunFunctions && cp $FOAM_TUTORIALS/resources/geometry/motorBike.obj.gz constant/triSurface/ && runApplication surfaceFeatures && runApplication blockMesh && runApplication decomposePar -copyZero && runParallel snappyHexMesh -overwrite && runParallel patchSummary && runParallel potentialFoam && runParallel simpleFoam && runApplication reconstructParMesh -constant && runApplication reconstructPar -latestTime && cd /mnt/apps/easybuild/2020b-openfoam/EL-7-has/software/OpenFOAM/8-foss-2020b && rm -r /tmp/eb-vo57ox3l/tmpbazzqoko exited with code 1 (output: Running surfaceFeatures on /tmp/eb-vo57ox3l/tmpbazzqoko/motorBike
Running blockMesh on /tmp/eb-vo57ox3l/tmpbazzqoko/motorBike
Running decomposePar on /tmp/eb-vo57ox3l/tmpbazzqoko/motorBike
Running snappyHexMesh in parallel on /tmp/eb-vo57ox3l/tmpbazzqoko/motorBike using 6 processes
) (at easybuild/sources/easybuild-framework/easybuild/framework/easyblock.py:3120 in _sanity_check_step)
== 2021-11-04 22:57:07,360 build_log.py:265 INFO ... (took 25 secs)
And this message was in /tmp/eb-vo57ox3l/tmpbazzqoko/motorBike/log.snappyHexMesh:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 6
slots that were requested by the application:
snappyHexMesh
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
It built successfully using the same setup on 44, 60 and 120-core (HPC) nodes. The commands to run the test are hard-coded in the openfoam easyblock for all versions > 6 and < 100. I cannot see an obvious way to override or disable the check, or to add extra options to the mpirun command (e.g. --oversubscribe) which would allow the test to run.
The number of processes required is read from numberOfSubdomains in the tutorial's system/decomposeParDict file, which is fixed in the OpenFOAM source. Arguably this is an upstream issue, since this tutorial case requires precisely 6 cores, but it is eb's requirement that the case run successfully as a sanity check that causes the build to fail.
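To illustrate where the 6 comes from, here is a minimal sketch that pulls numberOfSubdomains out of the text of a decomposeParDict. The function name and the sample dictionary contents are made up for illustration (the sample is not copied from the OpenFOAM source), and the regex approach is only an assumption about how one might read the value; it is not what the easyblock or OpenFOAM itself does.

```python
import re


def read_number_of_subdomains(dict_text):
    """Return numberOfSubdomains from decomposeParDict text, or None if absent.

    Hypothetical helper: strips C-style comments, then looks for a line of
    the form `numberOfSubdomains <int>;`.
    """
    # Remove /* ... */ block comments, then // line comments.
    no_block = re.sub(r"/\*.*?\*/", "", dict_text, flags=re.DOTALL)
    no_comments = re.sub(r"//[^\n]*", "", no_block)
    match = re.search(
        r"^\s*numberOfSubdomains\s+(\d+)\s*;", no_comments, flags=re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Illustrative input only, not the real tutorial file.
SAMPLE_DICT = """
/* example header comment */
// numberOfSubdomains 99;  (commented out, must be ignored)
numberOfSubdomains 6;

method          hierarchical;
"""

if __name__ == "__main__":
    print(read_number_of_subdomains(SAMPLE_DICT))
```

A check like this could be used to compare the process count a tutorial case will request against what the build node can provide before launching it.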
The simplest fix within EasyBuild would be to skip the test on nodes with fewer than 6 cores (perhaps by testing os.cpu_count() in the easyblock?). The mpirun command is set in the OpenFOAM scripts used to run the example (bin/tools/RunFunctions), and I do not see an obvious way to add extra command-line options, so adding --oversubscribe, which would enable the test to run on low-core-count systems, is non-trivial.
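A rough sketch of the kind of guard I have in mind, in plain Python rather than actual easyblock code. The function name and the way the result would be wired into the sanity check are hypothetical. Note that os.cpu_count() reports logical CPUs and can return None, whereas Open MPI counts processor cores by default (per the message above), so on nodes with hardware threads enabled this check may be too optimistic.

```python
import os


def can_run_parallel_check(required_procs, available=None):
    """Return True if the node appears to have enough CPUs for the parallel test.

    Hypothetical helper: `available` can be passed explicitly (e.g. for
    testing); otherwise os.cpu_count() is used, which counts logical CPUs
    and may return None if the count cannot be determined.
    """
    if available is None:
        available = os.cpu_count()
    if available is None:
        # Unknown CPU count: be conservative and skip the parallel test.
        return False
    return available >= required_procs


if __name__ == "__main__":
    # 6 is the process count requested by the motorBike case in the log above.
    if can_run_parallel_check(6):
        print("enough CPUs: run the motorBike sanity check")
    else:
        print("fewer than 6 CPUs: skip the motorBike sanity check")
```

On the 4-core node from this report the guard would return False and the parallel tutorial run would be skipped, while the 44, 60 and 120-core nodes would still run it.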