Some tests using MPI fail due to insufficient number of slots #725

Closed
keichi opened this issue Jul 17, 2018 · 6 comments
@keichi
Contributor

keichi commented Jul 17, 2018

These tests fail on my laptop with the recent version of ADIOS2:

The following tests FAILED:
	  1 - HeatTransfer.BPFile.Write.MxM (Failed)
	  2 - HeatTransfer.BPFile.Read.MxM (Failed)
	  3 - HeatTransfer.BPFile.Dump.MxM (Failed)
	  4 - HeatTransfer.BPFile.Validate.MxM (Failed)
	  5 - HeatTransfer.BPFile.Write.MxN (Failed)
	  6 - HeatTransfer.BPFile.Read.MxN (Failed)
	  7 - HeatTransfer.BPFile.Dump.MxN (Failed)
	  8 - HeatTransfer.BPFile.Validate.MxN (Failed)
	  9 - HeatTransfer.BPFile.Write.Mx1 (Failed)
	 10 - HeatTransfer.BPFile.Read.Mx1 (Failed)
	 11 - HeatTransfer.BPFile.Dump.Mx1 (Failed)
	 12 - HeatTransfer.BPFile.Validate.Mx1 (Failed)
	 13 - HeatTransfer.SST.MxM (Failed)
	 14 - HeatTransfer.SST.Dump.MxM (Failed)
	 15 - HeatTransfer.SST.Validate.MxM (Failed)
	 16 - HeatTransfer.SST.MxN (Failed)
	 17 - HeatTransfer.SST.Dump.MxN (Failed)
	 18 - HeatTransfer.SST.Validate.MxN (Failed)
	 19 - HeatTransfer.SST.Mx1 (Failed)
	 20 - HeatTransfer.SST.Dump.Mx1 (Failed)
	 21 - HeatTransfer.SST.Validate.Mx1 (Failed)
	 22 - HeatTransfer.InsituMPI.MxM (Failed)
	 23 - HeatTransfer.InsituMPI.Dump.MxM (Failed)
	 24 - HeatTransfer.InsituMPI.Validate.MxM (Failed)
	 25 - HeatTransfer.InsituMPI.MxN (Failed)
	 26 - HeatTransfer.InsituMPI.Dump.MxN (Failed)
	 27 - HeatTransfer.InsituMPI.Validate.MxN (Failed)
	 28 - HeatTransfer.InsituMPI.Mx1 (Failed)
	 29 - HeatTransfer.InsituMPI.Dump.Mx1 (Failed)
	 30 - HeatTransfer.InsituMPI.Validate.Mx1 (Failed)
	111 - ADIOSSstTest.3x5 (Failed)
	112 - ADIOSSstTest.3x5BP (Failed)
	113 - ADIOSSstTest.5x3 (Failed)
	118 - ADIOSSstTest.FtoC_3x5 (Failed)
	119 - ADIOSSstTest.FtoC_3x5BP (Failed)
	120 - ADIOSSstTest.CtoF_3x5 (Failed)
	121 - ADIOSSstTest.FtoF_3x5 (Failed)
	122 - ADIOSSstDelayedReaderTest.3x5 (Failed)
	123 - ADIOSSstDelayedReaderWithBlockingTest.3x5 (Failed)

Error message:

1: --------------------------------------------------------------------------
1: There are not enough slots available in the system to satisfy the 4 slots
1: that were requested by the application:
1:   /Users/keichi/Projects/research/ADIOS2/build/bin/heatTransfer_write_adios2
1:
1: Either request fewer slots for your application, or make more slots available
1: for use.
1: --------------------------------------------------------------------------

A workaround that I found is to edit the CMake file (e.g. examples/heatTransfer/TestBPFileMx1.cmake) and either reduce the number of requested MPI processes or add the --oversubscribe flag.

  • ADIOS2 4c6e11f
  • MPI: Open MPI 3.1.1
  • OS: macOS 10.13.4 (17E202)
  • CPU: 2.3 GHz Intel Core i5 (2 cores, 4 threads)

Thanks.

@williamfgc
Contributor

@chuckatkins, @keichi found a valid bug. Can CMake check the number of cores in a system, then assign 2 processes if there are 2 cores and 4 processes for a number larger than 4? Thanks!
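To make the idea concrete, here is a minimal shell sketch of choosing a process count from the detected core count. It is not the ADIOS2 CMake logic: `pick_nprocs` and `detect_cores` are hypothetical helper names, and the treatment of 3 or exactly 4 cores (2 and 4 processes respectively) is an assumption, since the comment above does not specify those cases. On Linux the fallback `nproc` reports logical CPUs, not physical cores.

```shell
# Hypothetical helper: map a core count to an MPI process count.
# Assumption: 4 or more cores -> 4 processes, 2 or 3 cores -> 2, otherwise 1.
pick_nprocs() {
    cores="$1"
    if [ "$cores" -ge 4 ]; then
        echo 4
    elif [ "$cores" -ge 2 ]; then
        echo 2
    else
        echo 1
    fi
}

# Hypothetical helper: best-effort core detection.
# macOS exposes physical cores via sysctl; nproc (Linux) counts logical CPUs.
detect_cores() {
    if [ "$(uname)" = "Darwin" ]; then
        sysctl -n hw.physicalcpu
    elif command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}

echo "would launch $(pick_nprocs "$(detect_cores)") MPI processes"
```

On the 2-core laptop described in this issue, `pick_nprocs 2` would select 2 processes, which fits within the available slots without oversubscribing.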

@williamfgc williamfgc added the bug label Jul 17, 2018
williamfgc added a commit to williamfgc/ADIOS2 that referenced this issue Sep 27, 2018
Removed oversubscribe parameter in tests
Investing more in C bindings to avoid ABI conflicts
Code Coverage should improve 
Solve ornladios#725 and ornladios#879
Heat transfer test won't run if physical cores is less than 4
@williamfgc
Contributor

williamfgc commented Sep 27, 2018

@keichi I added a workaround in CMake that queries the number of physical cores. The heat transfer tests now run only if 4 physical cores (not 2) are available. --oversubscribe killed our nightlies that use MPICH. Please check the latest master.

@keichi
Contributor Author

keichi commented Sep 27, 2018

@williamfgc Thanks, the tests pass indeed, but I still would like to run the heat transfer tests.

I found that setting the OMPI_MCA_rmaps_base_oversubscribe environment variable to 1 results in the same behavior as using the --oversubscribe flag. This is non-intrusive and should work with MPICH. How about this?
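As a rough illustration of why this is non-intrusive, the sketch below scopes the variable to a single command instead of changing the test scripts. `run_oversubscribed` is a hypothetical wrapper name, not part of ADIOS2 or Open MPI; the only thing taken from this thread is the variable `OMPI_MCA_rmaps_base_oversubscribe` and its value. The demo line uses a plain `sh -c` so that it runs without an MPI installation.

```shell
# Hypothetical wrapper: run one command with oversubscription enabled for
# Open MPI. Using env keeps the variable out of the calling shell.
run_oversubscribed() {
    env OMPI_MCA_rmaps_base_oversubscribe=1 "$@"
}

# Demo without MPI: the child process sees the variable.
run_oversubscribed sh -c 'echo "oversubscribe=$OMPI_MCA_rmaps_base_oversubscribe"'

# Intended use from a build directory (assumes ctest is installed):
#   run_oversubscribed ctest
```

Because the setting travels through the environment and not through an `mpirun` argument, a launcher that does not recognize `--oversubscribe` never receives an unknown flag.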

@williamfgc
Contributor

@keichi can this be set from adios2's cmake scripts? The only way it worked on my Mac is if I explicitly set export OMPI_MCA_rmaps_base_oversubscribe="yes" before compilation.

@keichi
Contributor Author

keichi commented Oct 15, 2018

@williamfgc Setting it from the cmake scripts should work, since OMPI_MCA_rmaps_base_oversubscribe=1 ctest works for me.

williamfgc added a commit to williamfgc/ADIOS2 that referenced this issue Oct 17, 2018
OMPI_MCA_rmaps_base_oversubscribe=yes must be passed to cmake or be an
env variable
@williamfgc
Contributor

williamfgc commented Oct 17, 2018

PR #938 works on my Mac if I export the environment variable (export OMPI_MCA_rmaps_base_oversubscribe=yes) or if I pass it on the command line to both cmake and ctest: OMPI_MCA_rmaps_base_oversubscribe=yes cmake and OMPI_MCA_rmaps_base_oversubscribe=yes ctest.

3 participants