mpiio tests fails on i686 with espresso-4.1.0 #3230

Closed
junghans opened this issue Oct 2, 2019 · 11 comments · Fixed by #3234

Comments

@junghans
Member

junghans commented Oct 2, 2019

130/135 Test #127: mpiio .........................................***Failed    1.83 sec
/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 22870 RUNNING AT buildvm-12.phx2.fedoraproject.org
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Details here
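
The failing check in the log is the bounds assertion that libstdc++ compiles into `std::vector::operator[]` when `_GLIBCXX_ASSERTIONS` is defined. That this macro is what enables the check in the Fedora build is an assumption here, not something the log states. A minimal sketch of the condition being tested, with `index_is_valid` as a hypothetical helper:

```cpp
#include <cstddef>
#include <vector>

// Mirrors the condition quoted in the assertion message,
// `__n < this->size()`: the index must be strictly below size().
// `index_is_valid` is a hypothetical helper, not a library function.
template <typename T>
bool index_is_valid(const std::vector<T> &v, std::size_t n) {
  return n < v.size();
}
```

For an empty vector no index passes this test, so even `v[0]` would abort with the message above.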

@junghans
Member Author

junghans commented Oct 2, 2019

Same on ppc64le and aarch64, see: https://koji.fedoraproject.org/koji/taskinfo?taskID=38009265

@mkuron
Member

mkuron commented Oct 3, 2019

This looks like #2507. Are you sure that's fixed in the Boost packages on current Fedora? We need a full backtrace to diagnose this if you think it's something else.

@junghans
Member Author

junghans commented Oct 3, 2019

The fix from #2507 is still in there: https://src.fedoraproject.org/rpms/boost/blob/master/f/boost.spec#_153

@jwakely, correct?

It seems to be a similar but different bug.

@jwakely

jwakely commented Oct 3, 2019

Right, the fix should still be in the Fedora package.

@mkuron
Member

mkuron commented Oct 4, 2019

Is there an i386 Fedora Docker image? I can't attach a debugger to the emulated ones (aarch64, ppc64le), and the x86_64 build does not exhibit the problem. I can't even find a Fedora 32 ISO image that I could install in a VM. Alternatively, is it possible to somehow get a backtrace straight from the build environment?

@jwakely

jwakely commented Oct 4, 2019

On a Fedora x86_64 system you can use `mock -r fedora-32-i386 --rebuild espresso-4.1.0-1.fc32.src.rpm` to try building it in a local i686 chroot.

I'm trying that now.

@jwakely

jwakely commented Oct 4, 2019

That built OK and all tests passed. 😕

@junghans
Member Author

junghans commented Oct 4, 2019

I was able to make it fail on my 64-bit Fedora 30 system by running:

$ mock -r fedora-rawhide-i386 --dnf --init
....
$ mock -r fedora-rawhide-i386 --no-clean espresso-4.1.0-1.fc32.src.rpm
...
127/135 Test #127: mpiio .........................................***Failed    4.75 sec
/usr/include/c++/9/bits/stl_vector.h:1042: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = int; _Alloc = std::allocator<int>; std::vector<_Tp, _Alloc>::reference = int&; std::vector<_Tp, _Alloc>::size_type = unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
[6ba873f087754236b000d9c1c275fe26:11373] *** Process received signal ***
[6ba873f087754236b000d9c1c275fe26:11373] Signal: Aborted (6)
[6ba873f087754236b000d9c1c275fe26:11373] Signal code:  (-6)
[6ba873f087754236b000d9c1c275fe26:11373] [ 0] linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xf7f91970]
[6ba873f087754236b000d9c1c275fe26:11373] [ 1] linux-gate.so.1(__kernel_vsyscall+0x9)[0xf7f91949]
[6ba873f087754236b000d9c1c275fe26:11373] [ 2] /lib/libc.so.6(gsignal+0xc6)[0xf7e0a196]
[6ba873f087754236b000d9c1c275fe26:11373] [ 3] /lib/libc.so.6(abort+0x130)[0xf7df23b7]
[6ba873f087754236b000d9c1c275fe26:11373] [ 4] /builddir/build/BUILDROOT/espresso-4.1.0-1.fc32.i386/usr/lib/python3.8/site-packages/openmpi/espressomd/EspressoScriptInterface.so(+0x1b3767)[0xf688b767]
[6ba873f087754236b000d9c1c275fe26:11373] [ 5] /builddir/build/BUILDROOT/espresso-4.1.0-1.fc32.i386/usr/lib/python3.8/site-packages/openmpi/espressomd/mpiio.so(_ZN5Mpiio21mpi_mpiio_common_readEPKcj+0x2297)[0xf667d1f7]
[6ba873f087754236b000d9c1c275fe26:11373] [ 6] /builddir/build/BUILDROOT/espresso-4.1.0-1.fc32.i386/usr/lib/python3.8/site-packages/openmpi/espressomd/EspressoScriptInterface.so(_ZN15ScriptInterface5MPIIO11MPIIOScript11call_methodERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt13unordered_mapIS7_N5boost7variantINSB_6detail7variant14recursive_flagINS_4NoneEEEJbidS7_St6vectorIiSaIiEESI_IdSaIdEEN5Utils8ObjectIdINS_19ScriptInterfaceBaseEEESI_INSB_18recursive_variant_ESaISR_EENSN_6VectorIdLj2EEENSU_IdLj3EEENSU_IdLj4EEEEEESt4hashIS7_ESt8equal_toIS7_ESaISt4pairIS8_SY_EEE+0x28a)[0xf693539a]
[6ba873f087754236b000d9c1c275fe26:11373] [ 7] /builddir/build/BUILDROOT/espresso-4.1.0-1.fc32.i386/usr/lib/python3.8/site-packages/openmpi/espressomd/EspressoScriptInterface.so(_ZN15ScriptInterface23ParallelScriptInterface11call_methodERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt13unordered_mapIS6_N5boost7variantINSA_6detail7variant14recursive_flagINS_4NoneEEEJbidS6_St6vectorIiSaIiEESH_IdSaIdEEN5Utils8ObjectIdINS_19ScriptInterfaceBaseEEESH_INSA_18recursive_variant_ESaISQ_EENSM_6VectorIdLj2EEENST_IdLj3EEENST_IdLj4EEEEEESt4hashIS6_ESt8equal_toIS6_ESaISt4pairIS7_SX_EEE+0x100)[0xf695c9e0]
[6ba873f087754236b000d9c1c275fe26:11373] [ 8] /builddir/build/BUILD/espresso/openmpi/src/python/espressomd/script_interface.so(+0x2c6ba)[0xf6a2c6ba]
[6ba873f087754236b000d9c1c275fe26:11373] [ 9] /builddir/build/BUILD/espresso/openmpi/src/python/espressomd/script_interface.so(+0x11b31)[0xf6a11b31]
[6ba873f087754236b000d9c1c275fe26:11373] [10] /builddir/build/BUILD/espresso/openmpi/src/python/espressomd/script_interface.so(+0x13368)[0xf6a13368]
[6ba873f087754236b000d9c1c275fe26:11373] [11] /lib/libpython3.8.so.1.0(_PyObject_MakeTpCall+0x23f)[0xf7b5c09f]
[6ba873f087754236b000d9c1c275fe26:11373] [12] /lib/libpython3.8.so.1.0(+0xa1088)[0xf7b03088]
[6ba873f087754236b000d9c1c275fe26:11373] [13] /lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x16ce)[0xf7bc1abe]
[6ba873f087754236b000d9c1c275fe26:11373] [14] /lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x2e8)[0xf7b5d718]
[6ba873f087754236b000d9c1c275fe26:11373] [15] /lib/libpython3.8.so.1.0(+0x13f7fa)[0xf7ba17fa]
[6ba873f087754236b000d9c1c275fe26:11373] [16] /lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x16ce)[0xf7bc1abe]
[6ba873f087754236b000d9c1c275fe26:11373] [17] /lib/libpython3.8.so.1.0(+0x13f707)[0xf7ba1707]
[6ba873f087754236b000d9c1c275fe26:11373] [18] /lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x79f)[0xf7bc0b8f]
[6ba873f087754236b000d9c1c275fe26:11373] [19] /lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x115)[0xf7b88765]
[6ba873f087754236b000d9c1c275fe26:11373] [20] /lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0xbbe)[0xf7bc0fae]
[6ba873f087754236b000d9c1c275fe26:11373] [21] /lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x2e8)[0xf7b5d718]
[6ba873f087754236b000d9c1c275fe26:11373] [22] /lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x239)[0xf7b88889]
[6ba873f087754236b000d9c1c275fe26:11373] [23] /lib/libpython3.8.so.1.0(+0x13f951)[0xf7ba1951]
[6ba873f087754236b000d9c1c275fe26:11373] [24] /lib/libpython3.8.so.1.0(PyVectorcall_Call+0x76)[0xf7b5fba6]
[6ba873f087754236b000d9c1c275fe26:11373] [25] /lib/libpython3.8.so.1.0(PyObject_Call+0x3a)[0xf7b5fcca]
[6ba873f087754236b000d9c1c275fe26:11373] [26] /lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x2810)[0xf7bc2c00]
[6ba873f087754236b000d9c1c275fe26:11373] [27] /lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x2e8)[0xf7b5d718]
[6ba873f087754236b000d9c1c275fe26:11373] [28] /lib/libpython3.8.so.1.0(_PyObject_FastCallDict+0x1e0)[0xf7b5e4b0]
[6ba873f087754236b000d9c1c275fe26:11373] [29] /lib/libpython3.8.so.1.0(_PyObject_Call_Prepend+0x71)[0xf7b66d51]
[6ba873f087754236b000d9c1c275fe26:11373] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node 6ba873f087754236b000d9c1c275fe26 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Now you can use `mock -r fedora-rawhide-i386 --shell` to debug.

@mkuron
Member

mkuron commented Oct 4, 2019

Thanks, @junghans. So mpi_mpiio_common_read is here:

void mpi_mpiio_common_read(const char *filename, unsigned fields) {

Looking at the code, a potential candidate is

std::copy_n(&bond[boff[i]], blen, bl.begin());

for the case of zero bonds. Since you already have a build environment set up, could you please prepend that line with an if(blen) and try again?
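
A minimal sketch of the suggested guard. `copy_bonds`, the container types, and the way `blen` is derived from `boff` are assumptions for illustration; only the `std::copy_n` line and the `if(blen)` guard come from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-particle bond copy in
// mpi_mpiio_common_read. Assumes boff holds prefix offsets into bond,
// so particle i owns bond[boff[i]] .. bond[boff[i + 1] - 1].
std::vector<int> copy_bonds(const std::vector<int> &bond,
                            const std::vector<int> &boff, std::size_t i) {
  const int blen = boff[i + 1] - boff[i]; // assumed definition of blen
  assert(blen >= 0);
  std::vector<int> bl(static_cast<std::size_t>(blen));
  if (blen) // with zero bonds, bond[...] is never evaluated
    std::copy_n(&bond[boff[i]], blen, bl.begin());
  return bl;
}
```

Without the guard, `&bond[boff[i]]` is evaluated even when `blen` is zero, which is where a range-checked `operator[]` on an empty `bond` would abort.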

@junghans
Member Author

junghans commented Oct 5, 2019

@mkuron
Member

mkuron commented Oct 5, 2019

Wow, that was just a wild guess... I'll post a pull request later and make sure this fix ends up in 4.1.1.

bors bot added a commit that referenced this issue Oct 11, 2019
3234: Fix mpiio with stdlibc++ range checking r=fweik a=mkuron

Fixes #3230. Reported by @junghans.

When mpiio was used but no bonds were present, we would still try to copy zero bonds from a zero-length vector. This triggered an assertion when libstdc++ range checking was enabled.

Please tag for cherry-picking into 4.1.1.
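
One way to read the description of 3234: `std::copy_n` with a count of zero copies nothing, but a source expression such as `&bond[boff[i]]` still calls `operator[]` on the empty vector, and that call is what the range check rejects. As an alternative to guarding the copy, and not something the pull request is known to do, iterator arithmetic avoids the subscript altogether. `slice` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the elements bond[offset] ..
// bond[offset + count - 1] as a new vector. begin() + offset only forms
// an iterator and never dereferences it, so an empty `bond` with
// offset == 0 and count == 0 is valid here.
std::vector<int> slice(const std::vector<int> &bond, std::size_t offset,
                       std::size_t count) {
  assert(offset + count <= bond.size());
  auto first = bond.begin() + static_cast<std::ptrdiff_t>(offset);
  return std::vector<int>(first, first + static_cast<std::ptrdiff_t>(count));
}
```

The trade-off is that the range constructor allocates a fresh vector, whereas the guarded `std::copy_n` writes into an existing one.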

3236: ESS2019 installation guide updates r=KaiSzuttor a=mkuron

Lessons learned today:

- We require MPI 3 because we depend on const-correctness in a few places. That means that OpenMPI 1.6.5 and lower are not supported anymore.
- Installing the ROCm driver breaks access to /dev/kfd, causing hwloc initialization during `mpiexec` to hang. Rebooting helps.
- Add matplotlib, ipython and jupyter to the Mac install guide.
- Homebrew now defaults to Python 3, requires manually enabling cython, and it's unclear whether the hdf5 package still supports MPI (Homebrew/homebrew-core#26974)
- Anaconda (~/anaconda[23]) and python.org packages (/Library/Python and /usr/local/bin) are also sources of conflict

Please tag for the 4.1.1 release

3238: maintainer: Escape module python in wrapper script r=jngrad a=fweik

Fixes #3237.

Description of changes:
 - Added quotes around the module path in python wrapper script.


Co-authored-by: Michael Kuron <mkuron@users.noreply.github.com>
Co-authored-by: Michael Kuron <mkuron@icp.uni-stuttgart.de>
Co-authored-by: Kai Szuttor <kai@icp.uni-stuttgart.de>
Co-authored-by: Florian Weik <fweik@icp.uni-stuttgart.de>
@bors bors bot closed this as completed in c1b18fd Oct 11, 2019