Description
A clear and concise description of the bug and the associated functionality.
MPI messages are corrupted when int16 tensors are communicated.
To Reproduce
Steps to reproduce the behavior:
Which module/class/function is affected?
int16 tensors, MPI functions
What are the circumstances under which the bug appears?
MPI communication with int16 arrays
What is the exact error message/erroneous behavior?
Received messages are erroneous; an error message appears with 4 elements (see example)
Expected behavior
A clear and concise description of what you expected to happen.
The received message is the same as the sent message.
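The expectation can be stated as a round-trip property: the values that go in come out unchanged. The sketch below illustrates that property in plain Python with the standard `struct` module. It does not involve MPI or heat and makes no claim about the cause of the bug; it only shows that the four int16 values from the reproducer form an 8-byte payload that should decode back to the same values.

```python
import struct

row = [4, 5, 6, 7]                   # values sent by rank 0 in the reproducer
payload = struct.pack("<4h", *row)   # four little-endian int16 values
print(len(payload))                  # 2 bytes per element

# Decoding the payload with the same format should restore the row
restored = list(struct.unpack("<4h", payload))
print(restored == row)
```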
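Illustrative
If applicable, add screenshots or minimal examples to help explain your problem.

    import heat as ht

    # Two rows split along axis 0: with 2 processes each rank holds one row
    a = ht.array([[4, 5, 6, 7], [6, 7, 8, 9]], split=0, dtype=ht.int16)
    if a.comm.rank == 0:
        print(a.comm.rank, a)
        a.comm.Send(a, dest=1)
    elif a.comm.rank == 1:
        # Receive buffer matching the shape and dtype of the sent row
        b = ht.empty((1, 4), a.dtype)
        a.comm.Recv(b, source=0)
        print(a.comm.rank, b)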
    $ mpirun -n 2 python bug.py
    0 tensor([[4, 5, 6, 7]], dtype=torch.int16)
    1 tensor([[4, 0, 6, 7]], dtype=torch.int16)
    malloc_consolidate(): invalid chunk size
    [ZAM:10232] *** Process received signal ***
    [ZAM:10232] Signal: Aborted (6)
    [ZAM:10232] Signal code: (-6)
    [ZAM:10232] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f516f8e3f20]
    [ZAM:10232] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f516f8e3e97]
    [ZAM:10232] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f516f8e5801]
    [ZAM:10232] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7f516f92e897]
    [ZAM:10232] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7f516f93590a]
    [ZAM:10232] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x90bae)[0x7f516f935bae]
    [ZAM:10232] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0x947d8)[0x7f516f9397d8]
    [ZAM:10232] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x27d)[0x7f516f93c2ed]
    [ZAM:10232] [ 8] python[0x5ac0b5]
    [ZAM:10232] [ 9] python[0x56a894]
    [ZAM:10232] [10] python[0x56bee3]
    [ZAM:10232] [11] python(PyDict_SetItemString+0x153)[0x571633]
    [ZAM:10232] [12] python(PyImport_Cleanup+0x76)[0x4f3256]
    [ZAM:10232] [13] python(Py_FinalizeEx+0x5e)[0x6383ce]
    [ZAM:10232] [14] python(Py_Main+0x395)[0x639435]
    [ZAM:10232] [15] python(main+0xe0)[0x4b0f40]
    [ZAM:10232] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f516f8c6b97]
    [ZAM:10232] [17] python(_start+0x2a)[0x5b2fda]
    [ZAM:10232] *** End of error message ***
    --------------------------------------------------------------------------
    mpirun noticed that process rank 1 with PID 0 on node ZAM exited on signal 6 (Aborted).
    --------------------------------------------------------------------------
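To pin down which elements arrive corrupted, the row printed by rank 0 can be compared element-wise with the row printed by rank 1. The following is a minimal sketch in plain Python using the values copied from the output above; `mismatched_indices` is a hypothetical helper introduced here for illustration and is not part of heat or mpi4py.

```python
def mismatched_indices(sent, received):
    """Return the positions at which two equally long sequences differ."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    return [i for i, (s, r) in enumerate(zip(sent, received)) if s != r]


sent = [4, 5, 6, 7]      # row printed by rank 0
received = [4, 0, 6, 7]  # row printed by rank 1
print(mismatched_indices(sent, received))
```

For the output shown in this report, only the element at index 1 differs (5 was sent, 0 was received).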
Version Info
Which version are you using?
master
Additional comments
Any other comments here.
mtar