The IO unit tests use temporary files. Have you set one of the TMPDIR, TEMP or TMP environment variables to a directory that can be accessed by all nodes?
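For anyone trying to answer the question above, here is a minimal sketch of a per-node check. The helper names `describe_tmp_candidates` and `first_usable_tmp` are made up for illustration and are not part of Heat. The sketch only reports which of TMPDIR, TEMP and TMP are set and whether the directory is writable on the local node; it cannot tell whether that directory is shared between nodes.

```python
import os
import tempfile


def describe_tmp_candidates(environ=None):
    """Return (variable, value) pairs for the temp-dir variables that are set."""
    environ = os.environ if environ is None else environ
    names = ("TMPDIR", "TEMP", "TMP")  # order in which Python's tempfile checks them
    return [(name, environ[name]) for name in names if name in environ]


def first_usable_tmp(environ=None):
    """Return the first candidate that is an existing, writable directory, else None."""
    for _, path in describe_tmp_candidates(environ):
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


if __name__ == "__main__":
    print("candidates:", describe_tmp_candidates())
    print("first usable:", first_usable_tmp())
    # What Python's tempfile module resolves to in this process.
    print("tempfile.gettempdir():", tempfile.gettempdir())
```

Launched through the same srun line as the tests, this would print one report per process, which makes it easy to see whether all nodes resolve to the same path.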
What happened?
test_io.test_save_csv hangs when processes are distributed over several nodes. Works fine if all processes are on the same node.
Code snippet triggering the error
# On HDFML
salloc --account=haf --nodes=2 --time=00:30:00 --gres=gpu:4
ml GCC OpenMPI PyTorch torchvision mpi4py HDF5 netCDF
srun -N 2 --ntasks-per-node=1 python -m unittest -vf heat/core/tests/test_io.py
Error message or erroneous outcome
Version
1.2.x
Python version
3.9
PyTorch version
1.11