Is your feature request related to a problem?
The idea was already mentioned in #52, but there a complete redesign of the MPI launcher was discussed. My feature request is shorter and more goal-oriented: I suggest adding a bash script that simply launches an MPI job in syncro mode in a multi-CPU environment. The code in quimb-mpi-slurm may need to be changed slightly.
Describe the solution you'd like
I've already tested a lot but still had no success. What does work is running a job in non-MPI mode on a single CPU (node) with 48 threads. The SLURM batch script then looks like this:
#!/bin/bash
# Number of nodes to allocate
#SBATCH --nodes=1
# Number of MPI instances (ranks) to be executed per node
#SBATCH --ntasks-per-node=1
# Number of threads per MPI instance
#SBATCH --cpus-per-task=48
# Allocate 8 GB memory per node
#SBATCH --mem=8gb
# Maximum run time of job
#SBATCH --time=24:00:00
# Give job a reasonable name
#SBATCH --job-name=mps_for_plots
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=mps_for_plots-%j.out
# File name for error output
#SBATCH --error=mps_for_plots-%j.err
#
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export HOME=~
module load compiler/intel/19.1.2
module load mpi/impi
module load devel/valgrind
module load numlib/mkl/2020.2
srun $(ws_find conda)/conda/envs/quimbPet/bin/python ~/MasterThesis/012-facilitationWithPhonons/mpsPhonons.py
Instead, I want to execute the job not only on one CPU, but across several CPUs (nodes).
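For comparison, a multi-node variant of the header above might look like the sketch below. All values (node count, threads per rank) are illustrative assumptions, not a tested configuration, and the conda activation / python path from the working script is elided:

```shell
#!/bin/bash
# Hypothetical multi-node sketch (untested): 4 nodes, one MPI rank per
# node, a few threads per rank.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=8gb
#SBATCH --time=24:00:00

# In MPI mode each rank gets only its own share of cores, so the thread
# counts are set per task rather than to the whole-node total of 48.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# srun starts ${SLURM_NTASKS} python processes, one per rank.
srun python ~/mpsPhonons.py
```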
Describe alternatives you've considered
My script currently looks as follows. The commented-out lines at the end, where I already tried switching the launch command to quimb-mpi-slurm, were also not successful:
#!/bin/bash
# Number of nodes to allocate
#SBATCH --nodes=1
# Number of MPI instances (ranks) to be executed per node
# #SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
# Number of threads per MPI instance
#SBATCH --cpus-per-task=2
# Allocate ... memory per node
# #SBATCH --mem-per-cpu=8gb
#SBATCH --mem=1gb
# Maximum run time of job
#SBATCH --time=02:00:00
# Give job a reasonable name
#SBATCH --job-name=phonon_mps
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=phonon_mps-%j.out
# File name for error output
#SBATCH --error=phonon_mps-%j.err
# Send status mails to user
### SBATCH --mail-type=ALL
### SBATCH --mail-user=chris.nill@student.uni-tuebingen.de
#
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# clean up all modules
module purge
module load compiler/intel
module load mpi/impi
#module load compiler/gnu
#module load mpi/openmpi
module load devel/valgrind
module load numlib/mkl/2020.2
# activate conda
source $( ws_find conda )/conda/etc/profile.d/conda.sh
conda activate quimbPet
srun $(ws_find conda)/conda/envs/quimbPet/bin/python ~/mpsPhonons.py
# or:
# $(ws_find conda)/conda/envs/quimbPet/bin/quimb-mpi-slurm -l "srun --mpi=pmix_v3" --np ${SLURM_NPROCS} --syncro ~/mpsPhonons.py
# $(ws_find conda)/conda/envs/quimbPet/bin/quimb-mpi-python -l "mpiexec" --syncro ~/mpsPhonons.py
Code of quimb-mpi-slurm:
#!/bin/bash
POSITIONAL=()
while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -h|--help)
            echo "Run a python script that uses quimb, eagerly launching with mpi, rather than dynamically spawning MPI processes.

Usage:
    quimb-mpi-python [OPTIONS]... [SCRIPT]...

Options:
    -n, --np <NUM_PROCS>           How many mpi processes to use, defaults to letting the MPI launcher decide.
    -l, --launcher <MPI_LAUNCHER>  How to launch the python process, defaults to 'mpiexec'. Can add mpi options here.
    -s, --syncro                   Launch in syncro mode, where all processes run the script, splitting work up only when a MPI Pool is encountered.
    -h, --help                     Show this help.

Note that in syncro mode, *all* functions called outside of the mpi pool must be pure to ensure synchronization.
"
            exit 0
            ;;
        -n|--np)
            num_procs="$2"
            shift
            shift
            ;;
        -s|--syncro)
            export QUIMB_SYNCRO_MPI=YES
            shift
            ;;
        -l|--launcher)
            mpi_launcher="$2"
            shift
            shift
            ;;
        "-")
            shift
            break
            ;;
        *)  # unknown option
            POSITIONAL+=("$1")  # save it in an array for later
            shift  # past argument
            ;;
    esac
done
set -- "${POSITIONAL[@]}"  # restore positional parameters

mpi_launcher=${mpi_launcher:-"mpiexec"}

# set up environment
export OMP_NUM_THREADS=1
export _QUIMB_MPI_LAUNCHED="MANUAL"

if [ $QUIMB_SYNCRO_MPI ]; then
    # use simplistic synchronized pool
    if [ $num_procs ]; then
        echo "Launching quimb in Syncro mode with ${mpi_launcher} and ${num_procs} processes."
        srun --mpi=pmix_v3 python "$@"
        # srun -n ${SLURM_NTASKS} python "$@"
    else
        echo "Launching quimb in Syncro mode with ${mpi_launcher}."
        ${mpi_launcher} python "$@"
    fi
else
    # run script with mpi through mpi4py module
    if [ $num_procs ]; then
        echo "Launching quimb in mpi4py.futures mode with ${mpi_launcher} and ${num_procs} processes."
        ${mpi_launcher} --np "${num_procs}" python -m mpi4py.futures "$@"
    else
        echo "Launching quimb in mpi4py.futures mode with ${mpi_launcher}."
        ${mpi_launcher} python -m mpi4py.futures "$@"
    fi
fi
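For debugging the launcher itself, the option-parsing logic can be exercised in isolation, without MPI or SLURM installed. Below is a stripped-down sketch of the same parsing pattern; the function name and sample arguments are hypothetical:

```shell
#!/bin/sh
# Isolated sketch of the launcher's option parsing (hypothetical helper,
# for testing the flag handling without any MPI/SLURM environment).
parse_args() {
    num_procs=""
    syncro=""
    mpi_launcher=""
    script=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -n|--np) num_procs="$2"; shift 2 ;;
            -s|--syncro) syncro=YES; shift ;;
            -l|--launcher) mpi_launcher="$2"; shift 2 ;;
            *) script="$1"; shift ;;  # anything else: the script path
        esac
    done
    # same default as the real launcher
    mpi_launcher=${mpi_launcher:-mpiexec}
}

parse_args --syncro -n 4 myscript.py
echo "launcher=$mpi_launcher np=$num_procs syncro=$syncro script=$script"
# prints: launcher=mpiexec np=4 syncro=YES script=myscript.py
```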
Additional context
No response
Hi @Babalion, sorry to be slow getting to this. I do now think it would make sense for quimb to basically just have two modes:
1. threaded
2. 'synchro' style, i.e. usual MPI.
Getting rid of the other modes would hopefully remove the need for any quimb-mpi-python launcher at all. Then it would just be a matter of documenting how to fit quimb into usual MPI scripts/workflows, and taking some care with the environment variables that control the threading level.
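In SLURM terms, the two modes might be documented roughly like the sketch below (my guess at what such a recipe could look like, not a verified configuration):

```shell
# Mode 1: threaded -- one process, all allocated cores go to thread pools.
#   (allocation: --ntasks=1 --cpus-per-task=48)
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}
python script.py

# Mode 2: 'synchro' / usual MPI -- many ranks, one thread each, so the
# thread pools don't oversubscribe the cores owned by each rank.
#   (allocation: --ntasks=48 --cpus-per-task=1)
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
srun python script.py
```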
What was not working / what did you change in the launcher above?
Yes indeed, the documentation for launching a script in MPI mode could definitely be improved.
Your idea sounds good, and it may also solve #52.
I still have not succeeded in running the script on the SLURM cluster in MPI mode. Maybe we could reduce quimb-mpi-python to a minimal working script that supports only the --syncro mode? That would make testing and debugging much easier for me.
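Such a minimal syncro-only launcher could be roughly the following. This is a hypothetical, untested sketch that keeps only the --syncro path and relies on SLURM's srun for the rank count, so it only makes sense inside a batch allocation:

```shell
#!/bin/bash
# Hypothetical minimal syncro-only launcher (untested sketch): every
# rank runs the full script, and quimb splits work up only when an MPI
# pool is encountered.
export QUIMB_SYNCRO_MPI=YES
export _QUIMB_MPI_LAUNCHED="MANUAL"
# one thread per rank, so ranks don't oversubscribe their cores
export OMP_NUM_THREADS=1

# let SLURM derive the rank count from the batch allocation
srun --mpi=pmix_v3 python "$@"
```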
Moreover, is it possible in general to use MPI for the TEBD algorithm?