Document on how to compile and run on Crusher (OLCF). This is the new MI250X machine at Oak Ridge.
Showing 4 changed files with 175 additions and 0 deletions.
@@ -28,6 +28,7 @@ HPC Systems
   hpc/perlmutter
   hpc/summit
   hpc/spock
   hpc/crusher
   hpc/juwels
   hpc/lassen
   hpc/quartz
@@ -0,0 +1,87 @@
.. _building-crusher:

Crusher (OLCF)
==============

The `Crusher cluster <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html>`_ is located at OLCF.
Each node contains 4 AMD MI250X GPUs, each with 2 Graphics Compute Dies (GCDs) for a total of 8 GCDs per node.
You can think of the 8 GCDs as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).

If you are new to this system, please see the following resources:

* `Crusher user guide <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html>`_
* Batch system: `Slurm <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#running-jobs>`_
* `Production directories <https://docs.olcf.ornl.gov/data/storage_overview.html>`_:

  * ``$PROJWORK/$proj/``: shared with all members of a project (recommended)
  * ``$MEMBERWORK/$proj/``: single user (usually smaller quota)
  * ``$WORLDWORK/$proj/``: shared with all users

* Note that the ``$HOME`` directory is mounted as read-only on compute nodes.
  That means you cannot run in your ``$HOME``.
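Since ``$HOME`` is read-only on compute nodes, a common pattern is to create per-run directories on the project work file system instead. A minimal sketch (the directory layout and fallback paths are my assumptions, not prescribed by the OLCF docs; on Crusher, ``$PROJWORK`` and ``$proj`` come from the environment):

```shell
# Sketch: create a timestamped run directory under the project work area.
# The ${VAR:-fallback} defaults only make this runnable outside Crusher.
RUN_ROOT="${PROJWORK:-/tmp/projwork}/${proj:-myproject}/${USER:-user}"
RUN_DIR="${RUN_ROOT}/warpx_run_$(date +%Y%m%d_%H%M%S)"
mkdir -p "${RUN_DIR}"
echo "run directory: ${RUN_DIR}"
```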

Installation
------------

Use the following commands to download the WarpX source code and switch to the correct branch:

.. code-block:: bash

   git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

We use the following modules and environments on the system (``$HOME/crusher_warpx.profile``).

.. literalinclude:: ../../../../Tools/machines/crusher_warpx.profile.example
   :language: bash

We recommend storing the above lines in a file, such as ``$HOME/crusher_warpx.profile``, and loading it into your shell after a login:

.. code-block:: bash

   source $HOME/crusher_warpx.profile
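If you prefer not to source the profile by hand after every login, a guarded line can go into your shell startup file. This convenience is an assumption on my part, not part of the official instructions; the guard keeps login shells working on machines where the file is absent:

```shell
# Source the WarpX profile only if it exists.
PROFILE="$HOME/crusher_warpx.profile"
if [ -f "$PROFILE" ]; then
    . "$PROFILE"
    echo "loaded $PROFILE"
else
    echo "profile not found: $PROFILE"
fi
```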
Then, ``cd`` into the directory ``$HOME/src/warpx`` and use the following commands to compile:

.. code-block:: bash

   cd $HOME/src/warpx
   rm -rf build
   cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=HIP
   cmake --build build -j 10

The general :ref:`cmake compile-time options <building-cmake>` apply as usual.
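The same configure line adapts to other geometries via ``WarpX_DIMS``. The loop below only prints the candidate commands rather than running them (a sketch: per the general CMake documentation ``2``, ``3`` and ``RZ`` are valid values, and using one build directory per geometry is my convention, not the docs'):

```shell
# Print (do not execute) configure commands for several geometries,
# each in its own build directory to keep configurations separate.
for dims in 2 3 RZ; do
    echo "cmake -S . -B build_${dims} -DWarpX_DIMS=${dims} -DWarpX_COMPUTE=HIP"
done
```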

.. _running-cpp-crusher:

Running
-------

.. _running-cpp-crusher-MI250X-GPUs:

MI250X GPUs (2x64 GB)
^^^^^^^^^^^^^^^^^^^^^

After requesting an interactive node with the ``getNode`` alias above, run a simulation like this, here using 8 MPI ranks on a single node:

.. code-block:: bash

   runNode ./warpx inputs

Or, for non-interactive runs:

.. literalinclude:: ../../../../Tools/machines/crusher-olcf/submit.sh
   :language: bash
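With ``--ntasks-per-node=8``, each MPI rank maps onto one of the node's 8 GCDs; ``--gpu-bind=closest`` lets Slurm handle the pinning. As an alternative illustration, an explicit mapping via ``ROCR_VISIBLE_DEVICES`` is sketched below. This is not what the submit script above does: the loop merely emulates the per-rank environment, since ``SLURM_LOCALID`` is normally set by ``srun``:

```shell
# Emulate how 8 local MPI ranks could each be pinned to one GCD.
# In a real job, srun sets SLURM_LOCALID per rank; here we loop instead.
for SLURM_LOCALID in 0 1 2 3 4 5 6 7; do
    export ROCR_VISIBLE_DEVICES=${SLURM_LOCALID}
    echo "rank ${SLURM_LOCALID} -> GCD ${ROCR_VISIBLE_DEVICES}"
done
```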

.. _post-processing-crusher:

Post-Processing
---------------

For post-processing, most users use Python via OLCF's `Jupyter service <https://jupyter.olcf.ornl.gov>`__ (`Docs <https://docs.olcf.ornl.gov/services_and_applications/jupyter/index.html>`__).

Please follow the same guidance as for :ref:`OLCF Summit post-processing <post-processing-summit>`.
@@ -0,0 +1,64 @@
# please set your project account
#export proj=<yourProject>

# required dependencies
#module switch PrgEnv-cray PrgEnv-amd
module load cmake/3.21.3
module load craype-accel-amd-gfx90a
module load rocm/4.5.2

# optional: faster builds
module load ccache
module load ninja

# optional: just an additional text editor
module load nano

# optional: for PSATD in RZ geometry support (not yet available)
#module load blaspp
#module load lapackpp

# optional: for QED lookup table generation support
module load boost/1.77.0-cxx17

# optional: for openPMD support
module load adios2/2.7.1
module load hdf5/1.10.7

# fix system defaults: do not escape $ with a \ on tab completion
shopt -s direxpand

# an alias to request an interactive batch node for one hour
# for parallel execution, start on the batch node: srun <command>
alias getNode="salloc -A $proj -J warpx -t 01:00:00 -p batch -N 1 -c 8 --ntasks-per-node=8"
# an alias to run a command on a batch node for up to 30min
# usage: runNode <command>
alias runNode="srun -A $proj -J warpx -t 00:30:00 -p batch -N 1 -c 8 --ntasks-per-node=8"

# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1
export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib"
export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa"

# optimize ROCm/HIP compilation for MI250X
export AMREX_AMD_ARCH=gfx90a

# compiler environment hints (PrgEnv-cray)
export CC=$(which cc)
export CXX=$(which CC)
export FC=$(which ftn)
export CFLAGS="-I${ROCM_PATH}/include"
export CXXFLAGS="-I${ROCM_PATH}/include"
export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64"

# compiler environment hints (PrgEnv-amd)
#export CC=$(which amdclang)
#export CXX=$(which amdclang++)
#export FC=$(which amdflang)
#export CFLAGS="-I${MPICH_DIR}/include"
#export CXXFLAGS="-I${MPICH_DIR}/include"
#export LDFLAGS="-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa"
# note: error in hip::device from hip-config.cmake (OLCFHELP-5370)
#   Imported target "hip::device" includes non-existent path
#   "HIP_CLANG_INCLUDE_PATH-NOTFOUND/.."
#   in its INTERFACE_INCLUDE_DIRECTORIES.
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

#SBATCH -A <project id>
#SBATCH -J warpx
#SBATCH -o %x-%j.out
#SBATCH -t 00:10:00
#SBATCH -p batch
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=closest
#SBATCH -N 1

# From the documentation:
# Each Crusher compute node consists of [1x] 64-core AMD EPYC 7A53
# "Optimized 3rd Gen EPYC" CPU (with 2 hardware threads per physical core) with
# access to 512 GB of DDR4 memory.
# Each node also contains [4x] AMD MI250X, each with 2 Graphics Compute Dies
# (GCDs) for a total of 8 GCDs per node. The programmer can think of the 8 GCDs
# as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).

export OMP_NUM_THREADS=8
srun ./warpx inputs > output.txt
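As a quick sanity check, the resource request above exactly tiles the node described in the script's comment: 8 tasks of 8 cores each fill the 64-core EPYC, and 8 tasks with 1 GPU each cover the 8 GCDs. A small arithmetic sketch:

```shell
# Verify the SBATCH numbers against the node description above.
NTASKS_PER_NODE=8
CPUS_PER_TASK=8
GPUS_PER_TASK=1
echo "cores requested per node: $((NTASKS_PER_NODE * CPUS_PER_TASK)) (node has 64)"
echo "GCDs requested per node: $((NTASKS_PER_NODE * GPUS_PER_TASK)) (node has 8)"
```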