Docs: Add Crusher-OLCF
Document on how to compile and run on Crusher (OLCF).
This is the new MI250X machine at Oak Ridge.
ax3l committed Jan 13, 2022
1 parent a20c97d commit 2029558
Showing 4 changed files with 175 additions and 0 deletions.
1 change: 1 addition & 0 deletions Docs/source/install/hpc.rst
@@ -28,6 +28,7 @@ HPC Systems
hpc/perlmutter
hpc/summit
hpc/spock
hpc/crusher
hpc/juwels
hpc/lassen
hpc/quartz
87 changes: 87 additions & 0 deletions Docs/source/install/hpc/crusher.rst
@@ -0,0 +1,87 @@
.. _building-crusher:

Crusher (OLCF)
==============

The `Crusher cluster <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html>`_ is located at OLCF.
Each node contains 4 AMD MI250X GPUs, each with 2 Graphics Compute Dies (GCDs) for a total of 8 GCDs per node.
You can think of the 8 GCDs as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).
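On an allocated compute node, you can verify this layout with ``rocm-smi``, which reports each GCD as a separate device (a quick sanity check, assuming the ROCm module from the profile below is loaded):

.. code-block:: bash

   # expect eight entries, one per GCD
   rocm-smi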

If you are new to this system, please see the following resources:

* `Crusher user guide <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html>`_
* Batch system: `Slurm <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#running-jobs>`_
* `Production directories <https://docs.olcf.ornl.gov/data/storage_overview.html>`_:

* ``$PROJWORK/$proj/``: shared with all members of a project (recommended)
* ``$MEMBERWORK/$proj/``: single user (usually smaller quota)
* ``$WORLDWORK/$proj/``: shared with all users
* Note that the ``$HOME`` directory is mounted as read-only on compute nodes.
That means you cannot run in your ``$HOME``.
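A common pattern is therefore to create a run directory in the project work area and launch from there, for example (paths are illustrative):

.. code-block:: bash

   # create a run directory in the shared project space and switch to it
   mkdir -p $PROJWORK/$proj/$USER/warpx_runs
   cd $PROJWORK/$proj/$USER/warpx_runs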


Installation
------------

Use the following commands to download the WarpX source code and switch to the correct branch:

.. code-block:: bash

   git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

We use the following modules and environments on the system (``$HOME/crusher_warpx.profile``).

.. literalinclude:: ../../../../Tools/machines/crusher-olcf/crusher_warpx.profile.example
:language: bash


We recommend storing the above lines in a file such as ``$HOME/crusher_warpx.profile`` and loading it into your shell after each login:

.. code-block:: bash

   source $HOME/crusher_warpx.profile
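If you prefer, the profile can also be sourced automatically at each login by appending it to your ``~/.bashrc``:

.. code-block:: bash

   # optional convenience: append the source line to your login shell configuration
   echo 'source $HOME/crusher_warpx.profile' >> $HOME/.bashrc

This is purely a convenience; sourcing the profile manually after each login works just as well.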
Then, ``cd`` into the directory ``$HOME/src/warpx`` and use the following commands to compile:

.. code-block:: bash

   cd $HOME/src/warpx
   rm -rf build
   cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=HIP
   cmake --build build -j 10

The general :ref:`cmake compile-time options <building-cmake>` apply as usual.
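For example, openPMD I/O or QED lookup-table generation can be enabled at configure time (an illustration; the exact option names are listed in the general CMake documentation linked above):

.. code-block:: bash

   # illustrative reconfigure with additional features enabled
   cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=HIP \
       -DWarpX_OPENPMD=ON -DWarpX_QED_TABLE_GEN=ON
   cmake --build build -j 10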


.. _running-cpp-crusher:

Running
-------

.. _running-cpp-crusher-MI250X-GPUs:

MI250X GPUs (2x64 GB)
^^^^^^^^^^^^^^^^^^^^^

After requesting an interactive node with the ``getNode`` alias above, run a simulation as follows, here using a single node with 8 MPI ranks (one per GCD):

.. code-block:: bash

   runNode ./warpx inputs

Or, for non-interactive runs, submit a batch script:

.. literalinclude:: ../../../../Tools/machines/crusher-olcf/submit.sh
:language: bash
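After setting your project account in the script's ``#SBATCH -A`` line, submit it with Slurm and monitor the job as usual, for example:

.. code-block:: bash

   # submit the batch script and check the queue
   sbatch submit.sh
   squeue -u $USER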


.. _post-processing-crusher:

Post-Processing
---------------

For post-processing, most users use Python via OLCF's `Jupyter service <https://jupyter.olcf.ornl.gov>`__ (`Docs <https://docs.olcf.ornl.gov/services_and_applications/jupyter/index.html>`__).

Please follow the same guidance as for :ref:`OLCF Summit post-processing <post-processing-summit>`.
64 changes: 64 additions & 0 deletions Tools/machines/crusher-olcf/crusher_warpx.profile.example
@@ -0,0 +1,64 @@
# please set your project account
#export proj=<yourProject>

# required dependencies
#module switch PrgEnv-cray PrgEnv-amd
module load cmake/3.21.3
module load craype-accel-amd-gfx90a
module load rocm/4.5.2

# optional: faster builds
module load ccache
module load ninja

# optional: just an additional text editor
module load nano

# optional: for PSATD in RZ geometry support (not yet available)
#module load blaspp
#module load lapackpp

# optional: for QED lookup table generation support
module load boost/1.77.0-cxx17

# optional: for openPMD support
module load adios2/2.7.1
module load hdf5/1.10.7

# fix system defaults: do not escape $ with a \ on tab completion
shopt -s direxpand

# an alias to request an interactive batch node for one hour
# for parallel execution, start on the batch node: srun <command>
alias getNode="salloc -A $proj -J warpx -t 01:00:00 -p batch -N 1 -c 8 --ntasks-per-node=8"
# an alias to run a command on a batch node for up to 30min
# usage: runNode <command>
alias runNode="srun -A $proj -J warpx -t 00:30:00 -p batch -N 1 -c 8 --ntasks-per-node=8"

# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1
export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib"
export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa"

# optimize ROCm/HIP compilation for MI250X
export AMREX_AMD_ARCH=gfx90a

# compiler environment hints (PrgEnv-cray)
export CC=$(which cc)
export CXX=$(which CC)
export FC=$(which ftn)
export CFLAGS="-I${ROCM_PATH}/include"
export CXXFLAGS="-I${ROCM_PATH}/include"
export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64"

# compiler environment hints (PrgEnv-amd)
#export CC=$(which amdclang)
#export CXX=$(which amdclang++)
#export FC=$(which amdflang)
#export CFLAGS="-I${MPICH_DIR}/include"
#export CXXFLAGS="-I${MPICH_DIR}/include"
#export LDFLAGS="-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa"
# note: error in hip::device from hip-config.cmake (OLCFHELP-5370)
# Imported target "hip::device" includes non-existent path
# "HIP_CLANG_INCLUDE_PATH-NOTFOUND/.."
# in its INTERFACE_INCLUDE_DIRECTORIES.
23 changes: 23 additions & 0 deletions Tools/machines/crusher-olcf/submit.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

#SBATCH -A <project id>
#SBATCH -J warpx
#SBATCH -o %x-%j.out
#SBATCH -t 00:10:00
#SBATCH -p batch
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=closest
#SBATCH -N 1

# From the documentation:
# Each Crusher compute node consists of [1x] 64-core AMD EPYC 7A53
# "Optimized 3rd Gen EPYC" CPU (with 2 hardware threads per physical core) with
# access to 512 GB of DDR4 memory.
# Each node also contains [4x] AMD MI250X, each with 2 Graphics Compute Dies
# (GCDs) for a total of 8 GCDs per node. The programmer can think of the 8 GCDs
# as 8 separate GPUs, each having 64 GB of high-bandwidth memory (HBM2E).
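# run configuration (below): one MPI rank per GCD, i.e. 8 ranks per node,
# with OMP_NUM_THREADS matching the 8 CPU cores requested per task above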

export OMP_NUM_THREADS=8
srun ./warpx inputs > output.txt
