CESM is used directly from its source code. Researchers clone a release or development version of CESM, fetch additional sources from other external repositories (e.g. CIME) and use that ensemble to create, build and run simulations (also known as cases). Since cases are created from the source code of CESM, there is little interest in distributing CESM itself to our users. This repo contains all files and tools developed by VUB-HPC to make the source code of CESM work on our clusters and to ease the workflow of running cases for our users.
Researchers will usually download one of the stable releases of CESM and then
download CIME and any external packages with the script checkout_externals. The
resulting collection of packages is predictable, and the versions of all
external packages are defined in the file Externals.cfg from CESM.
User documentation can be found in the Specific Use Cases Documentation of VUB-HPC.
Machine files are XML files configuring the system environment for CIME. All
files are located in cime/config/cesm/machines/.
- config_machines.xml
  - regex to identify the system machine
  - file structure: location of input data, case folders and case output
  - parallelization settings
  - configuration of the module system
  - list of modules to be loaded by CESM
- config_compilers.xml
  - compiler settings
  - build environment
  - filesystem settings
- config_batch.xml
  - description of the queue
  - job request of resources
- config_pio.xml
  - fine-grained settings to access the filesystem
- config_workflow.xml
  - defines steps in the default and custom workflows
  - default job settings of some steps
We provide XML files with the configuration settings for Hydra (VSC Tier-2 HPC),
Breniac (VSC Tier-1 HPC) and Hortense (VSC Tier-1 HPC). These files are located
in cesm-config/machines and cover the configuration files for machines,
compilers and the batch system. The settings for supported VSC clusters can be
easily added to the default machine files in CIME with the tool
update-cesm-machines. For instance, the three aforementioned config files can
be updated with the additional settings from cesm-config with the command
update-cesm-machines /path/to/cesm-x.y.z/cime/config/cesm/machines/ /path/to/cesm-config/machines/
All three machine files (machines, compilers and batch) have to be updated to get a working installation of CESM in the VSC clusters.
- The only module needed is CESM-deps. It has to be loaded at all times, from cloning of the sources to case submission.
- Using --machine hydra is optional as long as the user is in a compute node or a login node in Hydra. CESM uses NODENAME_REGEX in config_machines.xml to identify the host machine.
- Users have two options for --compiler when creating new cases. One is gnu, based on the GNU Compiler Collection. The other is intel, based on Intel compilers. The versions of each compiler are described in the easyconfigs of each CESM-deps module.
- There is a single configuration for both compilers that is tailored to nodes with Skylake CPUs, including the login nodes.
- CESM is not capable of detecting and automatically adding the required libraries to its build process. The current specification of SLIBS contains just what we have found to be required so far.
- By design, CESM sets a specific queue with -q queue_name, otherwise it fails to even create the case. In Hydra we use the partition skylake_mpi as the default queue.
- The maximum number of nodes is limited to 12 to ensure that the scale of CESM jobs stays within reasonable limits for Hydra.
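The host detection through NODENAME_REGEX can be pictured as a regular-expression test on the host name. The following is a minimal sketch of that idea, not the code used by CIME: the function name detect_machine and both patterns are hypothetical and do not reproduce the values in config_machines.xml.

```shell
# Illustration only: these patterns are made up and are NOT the values
# of NODENAME_REGEX shipped in config_machines.xml.
detect_machine() {
    host="$1"
    # Each entry is "machine:regex"; the first matching entry wins.
    for entry in \
        'hydra:^node[0-9]+\.hydra\.' \
        'breniac:^login[0-9]+\.breniac\.'
    do
        name="${entry%%:*}"     # text before the first colon
        regex="${entry#*:}"     # text after the first colon
        if printf '%s\n' "$host" | grep -Eq "$regex"; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    echo "no machine matches host $host" >&2
    return 1
}
```

Calling detect_machine "$(hostname -f)" on a node would print the matching machine name, or fail if no pattern matches, which is the situation where --machine has to be given explicitly.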
- There are two clusters defined in Breniac, breniac and breniac-skl:
  - breniac is the default one. It uses AVX2 and the resulting binaries will work on all nodes in Breniac. It is picked by default in the login nodes or if --machine breniac is specified.
  - breniac-skl is optimized for the Skylake nodes in Breniac and uses AVX512. It is only picked if the host system is a Skylake node or if --machine breniac-skl is specified.
- The only module needed is CESM-deps. It has to be loaded at all times, from cloning of the sources to case submission.
- By design, CESM sets a specific queue with -q queue_name, otherwise it fails to even create the case. In Breniac we can use the queue qdef, as it will route the job to q1h, q24h or q72h depending on the walltime requested.
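To get a feeling for where a job sent to qdef may end up, the routing can be approximated with a small helper. This is only a sketch: the function name route_queue is hypothetical, and the limits of 1, 24 and 72 hours are inferred from the queue names, not taken from the scheduler configuration of Breniac.

```shell
# Map a walltime in HH:MM:SS to one of the queues behind qdef.
# The limits (1 h, 24 h, 72 h) are inferred from the queue names only.
route_queue() {
    # awk treats fields such as "08" as decimal numbers
    secs=$(printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
    if [ "$secs" -le 3600 ]; then
        echo q1h
    elif [ "$secs" -le 86400 ]; then
        echo q24h
    elif [ "$secs" -le 259200 ]; then
        echo q72h
    else
        echo "walltime $1 exceeds 72 hours" >&2
        return 1
    fi
}
```

For example, route_queue 08:30:00 selects q24h under these assumed limits.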
- The only module needed to create, set up and build cases is CESM-deps. It has to be loaded at all times, from cloning of the sources to case submission. CESM will also load vsc-mympirun at runtime to be able to use MPI.
- Using --machine hortense is optional as long as the user is in a non-GPU compute node or a login node in Hortense. CESM uses NODENAME_REGEX in config_machines.xml to identify the host machine.
- Cases are submitted with Slurm's sbatch, as CESM is not compatible with jobcli.
- By default all cases are run in the cpu_rome partition. Optionally, cases can also be submitted to cpu_rome_512, which has the high-memory nodes.
- Input data will be downloaded from the SVN repository of UCAR. The FTP protocol is blocked in UGent.
- The recommended workflow is to create the case as usual, set up and build the case in the compute nodes with the job script case.setupbuild.slurm and then submit the case as usual with case.submit.
All paths defined in config_machines.xml are below $VSC_SCRATCH. These folders
contain everything, including the executables, input data sets, user cases and
outputs. CESM can generate rather big cases, reaching several hundred GB. The
structure is contained in the cesm folder:
$VSC_SCRATCH
└── cesm
├── cesm_baselines (BASELINE_ROOT)
├── inputdata (DIN_LOC_ROOT)
│ └── atm/datm7 (DIN_LOC_ROOT_CLMFORC)
├── output (CIME_OUTPUT_ROOT)
│ └── archive/$CASE (DOUT_S_ROOT)
├── tools/cprnc/cprnc (CCSM_CPRNC)
├── cases (optional folder with user cases)
└── sources (optional folder with sources of CESM)
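The layout above can also be created by hand, for instance to prepare the optional folders in advance. A minimal sketch follows, assuming a hypothetical helper called make_cesm_tree that takes the scratch root as its only argument. Whether CIME creates some of these folders on its own is not covered here.

```shell
# Create the folder layout shown above under the given root directory.
# make_cesm_tree is a hypothetical helper, not part of CESM or cesm-config.
make_cesm_tree() {
    root="$1"
    mkdir -p \
        "$root/cesm/cesm_baselines" \
        "$root/cesm/inputdata/atm/datm7" \
        "$root/cesm/output/archive" \
        "$root/cesm/tools/cprnc" \
        "$root/cesm/cases" \
        "$root/cesm/sources"
}
```

A typical call would be make_cesm_tree "$VSC_SCRATCH". Since mkdir -p is used, existing folders are left untouched.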
Input data in cesm/inputdata (DIN_LOC_ROOT) will be populated with existing
data sets from remote servers, such as historical weather records. This data is
actively accessed during the simulation in read operations, but not modified or
created by running cases. Input data files are automatically downloaded by CIME
into DIN_LOC_ROOT before the case is submitted to the queue. Only missing files
that are needed to run the current simulation will be downloaded. Therefore, the
contents of DIN_LOC_ROOT will grow over time, easily reaching several TB of data
distributed over several thousand files ranging from a few hundred MB to
several GB. Given the characteristics of the input data, it is very compelling
to have a centralized storage for CESM input data that can be shared by multiple
users.
In Hydra, the collection of input data files is stored by default in the user's
scratch storage: DIN_LOC_ROOT is defined as $VSC_SCRATCH/cesm/inputdata.
Alternatively, users in a Virtual Organization (VO) can link any folder in
DIN_LOC_ROOT to the data stored in their VO. VSC_SCRATCH_VO storage is as fast
as VSC_SCRATCH and can be used during the execution of the case without
hindering its performance.
Example to use input datasets from a shared folder in a VO:
$ ln -s $VSC_SCRATCH_VO/inputdata $VSC_SCRATCH/cesm/inputdata
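If the destination of the link already exists as a folder, ln -s places the link inside that folder instead of replacing it. The following sketch guards against that case. The function name link_inputdata is hypothetical and the two paths are passed as arguments, so nothing is assumed about the actual location of the shared data.

```shell
# Link a shared input data folder into place, refusing to touch an
# existing destination. link_inputdata is a hypothetical helper.
link_inputdata() {
    src="$1"    # shared folder, e.g. $VSC_SCRATCH_VO/inputdata
    dst="$2"    # link to create, e.g. $VSC_SCRATCH/cesm/inputdata
    if [ ! -d "$src" ]; then
        echo "source folder not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")"
    ln -s "$src" "$dst"
}
```

Usage mirrors the command above: link_inputdata "$VSC_SCRATCH_VO/inputdata" "$VSC_SCRATCH/cesm/inputdata".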
CESM will download missing input data from the external servers listed in
config_inputdata.xml. By default, the preferred options are FTP servers
provided by ucar.edu. However, the FTP protocol is not secure and hence it
might be blocked in your HPC cluster. Alternatively, ucar.edu also provides an
SVN repository with all input data. CESM will automatically fall back to it if
the connection to the FTP servers fails.
Versions of CESM 2.0.x and 2.1.x will try to validate the checksums from the SVN
repository externally, using the list of checksums in inputdata_checksum.dat.
This will fail, as that list of checksums is only provided and needed for
downloads from the FTP servers. If you run into this issue, the patch
01-CIME-fix-download-server-fallback solves this bug.
The clusters in VSC have fast access to the VSC iRODS storage managed by KU Leuven. Access to this iRODS server is 10x faster than access to the default external servers with input data from ucar.edu. The goal is therefore to use the iRODS server as a cache: input data files already available in it are downloaded quickly from there, and the default servers are only used as a fallback for the first download of missing files.
Patches in cesm-config/irods enable support for iRODS in CESM/CIME:
- Patch 01: makes CESM always start from the top server in config_inputdata.xml to download each target input file, so each file can be downloaded from the fastest available option
- Patch 02: adds iRODS as an additional download method and gives it precedence over wget or FTP
- Patch 03: automatically synchronizes the contents of DIN_LOC_ROOT to the iRODS server at the end of the simulation
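The download order that results from patches 01 and 02 can be sketched as a per-file fallback loop. This is a simplified illustration in shell, not the code in CIME: download_one, download_all and the two stand-in methods from_irods and from_ucar are hypothetical names, and the stand-ins only simulate a cache hit or miss.

```shell
# Try each download method in order for ONE file; the first success wins.
download_one() {
    file="$1"; shift
    for method in "$@"; do
        if "$method" "$file"; then
            echo "$file: fetched with $method"
            return 0
        fi
    done
    echo "$file: all methods failed" >&2
    return 1
}

# Every file restarts from the top of the method list, so a file missing
# in the cache does not push later files to the slower servers.
download_all() {
    methods="$1"; shift
    rc=0
    for file in "$@"; do
        # $methods is left unquoted on purpose: it splits into command names
        download_one "$file" $methods || rc=1
    done
    return "$rc"
}

# Stand-ins for real download commands (hypothetical):
from_irods() { [ "$1" = cached.nc ]; }   # only cached.nc is in the cache
from_ucar()  { true; }                   # the default server has everything
```

With these stand-ins, download_all "from_irods from_ucar" cached.nc new.nc fetches cached.nc through from_irods and falls back to from_ucar for new.nc only.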
Instructions to use CESM/CIME with iRODS:
- Download the source code of CESM/CIME as usual

      $ git clone -b release-cesm2.2.0 https://github.com/ESCOMP/cesm.git cesm-2.2.0
      $ cd cesm-2.2.0/
      $ ./manage_externals/checkout_externals

- Patch your source code of CESM/CIME to enable support for iRODS. Determine the version of CIME in your tree and choose the closest version of the patch available in cesm-config/irods

      $ cd cesm-2.2.0/
      $ git -C cime/ describe --tags
      cime5.8.32
      $ git apply /path/to/cesm-config/irods/cime-5.8.32/{01,02,03}-*.patch

- Remember to authenticate to the iRODS servers in Leuven to set up, build and run your case

      $ ssh login.hpc.kuleuven.be irods-setup | bash

- (Only once) Create a collection for CESM input data in iRODS

      $ imkdir -p cesm/inputdata

- (Optional) Update the iRODS address in config_inputdata.xml if your collection of CESM data is located anywhere other than cesm/inputdata
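Choosing the patch folder for a given CIME tag can be scripted. The sketch below takes "closest" to mean the newest patch folder that is not newer than the CIME tag, which is an assumption of this example and not a rule stated by cesm-config. The function name closest_patch_dir is hypothetical, and the version ordering relies on sort -V as provided by GNU coreutils.

```shell
# Pick the patch folder for a CIME tag: the newest folder that is not newer
# than the tag. This reading of "closest" is an assumption of this sketch.
# Usage: closest_patch_dir TAG FOLDER...
closest_patch_dir() {
    tag="$1"; shift
    want="cime-${tag#cime}"          # cime5.8.32 -> cime-5.8.32
    best=$(
        printf '%s\n' "$@" | sort -V | while read -r dir; do
            newest=$(printf '%s\n%s\n' "$dir" "$want" | sort -V | tail -n 1)
            # keep $dir only if it sorts at or before the wanted version
            if [ "$newest" = "$want" ]; then
                printf '%s\n' "$dir"
            fi
        done | tail -n 1
    )
    [ -n "$best" ] || return 1
    printf '%s\n' "$best"
}
```

The folder names are passed as arguments, for instance the names found in cesm-config/irods. The function fails if every available folder is newer than the tag.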
The common workflow with CESM consists of creating and building the case interactively in the login node of the cluster. Then the case is submitted to the queue and CESM will automatically set the resources, walltime and queue of each job. This can cause problems in a heterogeneous environment, as not all nodes might provide the same hardware features as the login nodes.
The example job scripts in cesm-config/scripts solve this problem by executing these steps in the compute nodes of the cluster. In this way, the compilation can be optimized for the host machine, simplifying the configuration, and the user does not have to worry about where the case is built and where it is executed.
- case.slurm: performs setup, build and execution of the case
- case.setupbuild.slurm: performs setup and build of the case, then the user can use case.submit as usual
Loads all dependencies to build and run CESM cases.
- CESM-deps with the Intel toolchain
  - default option in the two Breniac clusters
  - available in Hydra by setting --compiler=intel
- CESM-deps with the GNU toolchain
  - default option in Hortense
  - available in Hydra by setting --compiler=gnu
Our easyconfigs of CESM-deps are based on those available in EasyBuild.
However, the CESM-deps module in the VSC clusters also contains the
configuration files and scripts from this repository, which are located in the
installation directory ($EBROOTCESMMINDEPS). Hence, our users have direct access
to these files once CESM-deps is loaded. The usage instructions of our CESM-deps
modules also provide a minimum set of instructions to create cases with these
configuration files.
Loads software commonly used to analyse the results of the simulations.
- available in Hydra
There is a small tool called cprnc that needs to be compiled and placed in
CCSM_CPRNC, a path defined in config_machines.xml. These are the steps to
compile this tool:
- Load the CESM-deps/2-intel-2019b module
- Prepare the source code tree of CESM as usual (as explained in our documentation)
- Change to source folder:

      $ cd $VSC_SCRATCH/cime/cesm-x.y.z/cime/tools/cprnc/

- Configure:

      $ CIMEROOT=../.. ../configure --macros-format=Makefile

- Build:

      $ CIMEROOT=../.. source ./.env_mach_specific.sh && make FFLAGS="$FFLAGS -I${EBROOTNETCDFMINFORTRAN}/include" LDFLAGS="$LDFLAGS -L${EBROOTNETCDFMINFORTRAN}/lib"
The binary installed in CCSM_CPRNC will be used in all nodes in the cluster.
Therefore, it has to be built with the minimum CPU optimizations:
- the binary for Hydra was built on an IvyBridge CPU (available upon request)
- the binary for Breniac was built on a Broadwell CPU (available upon request)
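Before building a binary that has to run on every node, it can help to check which vector extensions the build host offers. A minimal sketch follows, assuming Linux nodes that list CPU features in the flags line of /proc/cpuinfo. The function name vector_level is hypothetical.

```shell
# Report the highest vector extension found in a list of CPU flags.
# The flag names (avx, avx2, avx512f) are those used by Linux in /proc/cpuinfo.
vector_level() {
    flags=" $1 "    # pad with spaces so whole words can be matched
    case "$flags" in
        *" avx512f "*) echo avx512 ;;
        *" avx2 "*)    echo avx2 ;;
        *" avx "*)     echo avx ;;
        *)             echo none ;;
    esac
}
```

On a node, vector_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" reports the level of that host. A binary meant for all nodes should be built on, or targeted at, the node type with the lowest level.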
Compilation instructions for the CLM tool mksurfdata_map:
- Load the CESM-deps/2-intel-2019b module
- Go to the mksurfdata_map source directory:

      $ cd $VSC_SCRATCH/cime/cesm-x.y.z/components/clm/tools/mksurfdata_map/src

- Build mksurfdata_map with the following command:

      $ USER_FC=gfortran LIB_NETCDF="$EBROOTNETCDFMINFORTRAN/lib" INC_NETCDF="$EBROOTNETCDFMINFORTRAN/include" USER_FFLAGS="-fno-range-check" make
The folder cesm-config/tests contains instructions to carry out different tests on a CESM installation, as well as results from several of our tests in VSC clusters.