Cloud Dependencies for Regression Testing #1667

Merged
merged 90 commits into from
May 3, 2023
Changes from 55 commits
2a2554f
Update ufs_orion.intel.lua
zach1221 Jan 31, 2023
41a55e8
Merge branch 'ufs-community:develop' into develop
zach1221 Feb 7, 2023
ec3c64a
Merge branch 'ufs-community:develop' into develop
zach1221 Feb 17, 2023
d4be2ce
Update opnReqTest
zach1221 Feb 17, 2023
1bd9ffe
Merge branch 'ufs-community:develop' into develop
zach1221 Mar 21, 2023
7e79ee2
Update rt.sh
zach1221 Mar 21, 2023
90c7f26
Update rt.sh
zach1221 Mar 21, 2023
6b55d8e
Update rt.sh
zach1221 Mar 21, 2023
68da9ce
Update default_vars.sh
zach1221 Mar 21, 2023
acdebf3
Update detect_machine.sh
zach1221 Mar 21, 2023
3f6cf63
Update compile.sh
zach1221 Mar 21, 2023
bb62958
Update module-setup.sh
zach1221 Mar 21, 2023
da0ac94
Create compile_slurm.IN_AWS_cloud
zach1221 Mar 21, 2023
de6bdd0
Create compile_slurm.IN_Azure_cloud
zach1221 Mar 21, 2023
021a630
Create compile_slurm.IN_GCP_cloud
zach1221 Mar 21, 2023
4994d4a
Update rt.sh
zach1221 Mar 21, 2023
ad67bda
Create fv3_slurm.IN_AWS_cloud
zach1221 Mar 21, 2023
6d0cd58
Create fv3_slurm.IN_Azure_cloud
zach1221 Mar 21, 2023
6b60d94
Create fv3_slurm.IN_GCP_cloud
zach1221 Mar 21, 2023
f9ddcd0
Update fv3_slurm.IN_Azure_cloud
zach1221 Mar 21, 2023
c26ebf4
Create ufs_AWS_cloud.intel.lua
zach1221 Mar 21, 2023
f4665f5
Update ufs_AWS_cloud.intel.lua
zach1221 Mar 21, 2023
2c9d8f2
Create ufs_AWS_cloud.intel_debug.lua
zach1221 Mar 21, 2023
7ebd17a
Create ufs_Azure_cloud.intel.lua
zach1221 Mar 21, 2023
4525077
Create ufs_Azure_cloud.intel_debug.lua
zach1221 Mar 21, 2023
a358c0f
Create ufs_GCP_cloud.intel_debug.lua
zach1221 Mar 21, 2023
6d5eb65
Create ufs_GCP_cloud.intel.lua
zach1221 Mar 21, 2023
b60ae31
Update ufs_GCP_cloud.intel_debug.lua
zach1221 Mar 21, 2023
7b40570
Update rt.sh
zach1221 Mar 22, 2023
4c68e1a
Update compile.sh
zach1221 Mar 22, 2023
e11826d
Update module-setup.sh
zach1221 Mar 22, 2023
89df62c
Delete compile_slurm.IN_GCP_cloud
zach1221 Mar 22, 2023
3820c82
Delete compile_slurm.IN_Azure_cloud
zach1221 Mar 22, 2023
ad5901e
Update and rename compile_slurm.IN_AWS_cloud to compile_slurm.IN_noaa…
zach1221 Mar 22, 2023
c815bfa
Delete fv3_slurm.IN_GCP_cloud
zach1221 Mar 22, 2023
b3e24af
Delete fv3_slurm.IN_Azure_cloud
zach1221 Mar 22, 2023
031e96d
Rename fv3_slurm.IN_AWS_cloud to fv3_slurm.IN_noaacloud
zach1221 Mar 22, 2023
5102a26
Delete ufs_GCP_cloud.intel.lua
zach1221 Mar 22, 2023
fbde2bd
Delete ufs_GCP_cloud.intel_debug.lua
zach1221 Mar 22, 2023
504bd9c
Delete ufs_Azure_cloud.intel_debug.lua
zach1221 Mar 22, 2023
ab5137c
Delete ufs_Azure_cloud.intel.lua
zach1221 Mar 22, 2023
4f4fc2e
Update and rename ufs_AWS_cloud.intel_debug.lua to ufs_noaacloud.inte…
zach1221 Mar 22, 2023
2e0d6af
Update and rename ufs_AWS_cloud.intel.lua to ufs_noaacloud.intel.lua
zach1221 Mar 22, 2023
b35ad6b
Update fv3_slurm.IN_noaacloud
zach1221 Mar 24, 2023
d8decd1
Merge branch 'ufs-community:develop' into Cloud_RT
zach1221 Mar 27, 2023
42f03d4
Update ufs_noaacloud.intel.lua
zach1221 Mar 27, 2023
85fd336
Update ufs_noaacloud.intel_debug.lua
zach1221 Mar 27, 2023
0e9d298
Update ufs_noaacloud.intel.lua
zach1221 Mar 27, 2023
a3138c5
Update ufs_noaacloud.intel_debug.lua
zach1221 Mar 27, 2023
eda62c3
Create ufs_common_stack.lua
zach1221 Mar 27, 2023
f953a98
Update fv3_slurm.IN_noaacloud
zach1221 Mar 27, 2023
9b70dec
Update rt_utils.sh
zach1221 Mar 27, 2023
4a13145
Update compile.sh
zach1221 Mar 27, 2023
5d64538
Update detect_machine.sh
zach1221 Mar 27, 2023
78264c9
Update detect_machine.sh
zach1221 Mar 28, 2023
a5cebde
Merge branch 'ufs-community:develop' into Cloud_RT
zach1221 Mar 28, 2023
133140b
Create ufs_common_spack.lua
zach1221 Apr 3, 2023
6db1408
Update ufs_noaacloud.intel_debug.lua
zach1221 Apr 3, 2023
2ffe28e
Update ufs_noaacloud.intel.lua
zach1221 Apr 3, 2023
de8290a
Delete ufs_common_stack.lua
zach1221 Apr 3, 2023
17c3c0d
Create ufs_common_spack_debug.lua
zach1221 Apr 3, 2023
828887a
Update default_vars.sh
zach1221 Apr 3, 2023
55f5b0e
Update detect_machine.sh
zach1221 Apr 3, 2023
bd0fa68
Merge branch 'ufs-community:develop' into Cloud_RT
zach1221 Apr 3, 2023
c96f116
Update rt_utils.sh
zach1221 Apr 3, 2023
cdfb86c
Update fv3_slurm.IN_noaacloud
zach1221 Apr 5, 2023
7aebb46
Update detect_machine.sh
zach1221 Apr 5, 2023
0f45541
Update default_vars.sh
zach1221 Apr 5, 2023
c621cdb
Merge branch 'ufs-community:develop' into Cloud_RT
zach1221 Apr 10, 2023
ddd883f
Update default_vars.sh
zach1221 Apr 10, 2023
4f5732f
Delete ufs_noaacloud.intel_debug.lua
zach1221 Apr 17, 2023
c1f525e
Update ufs_noaacloud.intel.lua
zach1221 Apr 17, 2023
e5092d6
Delete ufs_common_spack_debug.lua
zach1221 Apr 17, 2023
d90388e
Update fv3_slurm.IN_noaacloud
zach1221 Apr 18, 2023
3b9466b
Update ufs_noaacloud.intel.lua
zach1221 Apr 18, 2023
67344e8
Merge branch 'ufs-community:develop' into Cloud_RT
zach1221 Apr 18, 2023
55347fa
Update fv3_slurm.IN_noaacloud
zach1221 Apr 18, 2023
17db517
Adding Requesting if statement to default_vars
zach1221 Apr 24, 2023
6799f74
Merge branch 'ufs-community:develop' into Cloud_RT
zach1221 May 2, 2023
f25d3e1
add cheyenne.intel RT logs: passed
zach1221 May 2, 2023
18ae760
add cheyenne.gnu RT logs: passed
zach1221 May 2, 2023
63e1388
Update rt_utils.sh
zach1221 May 2, 2023
90d8f40
[AutoRT] hera.intel Job Completed.
jkbk2004 May 3, 2023
a841707
[AutoRT] cheyenne.gnu Job Completed.
epic-cicd-jenkins May 3, 2023
4b174a6
[AutoRT] hera.gnu Job Completed.
jkbk2004 May 3, 2023
86dd370
add orion.intel RT logs: passed
zach1221 May 3, 2023
75bbb90
[AutoRT] cheyenne.intel Job Completed.
epic-cicd-jenkins May 3, 2023
a011de1
WCOSS2 Intel RT Log
BrianCurtis-NOAA May 3, 2023
e8a9dbc
add jet.intel RT logs: passed
FernandoAndrade-NOAA May 3, 2023
77590db
Update opnReqTest
zach1221 May 3, 2023
56 changes: 56 additions & 0 deletions modulefiles/ufs_common_stack.lua
@@ -0,0 +1,56 @@
help([[
loads UFS Model common libraries
]])

jasper_ver=os.getenv("jasper_ver") or "2.0.32"
load(pathJoin("jasper", jasper_ver))

zlib_ver=os.getenv("zlib_ver") or "1.2.13"
load(pathJoin("zlib", zlib_ver))

libpng_ver=os.getenv("libpng_ver") or "1.6.37"
load(pathJoin("libpng", libpng_ver))

hdf5_ver=os.getenv("hdf5_ver") or "1.14.0"
load(pathJoin("hdf5", hdf5_ver))

netcdf_ver=os.getenv("netcdf_ver") or "4.9.0"
load(pathJoin("netcdf", netcdf_ver))

pio_ver=os.getenv("pio_ver") or "2.5.9"
load(pathJoin("pio", pio_ver))

esmf_ver=os.getenv("esmf_ver") or "8.3.0b09"
load(pathJoin("esmf", esmf_ver))

fms_ver=os.getenv("fms_ver") or "2022.04"
load(pathJoin("fms",fms_ver))

bacio_ver=os.getenv("bacio_ver") or "2.4.1"
load(pathJoin("bacio", bacio_ver))

crtm_ver=os.getenv("crtm_ver") or "2.4.0"
load(pathJoin("crtm", crtm_ver))

g2_ver=os.getenv("g2_ver") or "3.4.5"
load(pathJoin("g2", g2_ver))

g2tmpl_ver=os.getenv("g2tmpl_ver") or "1.10.2"
load(pathJoin("g2tmpl", g2tmpl_ver))

ip_ver=os.getenv("ip_ver") or "3.3.3"
load(pathJoin("ip", ip_ver))

sp_ver=os.getenv("sp_ver") or "2.3.3"
load(pathJoin("sp", sp_ver))

w3emc_ver=os.getenv("w3emc_ver") or "2.9.2"
load(pathJoin("w3emc", w3emc_ver))

gftl_shared_ver=os.getenv("gftl_shared_ver") or "v1.5.0"
load(pathJoin("gftl-shared", gftl_shared_ver))

mapl_ver=os.getenv("mapl_ver") or "2.22.0-esmf-8.3.0b09"
load(pathJoin("mapl", mapl_ver))

whatis("Description: UFS build environment common libraries")
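Every library in this modulefile follows the same pattern: `os.getenv("<name>_ver") or "<default>"`, so a version exported in the environment before `module load` takes precedence over the pinned default. The sketch below mirrors that lookup in shell so the behaviour can be tried without Lmod. `resolve_ver` is a hypothetical helper, not part of the PR, and the override value `1.12.2` is only an illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the variable named $1, or the
# fallback $2 when that variable is unset. Like Lua's `os.getenv(x) or d`,
# an empty-but-set variable is NOT replaced by the default.
# eval is used only for illustration; pass trusted variable names.
resolve_ver() (
  name=$1
  default=$2
  eval "value=\${$name-\$default}"
  printf '%s\n' "$value"
)

unset hdf5_ver
resolve_ver hdf5_ver 1.14.0   # prints 1.14.0 (the pinned default)

hdf5_ver=1.12.2
export hdf5_ver
resolve_ver hdf5_ver 1.14.0   # prints 1.12.2 (the exported override)
```

In practice this means a single library can be swapped for a test build by exporting one variable, without editing the modulefile.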
20 changes: 20 additions & 0 deletions modulefiles/ufs_noaacloud.intel.lua
@@ -0,0 +1,20 @@
help([[
loads UFS Model prerequisites for cloud/intel
]])

prepend_path("MODULE_PATH", "/contrib/spack-stack/envs/ufs-wm/install/modulefiles/Core")

hpc_intel_ver=os.getenv("hpc_intel_ver") or "2021.3.0"
load(pathJoin("stack-intel", hpc_intel_ver))

hpc_impi_ver=os.getenv("hpc_impi_ver") or "2021.3.0"
load(pathJoin("stack-intel-oneapi-mpi", hpc_impi_ver))

load("ufs_common_spack")

setenv("CC", "mpiicc")
setenv("CXX", "mpiicpc")
setenv("FC", "mpiifort")
setenv("CMAKE_Platform", "noaacloud.intel")

whatis("Description: UFS build environment")
20 changes: 20 additions & 0 deletions modulefiles/ufs_noaacloud.intel_debug.lua
@@ -0,0 +1,20 @@
help([[
loads UFS Model prerequisites for cloud/intel
]])

prepend_path("MODULE_PATH", "/contrib/spack-stack/envs/ufs-wm/install/modulefiles/Core")

hpc_intel_ver=os.getenv("hpc_intel_ver") or "2021.3.0"
load(pathJoin("stack-intel", hpc_intel_ver))

hpc_impi_ver=os.getenv("hpc_impi_ver") or "2021.3.0"
load(pathJoin("stack-intel-oneapi-mpi", hpc_impi_ver))

load("ufs_common_spack")

setenv("CC", "mpiicc")
setenv("CXX", "mpiicpc")
setenv("FC", "mpiifort")
setenv("CMAKE_Platform", "noaacloud.intel")

whatis("Description: UFS build environment")
57 changes: 57 additions & 0 deletions tests/default_vars.sh
@@ -268,7 +268,64 @@ elif [[ $MACHINE_ID = expanse.* ]]; then

TPN_cpl_atmw_gdas=12; INPES_cpl_atmw_gdas=6; JNPES_cpl_atmw_gdas=8
THRD_cpl_atmw_gdas=2; WPG_cpl_atmw_gdas=24; APB_cpl_atmw_gdas="0 311"; WPB_cpl_atmw_gdas="312 559"

elif [[ $MACHINE_ID = AWS_cloud.* ]]; then

TPN=36

INPES_dflt=3 ; JNPES_dflt=8
INPES_thrd=3 ; JNPES_thrd=4

THRD_cpl_dflt=1
INPES_cpl_dflt=3; JNPES_cpl_dflt=8; WPG_cpl_dflt=6
OCN_tasks_cpl_dflt=20
ICE_tasks_cpl_dflt=10
WAV_tasks_cpl_dflt=20

THRD_cpl_thrd=2
INPES_cpl_thrd=3; JNPES_cpl_thrd=4; WPG_cpl_thrd=6
OCN_tasks_cpl_thrd=20
ICE_tasks_cpl_thrd=10
WAV_tasks_cpl_thrd=12

elif [[ $MACHINE_ID = Azure_cloud.* ]]; then

TPN=44

INPES_dflt=3 ; JNPES_dflt=8
INPES_thrd=3 ; JNPES_thrd=4

THRD_cpl_dflt=1
INPES_cpl_dflt=3; JNPES_cpl_dflt=8; WPG_cpl_dflt=6
OCN_tasks_cpl_dflt=20
ICE_tasks_cpl_dflt=10
WAV_tasks_cpl_dflt=20

THRD_cpl_thrd=2
INPES_cpl_thrd=3; JNPES_cpl_thrd=4; WPG_cpl_thrd=6
OCN_tasks_cpl_thrd=20
ICE_tasks_cpl_thrd=10
WAV_tasks_cpl_thrd=12

elif [[ $MACHINE_ID = GCP_cloud.* ]]; then

TPN=30

INPES_dflt=3 ; JNPES_dflt=8
INPES_thrd=3 ; JNPES_thrd=4

THRD_cpl_dflt=1
INPES_cpl_dflt=3; JNPES_cpl_dflt=8; WPG_cpl_dflt=6
OCN_tasks_cpl_dflt=20
ICE_tasks_cpl_dflt=10
WAV_tasks_cpl_dflt=20

THRD_cpl_thrd=2
INPES_cpl_thrd=3; JNPES_cpl_thrd=4; WPG_cpl_thrd=6
OCN_tasks_cpl_thrd=20
ICE_tasks_cpl_thrd=10
WAV_tasks_cpl_thrd=12

else

echo "Unknown MACHINE_ID ${MACHINE_ID}"
13 changes: 12 additions & 1 deletion tests/detect_machine.sh
@@ -99,10 +99,21 @@ case $(hostname -f) in
login2.stampede2.tacc.utexas.edu) MACHINE_ID=stampede ;; ### stampede2
login3.stampede2.tacc.utexas.edu) MACHINE_ID=stampede ;; ### stampede3
login4.stampede2.tacc.utexas.edu) MACHINE_ID=stampede ;; ### stampede4



login01.expanse.sdsc.edu) MACHINE_ID=expanse ;; ### expanse1
login02.expanse.sdsc.edu) MACHINE_ID=expanse ;; ### expanse2

esac

case "${PW_CSP:-}" in

aws) MACHINE_ID=aws ;; ### parallelworks aws
google) MACHINE_ID=gcp ;; ### parallelworks gcp
azure) MACHINE_ID=azure ;; ### parallelworks azure

esac
[[ ${MACHINE_ID} =~ "aws" || ${MACHINE_ID} =~ "gcp" || ${MACHINE_ID} =~ "azure" ]] && MACHINE_ID=noaacloud

# Overwrite auto-detect with RT_MACHINE if set
MACHINE_ID=${RT_MACHINE:-${MACHINE_ID}}
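The detection added in this hunk has two stages: the `PW_CSP` value is mapped to a provider, all three providers are collapsed to `noaacloud`, and `RT_MACHINE` can still override the result. A minimal sketch of that flow, written as two functions so it can be exercised in isolation; the function names are hypothetical and the hostname-based detection that precedes it in the real script is left out.

```shell
#!/bin/sh
# Hypothetical function: every recognised Parallel Works provider value
# collapses to the single platform name "noaacloud".
cloud_machine_id() {
  case "${1:-}" in
    aws|google|azure) echo noaacloud ;;
    *) echo "" ;;
  esac
}

# Hypothetical wrapper: RT_MACHINE, when set and non-empty, overrides
# whatever was auto-detected, as in the last line of the hunk.
detect_machine_id() (
  detected=$(cloud_machine_id "${PW_CSP:-}")
  printf '%s\n' "${RT_MACHINE:-$detected}"
)

PW_CSP=aws
unset RT_MACHINE
detect_machine_id    # prints noaacloud
```

Collapsing the providers to one ID is what lets the rest of the PR carry a single `noaacloud` branch instead of separate AWS, Azure and GCP branches.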
17 changes: 17 additions & 0 deletions tests/fv3_conf/compile_slurm.IN_noaacloud
@@ -0,0 +1,17 @@
#!/bin/sh
#SBATCH -e err
#SBATCH -o out
#SBATCH --qos=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=@[TPN]
#SBATCH --job-name="@[JBNME]"

set -eux

echo -n " $( date +%s )," > job_timestamp.txt
echo "Compile started: " `date`

@[PATHRT]/compile.sh @[MACHINE_ID] "@[MAKE_OPT]" @[COMPILE_NR]

echo "Compile ended: " `date`
echo -n " $( date +%s )," >> job_timestamp.txt
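The `@[TPN]`, `@[JBNME]` and similar tokens in this template are placeholders that are filled in before the job is submitted. How the test harness performs that substitution is not shown in this diff; the sketch below uses plain `sed` as a stand-in to illustrate the idea. `render_template` and the job name `compile_001` are assumptions made for the example.

```shell
#!/bin/sh
# Stand-in for the harness's own template step: replace @[NAME] tokens on
# stdin with the values of the matching shell variables.
render_template() {
  sed -e "s|@\[TPN\]|${TPN}|g" \
      -e "s|@\[JBNME\]|${JBNME}|g"
}

TPN=36              # tasks per node, as in the AWS_cloud defaults of this PR
JBNME=compile_001   # hypothetical job name

printf '%s\n' \
  '#SBATCH --ntasks-per-node=@[TPN]' \
  '#SBATCH --job-name="@[JBNME]"' | render_template
```

A token with no matching `-e` expression is passed through unchanged, which makes a missed variable easy to spot in the rendered job card.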
46 changes: 46 additions & 0 deletions tests/fv3_conf/fv3_slurm.IN_noaacloud
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#!/bin/sh
#SBATCH -e err
#SBATCH -o out
#SBATCH --qos=batch
### #SBATCH --ntasks=@[TASKS]
#SBATCH --nodes=@[NODES]
#SBATCH --ntasks-per-node=@[TPN]
#SBATCH --job-name="@[JBNME]"
#SBATCH --exclusive

set -eux
echo -n " $( date +%s )," > job_timestamp.txt

set +x
MACHINE_ID=noaacloud
#module use $( pwd -P )
#module load modules.fv3
#module list
module use /lustre/ufs-weather-model/modulefiles
module load ufs_cloud.intel
module list

set -x

ulimit -s unlimited
ulimit -l unlimited

echo "Model started: " `date`

#export MPI_TYPE_DEPTH=20
export OMP_STACKSIZE=512M
export KMP_AFFINITY=scatter
export OMP_NUM_THREADS=1
#export ESMF_RUNTIME_COMPLIANCECHECK=OFF:depth=4
#export PSM_RANKS_PER_CONTEXT=4
#export PSM_SHAREDCONTEXTS=1
#export ESMF_RUNTIME_PROFILE=ON
#export ESMF_RUNTIME_PROFILE_OUTPUT="SUMMARY"

# Avoid job errors because of filesystem synchronization delays
sync && sleep 1

srun --mpi=pmi2 --label -n @[TASKS] ./fv3.exe

echo "Model ended: " `date`
echo -n " $( date +%s )," >> job_timestamp.txt
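The job card requests `@[NODES]` nodes at `@[TPN]` tasks per node, while `srun` launches `@[TASKS]` ranks, so the three values have to agree. The diff does not show how `NODES` is derived; the sketch below assumes it is the smallest node count that can hold all tasks (ceiling division). `nodes_for` and the task count of 150 are hypothetical.

```shell
#!/bin/sh
# Assumed relationship, not taken from the PR: NODES = ceil(TASKS / TPN).
# $1 = total MPI tasks, $2 = tasks per node.
nodes_for() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# The three per-provider TPN values from the default_vars.sh hunk.
for tpn in 36 44 30; do
  echo "TPN=$tpn -> $(nodes_for 150 "$tpn") nodes for 150 tasks"
done
```

Because the job also sets `--exclusive`, any unused cores on the last node stay idle rather than being shared with other jobs.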
9 changes: 8 additions & 1 deletion tests/module-setup.sh
@@ -43,7 +43,14 @@ elif [[ $MACHINE_ID = cheyenne* ]] ; then
source /glade/u/apps/ch/modulefiles/default/localinit/localinit.sh
fi
module purge


elif [[ $MACHINE_ID = noaacloud* ]] ; then
# We are on NOAA Cloud
if ( ! eval module help > /dev/null 2>&1 ) ; then
source /apps/lmod/8.5.2/init/bash
fi
module purge

elif [[ $MACHINE_ID = stampede* ]] ; then
# We are on TACC Stampede
if ( ! eval module help > /dev/null 2>&1 ) ; then
2 changes: 1 addition & 1 deletion tests/opnReqTest
@@ -125,7 +125,7 @@ build_opnReqTests() {
ecflow_create_compile_task
else
echo "compiling $name with compile option $MAKE_OPT"
./compile.sh $MACHINE_ID "${MAKE_OPT}" $name >${LOG_DIR}/compile_${TEST_NAME}_$name.log 2>&1
./compile.sh $MACHINE_ID "${MAKE_OPT}" $name >${LOG_DIR}/compile_${TEST_NAME}_$name.log #2>&1
echo "done compiling $name"
fi
done
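The only difference between the two `compile.sh` lines in this hunk is whether `2>&1` is active. With it, compiler errors written to stderr land in the compile log; with it commented out, only stdout is logged and stderr goes wherever the calling shell sends it. A small sketch of the difference, using a hypothetical `noisy` function in place of `compile.sh`:

```shell
#!/bin/sh
# Hypothetical stand-in for compile.sh: writes one line to each stream.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

log_dir=$(mktemp -d)

# Redirect stdout to the log, then point stderr at the same place.
noisy > "$log_dir/with_stderr.log" 2>&1

# Without 2>&1 the log receives stdout only; stderr is discarded here just
# to keep the demo quiet (in the PR it would go to the terminal or job output).
noisy > "$log_dir/stdout_only.log" 2>/dev/null

wc -l "$log_dir/with_stderr.log" "$log_dir/stdout_only.log"
```

The order matters: `2>&1` must come after the `>file` redirection, otherwise stderr is duplicated onto the original stdout rather than the log.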
26 changes: 26 additions & 0 deletions tests/rt.sh
@@ -358,7 +358,29 @@ elif [[ $MACHINE_ID = expanse.* ]]; then
PTMP=$dprefix
SCHEDULER=slurm
cp fv3_conf/fv3_slurm.IN_expanse fv3_conf/fv3_slurm.IN

elif [[ $MACHINE_ID = noaacloud.* ]]; then

module use /apps/modules/modulefiles
module load rocoto/1.3.3

ROCOTORUN=$(which rocotorun)
ROCOTOSTAT=$(which rocotostat)
ROCOTOCOMPLETE=$(which rocotocomplete)
ROCOTO_SCHEDULER=slurm

QUEUE=batch
COMPILE_QUEUE=batch
PARTITION=
dprefix=/lustre/
DISKNM=/contrib/ufs-weather-model/RT
STMP=$dprefix/stmp4
PTMP=$dprefix/stmp2
SCHEDULER=slurm
cp fv3_conf/fv3_slurm.IN_noaacloud fv3_conf/fv3_slurm.IN
cp fv3_conf/compile_slurm.IN_noaacloud fv3_conf/compile_slurm.IN


else
die "Unknown machine ID, please edit detect_machine.sh file"
fi
@@ -510,6 +532,10 @@ if [[ $ROCOTO == true ]]; then
QUEUE=s4
COMPILE_QUEUE=s4
ROCOTO_SCHEDULER=slurm
elif [[ $MACHINE_ID = noaacloud.* ]]; then
QUEUE=batch
COMPILE_QUEUE=batch
ROCOTO_SCHEDULER=slurm
elif [[ $MACHINE_ID = jet.* ]]; then
QUEUE=batch
COMPILE_QUEUE=batch
2 changes: 1 addition & 1 deletion tests/rt_utils.sh
@@ -363,7 +363,7 @@ check_results() {
fi

if [[ $d -eq 1 && ${i##*.} == 'nc' ]] ; then
if [[ ${MACHINE_ID} =~ orion || ${MACHINE_ID} =~ hera || ${MACHINE_ID} =~ wcoss2 || ${MACHINE_ID} =~ acorn || ${MACHINE_ID} =~ cheyenne || ${MACHINE_ID} =~ gaea || ${MACHINE_ID} =~ jet || ${MACHINE_ID} =~ s4 ]] ; then
if [[ ${MACHINE_ID} =~ orion || ${MACHINE_ID} =~ hera || ${MACHINE_ID} =~ wcoss2 || ${MACHINE_ID} =~ acorn || ${MACHINE_ID} =~ cheyenne || ${MACHINE_ID} =~ gaea || ${MACHINE_ID} =~ jet || ${MACHINE_ID} =~ s4 || ${MACHINE_ID} =~ noaacloud ]] ; then
printf ".......ALT CHECK.." >> ${REGRESSIONTEST_LOG}
printf ".......ALT CHECK.."
${PATHRT}/compare_ncfile.py ${RTPWD}/${CNTL_DIR}/$i ${RUNDIR}/$i > compare_ncfile.log 2>&1 && d=$? || d=$?
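Each new platform makes the `=~` chain in this condition one clause longer, and a missing space before `]]` is enough to turn the whole line into a syntax error. One possible alternative, offered only as a sketch and not as what the PR does, is a `case` with substring globs, which matches the same IDs as the unanchored `=~` tests. `uses_alt_check` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the machine ID contains one of the
# platform names that use the alternate NetCDF comparison.
uses_alt_check() {
  case "$1" in
    *orion*|*hera*|*wcoss2*|*acorn*|*cheyenne*|*gaea*|*jet*|*s4*|*noaacloud*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

if uses_alt_check "noaacloud.intel"; then
  echo "noaacloud.intel: alternate check"
fi
```

Adding a platform then becomes a one-word change inside the pattern list.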