Enable and retest RT cases on Derecho #2038

Closed
zach1221 opened this issue Dec 14, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@zach1221
Collaborator

zach1221 commented Dec 14, 2023

Description

Several cases were disabled for Derecho in rt.conf to match the Cheyenne settings. They should be retested and debugged to see whether they can be enabled to run on Derecho.

Enable and retest regional_atmaq_debug_intel, cpld_control_p8_faster_intel, cpld_bmark_p8, cpld_restart_bmark_p8 and conus13km_debug_qr on Derecho.

To Reproduce:

  1. Clone the develop branch of the ufs-weather-model repo.
  2. Enable these tests in rt.conf.
  3. Re-run the regression tests.
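
Before editing rt.conf for step 2, it can help to list the entries for the affected tests. The following is a minimal sketch, not part of rt.sh: `list_rt_entries` is a hypothetical helper, and it assumes only that each test name appears verbatim somewhere on its rt.conf line (the column layout of rt.conf is not assumed and may differ between versions).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rt.sh): print the rt.conf lines, with
# line numbers, that mention any of the tests named in this issue.
# Assumption: the test names appear verbatim on their rt.conf lines.
list_rt_entries() {
  local conf="$1"
  grep -n -E \
    'regional_atmaq_debug|cpld_control_p8_faster|cpld_bmark_p8|cpld_restart_bmark_p8|conus13km_debug_qr' \
    "$conf"
}
```

For example, `list_rt_entries tests/rt.conf` (the path is an assumption) prints each matching line so that any Derecho restriction on it can be reviewed by hand.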

Additional context

Output

@zach1221 zach1221 added the bug Something isn't working label Dec 14, 2023
@zach1221 zach1221 self-assigned this Dec 14, 2023
@DeniseWorthen
Collaborator

Cheyenne-disabled tests also include cpld_bmark_p8, cpld_restart_bmark_p8 and conus13km_debug_qr

@zach1221
Collaborator Author

Cheyenne-disabled tests also include cpld_bmark_p8, cpld_restart_bmark_p8 and conus13km_debug_qr

Yes, true. I have noted that in the description as well.

@natalie-perlin
Collaborator

Here are the modules that need to be loaded on Derecho to use ecflow and rocoto:

module use /glade/work/epicufsrt/contrib/spack-stack/derecho/modulefiles
module load ecflow/5.8.4
module use /glade/work/epicufsrt/contrib/derecho/rocoto/modulefiles
module load rocoto

@zach1221
Collaborator Author

@natalie-perlin OK, should the variable ECFLOW_START=/glade/p/ral/jntp/tools/miniconda3/4.8.3/envs/ufs-weather-model/bin/ecflow_start.sh be changed?

@BrianCurtis-NOAA
Collaborator

BrianCurtis-NOAA commented Feb 20, 2024

Derecho has its own ecflow install, available through module load ecflow, if it's easier to use that.

Once you module load ecflow, it sets up the paths needed to use its ecflow_start.sh. I am not sure, though, how much that complicates using the ecflow Python package.

@natalie-perlin
Collaborator

There is an ecflow_start.sh script already:
ECFLOW_START=/glade/work/epicufsrt/contrib/spack-stack/derecho/ecflow-5.8.4/bin/ecflow_start.sh

@natalie-perlin
Collaborator

Derecho has its own ecflow install, available through module load ecflow, if it's easier to use that.

Once you module load ecflow, it sets up the paths needed to use its ecflow_start.sh. I am not sure, though, how much that complicates using the ecflow Python package.

@BrianCurtis-NOAA - it might be more convenient to use the same ecflow version that was used during the spack-stack build.

@zach1221
Collaborator Author

I have this set up for Derecho in rt.sh:
elif [[ $MACHINE_ID = derecho ]]; then

export PATH=/glade/work/epicufsrt/contrib/derecho/rocoto/bin:$PATH
module use /glade/work/epicufsrt/contrib/spack-stack/derecho/modulefiles
module load ecflow/5.8.4
ECF_PORT=$(( $(id -u) + 1500 ))
ECFLOW_START=/glade/work/epicufsrt/contrib/spack-stack/derecho/ecflow-5.8.4/bin/ecflow_start.sh

With this setup I get: ImportError: /glade/u/home/zshrader/miniconda3/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /glade/work/epicufsrt/contrib/spack-stack/derecho/ecflow-5.8.4/lib/python3.10/site-packages/ecflow/ecflow.so)
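
The error message indicates that the libstdc++.so.6 under the user's miniconda3 directory is the one being loaded, and that it does not provide the GLIBCXX_3.4.29 version that ecflow.so requires. One way to confirm this is to check whether a given library contains that version string. The sketch below is only a diagnostic under that assumption; `has_glibcxx` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the shared library at $1 contains
# the version string $2 (for example GLIBCXX_3.4.29) as a whole word.
has_glibcxx() {
  local lib="$1" version="$2"
  # -q: no output, exit status only; -w: whole-word match, so that
  # GLIBCXX_3.4.2 does not match GLIBCXX_3.4.29; -F: literal string.
  # LC_ALL=C avoids locale decoding problems on binary input.
  LC_ALL=C grep -q -w -F -- "$version" "$lib"
}
```

A call such as `has_glibcxx "$HOME/miniconda3/lib/libstdc++.so.6" GLIBCXX_3.4.29` returns non-zero if the conda copy is too old, and `command -v python3` shows which Python is first on PATH when rt.sh runs.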

@natalie-perlin
Collaborator

These changes to rt.sh allowed a job to enter the queue (it is still queued):

elif [[ $MACHINE_ID = derecho ]]; then

 module use /glade/work/epicufsrt/contrib/derecho/rocoto/modulefiles
 module load rocoto
 module use /glade/work/epicufsrt/contrib/spack-stack/derecho/modulefiles
 module load ecflow/5.8.4
 module unload ncarcompilers
 module use /glade/work/epicufsrt/contrib/spack-stack/derecho/spack-stack-1.5.1/envs/unified-env/install/modulefiles/Core
 module load stack-intel/2021.10.0
 module load stack-python/3.10.8
 ECFLOW_START=/glade/work/epicufsrt/contrib/spack-stack/derecho/ecflow-5.8.4/bin/ecflow_start.sh

I would also suggest modifying the following in ./modulefiles/ufs_derecho.intel.lua:

Change the line:
prepend_path("MODULEPATH", "/lustre/desc1/scratch/epicufsrt/contrib/modulefiles")
to
prepend_path("MODULEPATH", "/glade/work/epicufsrt/contrib/spack-stack/derecho/modulefiles")
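
The modulefile change above could be applied with a small script rather than by hand. This is a rough sketch only: `update_modulepath` is a hypothetical helper, and it assumes the old path appears in the file exactly as quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the old MODULEPATH entry with the new one in
# the Lua modulefile given as $1. Both paths are copied from this thread.
update_modulepath() {
  local file="$1"
  local old="/lustre/desc1/scratch/epicufsrt/contrib/modulefiles"
  local new="/glade/work/epicufsrt/contrib/spack-stack/derecho/modulefiles"
  # '|' is used as the sed delimiter so the slashes in the paths need no
  # escaping. Writing to a temporary file and renaming avoids relying on
  # sed -i, whose syntax differs between GNU and BSD sed.
  sed "s|\"${old}\"|\"${new}\"|" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}
```

For example, `update_modulepath modulefiles/ufs_derecho.intel.lua` rewrites the line in place; checking the result with `git diff` before committing is advisable.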

@natalie-perlin
Collaborator

@zach1221 - with these changes, the cpld_control_p8_mixedmode_intel test passed (it was the only test enabled in rt.conf).
Regression test log: /glade/derecho/scratch/nperlin/UFS-WM/ufs-weather-model/tests/logs/RegressionTests_derecho.log

@zach1221
Collaborator Author

/glade/derecho/scratch/nperlin/UFS-WM/ufs-weather-model/tests

It seems to be working, @natalie-perlin. Thank you.

@zach1221
Collaborator Author

The full rt.conf has been run on Derecho, and the logs have been pushed to PR #2061 along with the changes Natalie recommended. I have also run regional_atmaq_debug_intel, cpld_control_p8_faster_intel, cpld_bmark_p8, cpld_restart_bmark_p8 and conus13km_debug_qr separately on Derecho to confirm that they work. I'll update the baselines on Derecho and enable those cases in an upcoming WM PR.

zach1221 added a commit to climbfuji/ufs-weather-model that referenced this issue Feb 22, 2024
Projects
Status: Done

4 participants