
Cloud Dependencies for Regression Testing #1667

Merged: 90 commits merged into ufs-community:develop on May 3, 2023


@zach1221 (Collaborator) commented Mar 21, 2023

Description

This PR adds support for running the UFS WM Regression Test (RT) script on the three NOAA cloud HPC platforms (GCP, Azure, and AWS).

To run the UFS WM regression test suite on the NOAA cloud platforms, the RT-related scripts, module files, and FV3 Slurm job files must be updated with cloud-platform-specific dependencies and variables.

With these updates, rt.sh should be able to run any test in the rt.conf suite.

Top of commit queue on: TBD

Input data additions/changes

  • No changes are expected to input data.
  • There will be new input data.
  • Input data will be updated.

Anticipated changes to regression tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Combined with PRs (if applicable):

Commit Queue Checklist:

  • Link PRs from all sub-components involved
  • Confirm reviews completed in sub-component PRs
  • Add all appropriate labels to this PR.
  • Run full RT suite on either Hera/Cheyenne with both Intel/GNU compilers
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Linked PRs and Issues:

Testing Day Checklist:

  • This PR is up to date with the top of all sub-component repositories except for those sub-components that are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CM's):

  • RDHPCS
    • Intel
      • Hera
      • Orion
      • Jet
      • Gaea
      • Cheyenne
    • GNU
      • Hera
      • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@jkbk2004 (Collaborator)

@ulmononian Can you monitor this PR with respect to the spack-stack maintenance strategy on the PW clouds? We also need to keep machine names consistent between the weather model and SRW.

@zach1221 (Collaborator, Author)

@SadeghTabas-NOAA Here's the draft cloud PR I was referring to previously. @ulmononian Sadegh was able to build new permanent spack-stack modules on Parallel Works AWS for use in this PR. Would you be able to help us do the same for GCP and Azure?

@jkbk2004 (Collaborator) commented May 3, 2023

Automated RT Failure Notification
Machine: orion
Compiler: intel
Job: RT
[RT] Repo location: /work/noaa/epic-ps/jongkim/autort/pr/1284394645/20230503073018/ufs-weather-model
Please make changes and add the following label back: orion-intel-RT

@epic-cicd-jenkins (Collaborator)

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
[RT] Repo location: /mnt/lfs4/HFIP/hfv3gfs/role.epic/autort/pr/1284394645/20230503122210/ufs-weather-model
[RT] Error: Test control_c384gdas 028 failed in run_test failed
Please make changes and add the following label back: jet-intel-RT

@FernandoAndrade-NOAA merged commit ee35891 into ufs-community:develop on May 3, 2023

@BrianCurtis-NOAA (Collaborator)

Didn't wait for Acorn. The Acorn tests passed, though. Can't push to this anymore.

@jkbk2004 (Collaborator) commented May 3, 2023

@BrianCurtis-NOAA I could have held my approval until the Acorn log came in. This PR is only for the PW cloud feature, so a sanity check should be enough to validate the other machines. @FernandoAndrade-NOAA FYI

@DusanJovic-NOAA (Collaborator)

I see the '-B' option was added back to nccmp. Was that intended?

@zach1221 (Collaborator, Author) commented May 4, 2023

@DusanJovic-NOAA Yes, the Cheyenne admin installed nccmp/1.9.1.0 on the HPC, which has the -B option.

@DusanJovic-NOAA (Collaborator)

Excellent. Thank you.

Labels:
  • jenkins-ci (Jenkins CI: ORT build/test on docker container)
  • No Baseline Change
  • Ready for Commit Queue (The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.)