update MOM6 to GFDL 20210224 main branch commit #439

jiandewang · 2021-02-25T05:41:41Z

Description

GFDL updated their main branch on 20210224, which is step 1 of their FMS2-related code changes. No answer changes are expected. "mom6_files.cmake" needs modification to reflect the files being added and removed in the framework directory.

Issue(s) addressed

Issue #438
MOM6 issue 55 (NOAA-EMC/MOM6#55)

Testing

Ran ufs-weather-model with the updated MOM6 and the modified "mom6_files.cmake" on hera, orion and dell-P3.

Regression tests passed on:

cheyenne.gnu (auto-rt)
cheyenne.intel (auto-rt)
wcoss_cray
wcoss-dell
hera.gnu
hera.intel
jet.intel
gaea.intel
orion.intel

Dependencies

MOM6 PR 56 (NOAA-EMC/MOM6#56)

modify mom6_files.cmake to reflect the files being added and removed in framework directory

DeniseWorthen · 2021-02-26T13:19:40Z

@jiandewang Please post the RegressionTests_platform.logs from orion,hera and dell-p3 which you have already run. You should be able to do hera-gnu and wcoss-cray also, correct?

I will do cheyenne, jet and gaea. I will post the logs to a directory on Hera which you can then copy and commit from your own checkout.

jiandewang · 2021-02-26T14:09:46Z

@DeniseWorthen thanks, I will post the log files for my part when the jobs finish.

BrianCurtis-NOAA · 2021-02-26T16:20:46Z

Log Name:rt_auto_hera.intel_20210226162045.log
Log Location:/scratch1/NCEPDEV/nems/Brian.Curtis/git2/ufs-community/ufs-weather-model/tests/auto
Logs are kept for one month

jiandewang · 2021-02-26T16:37:54Z

@BrianCurtis-NOAA I saw the hera job is done. Inside /scratch1/NCEPDEV/nems/Brian.Curtis/git2/ufs-community/ufs-weather-model/tests/auto, is "rt_auto_hera.intel_20210226162045.log" the file that I need to add and commit into my branch? And there will be no more "RegressionTests_hera.intel.log" file, right?

BrianCurtis-NOAA · 2021-02-26T16:41:43Z

Log Name:rt_auto_gaea.intel_20210226114143.log
Log Location:/lustre/f2/pdata/ncep/Brian.Curtis/git/ufs-community/ufs-weather-model/tests/auto
Logs are kept for one month

DusanJovic-NOAA · 2021-02-26T16:42:57Z

> @BrianCurtis-NOAA I saw the hera job is done. Inside /scratch1/NCEPDEV/nems/Brian.Curtis/git2/ufs-community/ufs-weather-model/tests/auto, is "rt_auto_hera.intel_20210226162045.log" the file that I need to add and commit into my branch? And there will be no more "RegressionTests_hera.intel.log" file, right?

No. Log file is already committed.

DusanJovic-NOAA · 2021-02-26T16:46:10Z

> Log Name:rt_auto_gaea.intel_20210226114143.log
> Log Location:/lustre/f2/pdata/ncep/Brian.Curtis/git/ufs-community/ufs-weather-model/tests/auto
> Logs are kept for one month

Gaea test failed. RT-auto should not commit log file in case regression test fails.

BrianCurtis-NOAA · 2021-02-26T17:32:18Z

Log Name:rt_auto_jet.intel_20210226173218.log
Log Location:/mnt/lfs4/HFIP/h-nems/Brian.Curtis/git/ufs-community/ufs-weather-model/tests/auto
Logs are kept for one month

BrianCurtis-NOAA · 2021-02-26T17:34:23Z

@DusanJovic-NOAA What's wrong with it sending logs if it fails, won't it just be overwritten later with a successful one?

climbfuji · 2021-02-26T17:40:25Z

> @DusanJovic-NOAA What's wrong with it sending logs if it fails, won't it just be overwritten later with a successful one?

Does it produce a red flag somewhere, indicating that it failed?

DusanJovic-NOAA · 2021-02-26T17:43:31Z

> @DusanJovic-NOAA What's wrong with it sending logs if it fails, won't it just be overwritten later with a successful one?

What am I supposed to do with that log file? It's misleading. One has to manually check each of these log files, scroll to the end and verify that the test didn't fail. This just creates more work.

BrianCurtis-NOAA · 2021-02-26T17:45:31Z

> > @DusanJovic-NOAA What's wrong with it sending logs if it fails, won't it just be overwritten later with a successful one?
>
> Does it produce a red flag somewhere, indicating that it failed?

Even with a failed test, rt.sh does not return 1 at the end. The code checks for returncode != 0 . The code looks for that to tell the logger to get the stdout and stderr into the file.

DusanJovic-NOAA · 2021-02-26T17:50:30Z

> > > @DusanJovic-NOAA What's wrong with it sending logs if it fails, won't it just be overwritten later with a successful one?
> >
> > Does it produce a red flag somewhere, indicating that it failed?
>
> Even with a failed test, rt.sh does not return 1 at the end. The code checks for returncode != 0 . The code looks for that to tell the logger to get the stdout and stderr into the file.

Because rt.sh (script) didn't fail. It successfully finished what it is supposed to do. Parse rt.conf and run a sequence of COMPILE and RUN jobs. rt.sh will return non-zero exit code if itself fails to run, but not if tests fail.

BrianCurtis-NOAA · 2021-02-26T17:51:51Z

Log Name:rt_auto_orion.intel_20210226115149.log
Log Location:/work/noaa/nems/bcurtis/git/ufs-community/ufs-weather-model/tests/auto
Logs are kept for one month

BrianCurtis-NOAA · 2021-02-26T17:58:23Z

> Because rt.sh (script) didn't fail. It successfully finished what it is supposed to do. Parse rt.conf and run a sequence of COMPILE and RUN jobs. rt.sh will return non-zero exit code if itself fails to run, but not if tests fail.

I figured it was a nice start to at least do the work of running it without ever having to touch the HPC, so a good start is that the FAILED text is in the log file which tells the PR owner they should go dive into why. The automated stuff has a LONG way to go to be set it and forget it. When all is said and done, it is def the goal for "set it and forget it".
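
The distinction discussed above, between rt.sh exiting cleanly and individual tests failing, could be handled by checking both signals before a log is committed. A minimal sketch in Python, assuming the only failure marker available is the FAILED text mentioned above; the function names and the sample log lines are hypothetical and are not taken from the auto-RT code:

```python
def log_reports_failure(log_text: str, marker: str = "FAILED") -> bool:
    """Return True if any line of the log contains the failure marker."""
    return any(marker in line for line in log_text.splitlines())


def should_commit_log(returncode: int, log_text: str) -> bool:
    """Commit only if the script exited cleanly AND no test reported failure."""
    return returncode == 0 and not log_reports_failure(log_text)


# Hypothetical log excerpts, invented for illustration only.
passing_log = "Test 001 example_test PASS\nTest 002 other_test PASS\n"
failing_log = "Test 001 example_test PASS\nTest 002 other_test FAILED\n"

print(should_commit_log(0, passing_log))  # script ok, no failed tests
print(should_commit_log(0, failing_log))  # script ok, but a test failed
print(should_commit_log(1, passing_log))  # the script itself failed
```

With a check like this, a run whose script exit code is 0 but whose log contains a failed test would not produce a committed log, which is the behavior requested earlier in the thread.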

jiandewang · 2021-02-26T19:24:55Z

@jun and all: at this stage RT passed for jet, orion, hera, cheyenne and cray; dell-p3 is still running. But there are issues on gaea: all non-debug coupled runs failed the comparison (cmp) with the baseline.
What shall we do here? I don't have a project account on gaea.

climbfuji · 2021-03-04T19:31:33Z

Log Name:rt_auto_20210304110005.log
Log Location:/glade/work/heinzell/fv3/ufs-weather-model/auto-rt/control-20210226-new/tests/auto
Logs are kept for one month

BrianCurtis-NOAA · 2021-03-04T19:54:33Z

Log Name:rt_auto_20210304093014.log
Log Location:/work/noaa/nems/bcurtis/git/ufs-community/ufs-weather-model/tests/auto
Logs are kept for one month

@BrianCurtis-NOAA can you chmod for the above ?

/work/noaa/nems/bcurtis/test/579823211/20210304093017/ufs-weather-model in case it's not readable let me know.

jiandewang · 2021-03-04T20:43:07Z

Orion had another timed-out job; will resubmit.

BrianCurtis-NOAA · 2021-03-04T20:44:54Z

Please let me run the orion test?

jiandewang · 2021-03-04T20:47:20Z

> Please let me run the orion test?

@BrianCurtis-NOAA I launched the job 30s before I saw your message

jiandewang · 2021-03-04T20:48:30Z

/work/noaa/nems/bcurtis/test/579823211/20210304093017/ufs-weather-model

it is accessible now.

BrianCurtis-NOAA · 2021-03-04T20:48:36Z

> > Please let me run the orion test?
>
> @BrianCurtis-NOAA I launched the job 30s before I saw your message

No problem. If it fails again I'd like to run the next one.

jiandewang · 2021-03-04T20:49:20Z

> > > Please let me run the orion test?
> >
> > @BrianCurtis-NOAA I launched the job 30s before I saw your message
>
> No problem. If it fails again I'd like to run the next one.

hope this time it will give us good luck

DeniseWorthen · 2021-03-04T20:50:54Z

@jiandewang which test is timing out?

jiandewang · 2021-03-04T20:52:55Z

> @jiandewang which test is timing out?

coupled-frictional-C192

jiandewang · 2021-03-04T20:53:48Z

@BrianCurtis-NOAA hera intel is not finished yet, just want to confirm with you

DeniseWorthen · 2021-03-04T20:59:15Z

@jiandewang The cpld_controlfrac_c192 wall clock time on Hera is only ~6 minutes. If it is taking longer than 30min on Orion then something seems wrong.

jiandewang · 2021-03-04T21:01:44Z

> @jiandewang The cpld_controlfrac_c192 wall clock time on Hera is only ~6 minutes. If it is taking longer than 30min on Orion then something seems wrong.

The first-round run from Brian timed out on a different job, so I believe this is a machine issue. I have had this kind of issue before.

uturuncoglu · 2021-03-04T21:02:53Z

@DeniseWorthen my experience with Orion indicates that the model performance on that particular platform is not predictable and depends on the load. Sometimes the model hangs at the FV3 initialization stage or takes longer than usual. It could be a network or disk issue, but I am not sure at this point.

DeniseWorthen · 2021-03-04T21:05:12Z

@jiandewang @uturuncoglu Thanks both of you. If it was a different test that timed out previously then yes I agree this is probably a machine issue.

BrianCurtis-NOAA · 2021-03-04T21:18:14Z

Log Name:rt_auto_20210304190005.log
Log Location:/scratch1/NCEPDEV/nems/Brian.Curtis/git2/ufs-community/ufs-weather-model/tests/auto
Logs are kept for one month

BrianCurtis-NOAA · 2021-03-04T21:19:40Z

hera.intel PASSED

DeniseWorthen

Assuming there are no further issues w/ orion RTs, approve.

jiandewang · 2021-03-04T22:05:36Z

No timed-out jobs on orion so far; only the last 5 jobs are still pending. Hope they don't bring bad luck.

.gitmodules

jiandewang · 2021-03-04T23:06:42Z

ready for review

* upstream/develop:
  update MOM6 to GFDL 20210224 main branch commit (ufs-community#439)
  Add GNU and Cheyenne Support to Automated RT (ufs-community#444)
  Move Noah MP init to CCPP and update Noah MP regression tests, ice flux init bug fix in CCPP (ufs-community#425)
  Feature/rt automation (ufs-community#403)
  Update ccpp-physics. Make RRTMGP thread safe (ufs-community#418)
  Update regression tests from GFSv15+Thompson to GFSv16+Thompson, include "Add one regional regression test in DEBUG mode. (ufs-community#419)" (ufs-community#421)
  UGWP v0 v1 combined (ufs-community#396)
  add optional mesh in MOM6; add dz_min and min_seaice as configurable variables for coupled model (ufs-community#399)
  updates FMS to 2020.04.01 (ufs-community#392)
  Move LSM vegetation lookup tables into CCPP, clean up RUC snow cover on ice initialization (remove IPD step 2) (ufs-community#407)
  Update CMEPS for HAFS integration; add datm and coupled-model tests on Gaea (ufs-community#401)
  Remove legacy gnumake build from fv3atm and NEMS, remove legacy Python 2.7 support, rename v16beta to v16 and RT updates (ufs-community#384)
  MOM6 bugfixes, GFDL update, update CDMBGWD settings; fix for restart reproducibility (without waves) when USE_LA_LI2016=True, sign error on fprec passed to ocean, GFDL update, resolution dependent cdmbgwd settings (ufs-community#379)
  dycore options to add zero-gradient BC to reconstruct interface u/v and change dz_min as input (ufs-community#369)
  Update develop from NOAA-GSL: RUC ice, MYNN sfclay, stochastic land perturbations (ufs-community#386)
  update cpl gfsv16 tests, rrtmgp fix and bug fixes in cmeps (ufs-community#378)
  point fv3 to EMC develop branch (ufs-community#377)
  Remove IPD steps 3 and 5 (ufs-community#357)
  Update CMEPS (ufs-community#345)
  Implementation of CCPP timestep_init and timestep_final phases (ufs-community#337)
  Remove unnecessary SIMD instruction sets for Jet, first round of cleanup in rt.conf, initialize cld_amt to zero for regional runs (dycore) (ufs-community#353)
  add frac grid input, update and add additional cpld tests (ufs-community#354)
  Add checkpoint restarts for ufs-cpld (ufs-community#342)
  Update the format of rt.conf (ufs-community#349)
  Remove IPD (step 1) (ufs-community#331)
  Feature/ww3update (ufs-community#334)
  Replace old regional SDF with FV3_GFS_v15_thompson_mynn (ufs-community#333)
  Update modules with hpc-stack v1.1.0 (ufs-community#319)
  Regression test log for PR ufs-community#323 for jet.intel (ufs-community#336)
  RRTMGP and Thompson MP coupling (ufs-community#323)
  Add 2 new tests for DATM-MOM6-CICE6 application (ufs-community#332)
  Add optional bulk flux calculation in ufs-datm (ufs-community#266)
  Final-final GFS v16 updates / restart reproducibility bugfixes (ufs-community#325)
  Updates to build for JEDI linking/control, add wcoss2 (ufs-community#295)
  Update CICE, Move regression test input outside baseline directory (ufs-community#270)
  Feature/update mom6 and retain b4b results for 025x025 resolution (ufs-community#290)
  Update for Jet, bug fixes in running with frac_grid=T and GFDL MP, and in restarting with frac_grid=T (ufs-community#304)
  Updates to stochastic_physics_wrapper (ufs-community#280)
  Update develop from gsd/develop 2020/11/20: Unified gravity wave drag, updates to other GSL physics (ufs-community#297)
  Fix to allow quilting with non-factors for layout (ufs-community#250)
  rt update (ufs-community#261)

@gsketefian

## DESCRIPTION OF CHANGES:

Modified the Jinja-formatted FV3LAM_wflow.xml template workflow to accommodate sub-hourly post-processing tasks that rely on sub-hourly FV3 output as a dependency. All changes are _additions_ to existing code and include the addition of a few keyword variables in the config.sh script. These new flags include...

- SUB_HOURLY_POST: a logical flag indicating whether or not sub-hourly post-processing is to be used
- DT_SUBHOURLY_POST_MNTS: the increment in minutes to sub-divide the hour

Additional post-processing tasks were added to FV3LAM_wflow.xml to account for the different FV3 output file names depending on whether sub-hourly FV3 output is used (the first FV3 output file has a different naming structure than the remaining output files). setup.sh was updated to check whether valid entries were used for these two variables and also check that DT_ATMOS divides evenly into DT_SUBHOURLY_POST_MNTS so that the FV3 output is consistent with the requested frequency of UPP output. config_defaults.sh and valid_param_vals.sh were also updated accordingly.

## TESTS CONDUCTED:

Have run generate_FV3LAM_wflow.sh on a large variety of settings of SUB_HOURLY_POST and DT_SUBHOURLY_POST_MNTS. Note that setup.sh is configured such that DT_SUBHOURLY_POST_MNTS = 0 will cause SUB_HOURLY_POST to be ignored. I have successfully tested cases in which DT_ATMOS _does not_ divide evenly into DT_SUBHOURLY_POST_MNTS and when DT_SUBHOURLY_POST_MNTS is specified as anything other than a two-digit value (strings vs. open integers both work). The resulting workflows run successfully with rocotorun and output no error messages.

## ISSUE:

Resolves issue #434

## CONTRIBUTORS:

@gsketefian. Contributions:

1) Fixed bug in setup.sh in the test that checks whether DT_SUBHOURLY_POST_MNTS is set to 0: should use the -eq operator instead of ==.
2) Fixed bug in the jinja XML template for rocoto (FV3LAM_wflow.xml) as follows: rearranged the post-processing tasks so that the post task is run for only the first minute of the last hour (e.g. if the forecast is 3 hours long, post is run for 3:00 but not for 3:15, 3:30, etc).
3) Ran the following 3 WE2E tests [note that tests (b) and (c) are not yet in the regional_workflow repo and will be included in a future PR]:
   a) **grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta**. This is without subhourly post-processing, i.e. SUB_HOURLY_POST is set to "FALSE".
   b) **subhourly_post**. This is with subhourly post-processing, i.e. SUB_HOURLY_POST is set to "TRUE" (with DT_SUBHOURLY_POST_MNTS set to "12" minutes).
   c) **subhourly_post_ensemble_2mems**. This is with subhourly post-processing and with ensemble forecasts enabled, i.e. SUB_HOURLY_POST and DO_ENSEMBLES are both set to "TRUE" (with DT_SUBHOURLY_POST_MNTS set to "12" minutes and NUM_ENS_MEMBERS set to "2" members). This test is run because the changes in the jinja XML template FV3LAM_wflow.xml needed to add subhourly post involve code that executes ensemble forecasts.

**All three tests were successful.**

Note that this PR changes the names of the grib2 files that UPP generates such that they now always include the minutes -- regardless of whether SUB_HOURLY_POST is set to "TRUE" or "FALSE" (if set to "FALSE", the minutes are always "00"). For example, previously, the grib2 file for forecast hour 1 was named `rrfs.t00z.bgdawpf001.tm00.grib2`; henceforth, it will be named `rrfs.t00z.bgdawpf00100.tm00.grib2`.
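
The interval check and the new grib2 naming described above can be illustrated with a short Python sketch. It is not the workflow's own code (setup.sh is a shell script). The function names are hypothetical, DT_ATMOS is assumed to be given in seconds, and the file-name pattern is inferred only from the single example name quoted above:

```python
def post_interval_is_consistent(dt_atmos_sec: int, dt_post_mnts: int) -> bool:
    """Check that the model time step fits evenly into the post interval.

    Assumes dt_atmos_sec is in seconds and dt_post_mnts is in minutes.
    Non-positive values are rejected in this sketch.
    """
    if dt_atmos_sec <= 0 or dt_post_mnts <= 0:
        return False
    return (dt_post_mnts * 60) % dt_atmos_sec == 0


def grib2_name(cycle_hour: int, fcst_hour: int, fcst_minute: int = 0) -> str:
    """Build a file name with a 3-digit hour followed by a 2-digit minute.

    Pattern inferred from the example 'rrfs.t00z.bgdawpf00100.tm00.grib2'.
    """
    return (
        f"rrfs.t{cycle_hour:02d}z."
        f"bgdawpf{fcst_hour:03d}{fcst_minute:02d}.tm00.grib2"
    )


# Forecast hour 1 with hourly output: the minutes field is always "00".
print(grib2_name(0, 1))
# Forecast hour 1 plus 12 minutes, as in the 12-minute sub-hourly tests.
print(grib2_name(0, 1, 12))
```

The first call reproduces the new name quoted above for forecast hour 1; the second shows how a sub-hourly output time would appear under the same assumed pattern.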

pointing MOM6 to feature/update-MOM6-20210224 branch

ea379ef

modify mom6_files.cmake to reflect the files being added and removed in framework directory

junwang-noaa requested review from DeniseWorthen, DusanJovic-NOAA and climbfuji February 26, 2021 13:05

DeniseWorthen added the No Baseline Change label Feb 26, 2021

DusanJovic-NOAA approved these changes Feb 26, 2021

DusanJovic-NOAA added Auto-RT-hera labels Feb 26, 2021

BrianCurtis-NOAA removed Auto-RT-hera labels Feb 26, 2021

Auto: Add RT Log file: tests/RegressionTests_cheyenne.intel.log skip-ci

2317b22

Auto: Add RT Log file: tests/RegressionTests_hera.intel.log skip-ci

a0063ca

DeniseWorthen approved these changes Mar 4, 2021

jiandewang added 2 commits March 4, 2021 17:53

add RegressionTests_orion.intel.log

603ebc7

switch MOM6 repo to dev/emc

a0f8969

climbfuji approved these changes Mar 4, 2021

.gitmodules (outdated)

remove extra empty space in .gitmodules

9126ed8

junwang-noaa merged commit 35d1897 into ufs-community:develop Mar 5, 2021

jiandewang deleted the feature/update-MOM6-20210224 branch February 28, 2023 04:03
