[develop] Bugfix for Issue #867 (when QUILTING is set to false) #869

Merged (8 commits) on Jul 26, 2023

gsketefian
Collaborator

@gsketefian gsketefian commented Jul 21, 2023

DESCRIPTION OF CHANGES:

When QUILTING is set to false, the run_fcst task fails because some of the variables needed in the jinja template file parm/model_configure remain undefined in the call to ush/create_model_configure_file.py (see Issue #867). This PR fixes that bug and adds a WE2E test for the QUILTING: false case.

Note that when QUILTING is set to false, the UFS Weather Model generates output files named fv3_history2d.nc and fv3_history.nc that contain data on the native grid, not the write-component grid. Thus, the run_post and subsequent tasks that depend on Weather Model output on the write-component grid cannot be included in the workflow. For this reason, the new WE2E test that this PR adds includes tasks only up to run_fcst.
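
As a rough illustration of the failure mode described in the first paragraph above (not the actual SRW code), the sketch below uses Python's standard-library string.Template as a stand-in for the jinja template. The template text, the variable names, and the try_render helper are all hypothetical. It shows that leaving one variable unset makes rendering fail, while supplying a placeholder value lets it succeed.

```python
from string import Template

# Stand-in for a template such as parm/model_configure. The two variables
# below are hypothetical placeholders, not the real template's fields.
MODEL_CONFIGURE_TEMPLATE = Template(
    "quilting: $quilting\n"
    "write_groups: $write_groups\n"
)


def try_render(settings):
    """Return (rendered_text, None) on success or (None, missing_name) on failure."""
    try:
        # substitute() raises KeyError when a template variable has no value
        return MODEL_CONFIGURE_TEMPLATE.substitute(settings), None
    except KeyError as err:
        return None, err.args[0]


# Write-component variable left undefined: rendering fails.
text, missing = try_render({"quilting": ".false."})
print(missing)

# Same settings plus a placeholder default: rendering succeeds.
text, missing = try_render({"quilting": ".false.", "write_groups": 1})
print(text)
```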

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

Ran the new WE2E test grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_quilt_off as well as the set of fundamental tests on Hera with Intel. All passed.

DEPENDENCIES:

None.

DOCUMENTATION:

ISSUE:

Resolves Issue #867.

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

Collaborator

@mkavulich mkavulich left a comment

A few requests

@@ -0,0 +1,33 @@
metadata:
Collaborator

I think this test would fit better under the "wflow_features" directory, maybe a name like "config.quilting_off.yaml". If you'd like to keep it here I'd suggest a symlink in that directory, since this is a major feature that's being tested.

Collaborator Author

@mkavulich Moved and renamed test to quilting_false.

# outputting on the write-component grid) to default values. If this is not
# done, the run_fcst task will fail with a "variables are not provided" message.
#
settings.update(
Collaborator

This would be better as an "else" statement under the following "if QUILTING" block.

Collaborator Author

@mkavulich Moved into an else-statement.
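
A minimal sketch of the if/else arrangement discussed here, assuming a settings dictionary that feeds the template. The function build_settings, the WRTCMP_DEFAULTS dictionary, and the key names are illustrative assumptions, not the contents of ush/create_model_configure_file.py.

```python
from typing import Optional

# Hypothetical write-component fields with placeholder defaults.
WRTCMP_DEFAULTS = {"write_groups": 1, "write_tasks_per_group": 1}


def build_settings(quilting: bool, wrtcmp: Optional[dict] = None) -> dict:
    """Build template settings so the same keys are defined in both cases."""
    settings = {"quilting": quilting}
    if quilting:
        # Write-component values come from the experiment configuration.
        settings.update(wrtcmp or {})
    else:
        # No write component: fall back to defaults so that no template
        # variable is left undefined.
        settings.update(WRTCMP_DEFAULTS)
    return settings


off = build_settings(False)
on = build_settings(True, {"write_groups": 2, "write_tasks_per_group": 24})
print(sorted(off) == sorted(on))
```

Keeping the defaults in the else branch, rather than applying them before the if block, means the quilting case never touches them.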

@@ -3,6 +3,7 @@ get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_quilt_off
Collaborator

Test should also be added to the "comprehensive" list

Collaborator Author

@mkavulich I renamed it here (to quilting_false) and added it to comprehensive.

…uite_RAP_quilt_off" to wflow_features directory and rename it "quilting_false"; add this test to the list of comprehensive tests (and do the name change in coverage.hera.gnu.com).
…n else-statement instead of before the if-statement (to address Mike K's comments).
@gsketefian
Collaborator Author

@mkavulich Thanks for the review. I addressed your comments and reran the new test (quilting_false) and all the fundamental tests, and they all passed.

Collaborator

@mkavulich mkavulich left a comment

Thanks @gsketefian, looks good!

@gsketefian
Collaborator Author

@MichaelLueken Just merged latest develop into my branch. Ready to test.

Collaborator

@MichaelLueken MichaelLueken left a comment

@gsketefian These changes look good to me!

The quilting_false and Hera GNU WE2E coverage tests were run and all tests successfully passed.

Approving this PR now.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) on Jul 25, 2023
@MichaelLueken
Collaborator

Manual run of Cheyenne Intel WE2E coverage tests passed on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE      COMPLETE              13.14
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              31.42
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE              19.46
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR           COMPLETE              25.68
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE               8.79
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR                COMPLETE              16.60
pregen_grid_orog_sfc_climo                                         COMPLETE               8.07
specify_template_filenames                                         COMPLETE               8.23
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             131.39

Manual run of Cheyenne GNU WE2E coverage tests passed on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16      COMPLETE              10.65
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta      COMPLETE              29.69
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp  COMPLETE              50.88
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              14.83
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR             COMPLETE              20.36
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16   COMPLETE              11.56
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP                 COMPLETE             146.07
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16               COMPLETE              19.96
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS                              COMPLETE               6.49
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             310.49

Awaiting completion of automated Jenkins tests now.

@MichaelLueken
Collaborator

The Jenkins tests are failing on Hera (both Intel and GNU), as well as Jet.

For Hera Intel, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 test is failing in the run_fcst step, with the following error:

FATAL from PE 5: compute_qs: saturation vapor pressure table overflow, nbad= 1

The expt_dirs for this test can be found at /scratch1/NCEPDEV/stmp2/role.epic/jenkins/workspace/fs-srweather-app_pipeline_PR-869__2/expt_dirs/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2

and the nco_dirs can be found at /scratch1/NCEPDEV/stmp2/role.epic/jenkins/workspace/fs-srweather-app_pipeline_PR-869__2/nco_dirs/tmp/run_fcst_mem000.id_1690318816_2019070100

For Hera GNU, the get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS test failed to pull the necessary ICs and LBCs from NOMADS.

The expt_dirs for this test can be found at /scratch1/NCEPDEV/stmp2/role.epic/jenkins/workspace/fs-srweather-app_pipeline_PR-869/expt_dirs/get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS

For Jet, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 test failed in run_fcst with the following error:

FATAL from PE 1: compute_qs: saturation vapor pressure table overflow, nbad= 1

The expt_dirs for this test can be found at /mnt/lfs4/HFIP/hfv3gfs/role.epic/jenkins/workspace/fs-srweather-app_pipeline_PR-869/expt_dirs/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2

I'm currently in the process of manually running these tests on Jet and Hera and will report back with how they go.

@MichaelLueken
Collaborator

The manual test of this work on Hera GNU successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200         COMPLETE              18.71
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS                             COMPLETE              43.33
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR             COMPLETE             236.87
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp  COMPLETE              17.76
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta      COMPLETE              36.19
quilting_false                                                     COMPLETE              14.07
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0              COMPLETE              23.22
GST_release_public_v1                                              COMPLETE              53.82
MET_verification_only_vx                                           COMPLETE               0.16
MET_ensemble_verification_only_vx_time_lag                         COMPLETE               3.90
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16      COMPLETE             342.47
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             790.50

@MichaelLueken
Collaborator

The manual testing of this work on Hera Intel successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200          COMPLETE               6.59
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2  COMPLETE             772.62
get_from_HPSS_ics_HRRR_lbcs_RAP                                    COMPLETE              14.33
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp  COMPLETE              10.40
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2        COMPLETE               7.61
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16          COMPLETE              12.64
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP                 COMPLETE              10.24
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2        COMPLETE               7.33
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2         COMPLETE             250.11
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16           COMPLETE             307.78
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR            COMPLETE             333.22
pregen_grid_orog_sfc_climo                                         COMPLETE               9.43
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1742.30

@MichaelLueken
Collaborator

The manual testing of this work on Jet has successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community                                                          COMPLETE              19.72
custom_ESGgrid                                                     COMPLETE              18.84
custom_GFDLgrid                                                    COMPLETE              13.17
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018         COMPLETE              12.83
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h     COMPLETE              66.79
get_from_HPSS_ics_RAP_lbcs_RAP                                     COMPLETE              16.69
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR                 COMPLETE             186.07
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              46.49
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2        COMPLETE               9.55
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta       COMPLETE             563.46
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR       COMPLETE              12.83
process_obs                                                        COMPLETE               0.92
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             967.36

Moving forward with merging this work now.

@MichaelLueken MichaelLueken merged commit eb90788 into ufs-community:develop Jul 26, 2023
Labels

  • bug (Something isn't working)
  • run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests)
Development

Successfully merging this pull request may close these issues.

Setting QUILTING: false causes the run_fcst task to fail
4 participants