Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Deprecate the CYCL_HRS configuration setting. #411

Merged

Conversation

christinaholtNOAA
Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA commented Oct 13, 2022

DESCRIPTION OF CHANGES:

Remove the use of CYCL_HRS as an array that sets the cycles to run in a given day in favor of using only the cycle frequency (CYCL_FREQ), along with the first and last forecast cycle defined in YYYYMMDDHH format.

The CYCL_HRS variable was mainly only used in the configuration layer (along with a few wrapper scripts) to allow users to configure their full workflow experiment.

By making this change, quite a bit of complexity is removed.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

I ran all the tests on Hera with intel, finding the documented failures still hold. Also finding that Hera's job number limit inhibited me from completing a handful of ensemble workflows. Those workflows did, however, run several tasks successfully before quitting. I did not complete them since it was obvious they did what they were supposed to as it relates to the CYCL_HRS array.

DEPENDENCIES:

I made relevant changes in the docs.

ISSUE:

None

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

Changes were made to all config files and scripts to use FIRST and LAST
cycle definitions to accept the cycle HH, and frequency will start from
those for all relevant computation.
Copy link
Collaborator Author

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments below.

I tested all WE2E configurations successfully, so you can likely assume the config file changes (the bulk of the files changed in this PR) were okay if you don't want to slog through them all in the review.

Cheers!

@@ -1,30 +1,30 @@
Test Name,PREDEF_GRID_NAME,CCPP_PHYS_SUITE,EXTRN_MDL_NAME_ICS,EXTRN_MDL_NAME_LBCS,DATES (UTC),CYCL_HRS (UTC),FCST_LEN_HRS (hrs),DT_ATMOS (s)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know if I should have formatted this one differently with this change. It wasn't clear to me whether this file was autogenerated, or maintained manually.

{%- set cdate_first=date_first_cycl ~ c ~ "00" -%}
{%- set cdate_last=date_last_cycl ~ c ~ "00" -%}
{{- cdate_first ~ " " ~ cdate_last ~ " " ~ cycl_freq -}}
{{- date_first_cycl ~ " " ~ date_last_cycl ~ " " ~ cycl_freq -}}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra "minute" zeros here are added in the generate workflow script.

@@ -601,7 +601,7 @@ if [ "${USE_FVCOM}" = "TRUE" ]; then
#Format for fvcom_time: YYYY-MM-DDTHH:00:00.000000
fvcom_exec_fn="fvcom_to_FV3"
fvcom_exec_fp="$EXECdir/${fvcom_exec_fn}"
fvcom_time="${DATE_FIRST_CYCL:0:4}-${DATE_FIRST_CYCL:4:2}-${DATE_FIRST_CYCL:6:2}T${CYCL_HRS[0]}:00:00.000000"
fvcom_time="${DATE_FIRST_CYCL:0:4}-${DATE_FIRST_CYCL:4:2}-${DATE_FIRST_CYCL:6:2}T${DATE_FIRST_CYCL:8:2}:00:00.000000"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me why this file name would be generated from the first cycle time of a workflow. Shouldn't it be of a given CDATE since the first cycle of an experiment is arbitrary, especially for real-time workflows?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmwright526, since this is FVCOM-related, can you comment?

auto_file=${scrfunc_dir}/machine_suites/${test_type}.${machine}
if [ ! -f ${auto_file} ]; then
auto_file=${scrfunc_dir}/machine_suites/${test_type}
fi
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this feature so that I never have to look up this find statement again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christinaholtNOAA It maybe better to modify run_WE2E_tests.sh instead, which already collects all test files in avail_WE2E_test_names for validation purposes. We can remove the requirement that a tests_file be passed, and if it is not assume all tests. That way those who want to use run_WE2E_tests.sh directly can also benefit from the new all tests feature.

CYCL_HRS:
- 0
DATE_FIRST_CYCL: '2019070100'
DATE_LAST_CYCL: '2019070100'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the config files are changed in this way. Removing the CYCL_HRS setting and creating the associated start and stop times based on what is here. Occasionally, the cycle frequency (INCR_CYCL_FREQ) is updated to reflect the actual cycling frequency for those cases where CYCL_HRS = [0, 12], for example.

"cdate_first_cycl": cdate_first_cycl,
"cycl_hrs": cycl_hrs_str,
"date_first_cycl": date_to_str(DATE_FIRST_CYCL, format="%Y%m%d%H00"),
"date_last_cycl": date_to_str(DATE_LAST_CYCL, format="%Y%m%d%H00"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the trailing zeros are added (mentioned above).

Copy link
Collaborator

@danielabdi-noaa danielabdi-noaa Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I like about this new approach is that in the past, I have had difficulty maintaining leading 0s in CYCLE_HRS, as you can see in the yaml config files, and this has caused subtle bugs. Now that it is part of the date, there is no doubt it will have leading zeros for single digit cycles.

ush/generate_FV3LAM_wflow.py Show resolved Hide resolved
ush/setup.py Show resolved Hide resolved
ush/setup.py Show resolved Hide resolved
)
self.assertEqual(
cdates, ["2022010106", "2022010112", "2022010306", "2022010312"]
cdates, ["2022010106", "2022010112", "2022010118",
"2022010200", "2022010206", "2022010212"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this test to reflect the new behavior. I do not feel that this test necessarily reflects a standard use case for how folks choose to cycle their experiments. The original tested that cycles would be set for 6 and 12 UTC 2 days apart. It is much more standard for cycling capabilities to be set at regular intervals.

I am currently working on introducing changes to XML creation that should allow us to set up any number of cycledefs using any of the native Rocoto cycledef options in a more straightforward way.

I hope that the change in behavior to this script will be tolerated in the meantime.

Copy link
Collaborator

@JeffBeck-NOAA JeffBeck-NOAA Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current workflow allows a user to choose any combination of forecast cycles between 00 and 23Z to be repeated between DATE_FIRST_CYCL and DATE_LAST_CYCL to provide as much flexibility to the community as possible. While it may not be a use case in operational meteorology, it's very possible that other users in the community could find it useful for their purposes, particularly for those in academia or researchers working on model or physics improvements. For example, a developer may wish to only run during nocturnal cycle hours with short-term forecasts to test physics innovations intended to improve results at night, and have no need to run cycles during the day. Running only during nocturnal hours over many days produces statistically significant results while limiting cycles (i.e., disk space and compute time) that they don't need. Will the changes to XML creation you mention allow a user to choose any given set of forecast cycles such as how the functionality works now? Would those changes be accompanied by a user-defined variable such as "CYCL_HRS" where a set of cycles can be defined? If that's the case and the "tolerated change in behavior" is just this limitation for the time being, I think that's fine, but just wanted to make sure. @gsketefian, @mkavulich, @BenjaminBlake-NOAA, @mark-a-potts, @danielabdi-noaa, @MichaelLueken, thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new XML generation would not be setting just one start, stop, end that are defined by variables like we currently have. Instead, a user would set any number of cycledefs supported by Rocoto in their yaml config. It might look something like this. Here, I'm recreating the same cycling Daniel originally had in the test case.

  cycledefs:
    at_start:
      dates: !startstopfreq ['DATE_FIRST_CYCL', 'DATE_FIRST_CYCL', 'CYCL_FREQ']
    forecast6z:
      dates: !startstopfreq [2022010106, 2022010406, 48:00:00]
    forecast12z:
      dates: !startstopfreq [2022010112, 2022010412, 48:00:00]
    

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christinaholtNOAA do you think your future XML creation changes will be ready before the code slush deadline? If so I think @JeffBeck-NOAA's point is moot since the capability will be re-introduced (just in a different way)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christinaholtNOAA, @mkavulich, OK! So if a user wanted to run, say, 00, 01, 02, 03, and 04Z every day for two weeks, that configuration could be created with five different "dates:" entries for cycledefs? If that functionality comes in time for v2.1, that would be great.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich I will push had to get the change in, but it will be a big PR and I will need folks to review it in a timely manner.

Copy link
Collaborator

@danielabdi-noaa danielabdi-noaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christinaholtNOAA Great work! I've left some minor comments.

auto_file=${scrfunc_dir}/machine_suites/${test_type}.${machine}
if [ ! -f ${auto_file} ]; then
auto_file=${scrfunc_dir}/machine_suites/${test_type}
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christinaholtNOAA It maybe better to modify run_WE2E_tests.sh instead, which already collects all test files in avail_WE2E_test_names for validation purposes. We can remove the requirement that a tests_file be passed, and if it is not assume all tests. That way those who want to use run_WE2E_tests.sh directly can also benefit from the new all tests feature.

ush/setup.py Show resolved Hide resolved
ush/generate_FV3LAM_wflow.py Show resolved Hide resolved
ush/setup.py Show resolved Hide resolved
"cdate_first_cycl": cdate_first_cycl,
"cycl_hrs": cycl_hrs_str,
"date_first_cycl": date_to_str(DATE_FIRST_CYCL, format="%Y%m%d%H00"),
"date_last_cycl": date_to_str(DATE_LAST_CYCL, format="%Y%m%d%H00"),
Copy link
Collaborator

@danielabdi-noaa danielabdi-noaa Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I like about this new approach is that in the past, I have had difficulty maintaining leading 0s in CYCLE_HRS, as you can see in the yaml config files, and this has caused subtle bugs. Now that it is part of the date, there is no doubt it will have leading zeros for single digit cycles.

@christinaholtNOAA christinaholtNOAA changed the title Deprecate the CYCL_HRS configuration setting. [develop] Deprecate the CYCL_HRS configuration setting. Oct 13, 2022
Copy link
Collaborator

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments and suggested changes to documentation, mostly looks good.

docs/UsersGuide/source/BuildRunSRW.rst Show resolved Hide resolved
docs/UsersGuide/source/Introduction.rst Outdated Show resolved Hide resolved
ush/config_defaults.yaml Outdated Show resolved Hide resolved
Comment on lines 1052 to +1059
# 2) The double quotes (which need to be escaped here, i.e. \") are needed
# so that for any experiment variables that are arrays, all the elements
# of the array are combined together and treated as a single element.
# If the experiment variable is CYCL_HRS (cycle hours) and is set to
# the array ("00" "12"), we want the value saved in the local array
# here to be a single element consisting of "00 12". Otherwise, "00"
# and "12" will be treated as separate elements, and more than one
# element would be added to the array (which would be incorrect here).
# so that for any experiment variables that are arrays, all the elements of
# the array are combined together and treated as a single element. For
# example, if a variable CYCL_HRS is set to the array ("00" "12"), we want
# the value saved in the local array here to be a single element consisting
# of "00 12". Otherwise, "00" and "12" will be treated as separate
# elements, and more than one element would be added to the array (which
# would be incorrect here).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the removal of CYCL_HRS, does that mean none of these variables are arrays anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there could still be arrays elsewhere. I am not sure though. I didn't remove the ability to set an array in the config file, though.

ush/generate_FV3LAM_wflow.py Show resolved Hide resolved
)
self.assertEqual(
cdates, ["2022010106", "2022010112", "2022010306", "2022010312"]
cdates, ["2022010106", "2022010112", "2022010118",
"2022010200", "2022010206", "2022010212"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christinaholtNOAA do you think your future XML creation changes will be ready before the code slush deadline? If so I think @JeffBeck-NOAA's point is moot since the capability will be re-introduced (just in a different way)

Copy link
Collaborator Author

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the run_WE2E_tests.sh as @danielabdi-noaa suggested, and made mods from @mkavulich.

I tested run_WE2E_tests.sh manually by passing it an argument that was expected to work for each of the new test-defining flags it now accepts.

)
self.assertEqual(
cdates, ["2022010106", "2022010112", "2022010306", "2022010312"]
cdates, ["2022010106", "2022010112", "2022010118",
"2022010200", "2022010206", "2022010212"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich I will push had to get the change in, but it will be a big PR and I will need folks to review it in a timely manner.

@christinaholtNOAA christinaholtNOAA added ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Oct 13, 2022
@venitahagerty venitahagerty removed ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel labels Oct 13, 2022
@gsketefian
Copy link
Collaborator

@christinaholtNOAA What are the changes you are proposing for the XML? I have a lot of changes there too for vx and am wondering how much things will conflict. Is it still a jinja template?

@venitahagerty
Copy link
Collaborator

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1085374136/20221013192015/ufs-srweather-app

  • fail 'Build hera intel FAILED'
  • echo -e '\nBuild hera intel FAILED\n'
    Build failed
    If test failed, please make changes and add the following label back:
    ci-hera-intel-WE

@MichaelLueken
Copy link
Collaborator

@christinaholtNOAA The HPC-stack was updated on Hera about two hours ago. This update is causing the failure on Hera to build the SRW currently. Work is being done to correct this issue.

@venitahagerty
Copy link
Collaborator

venitahagerty commented Oct 13, 2022

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1085374136/20221013192017/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 9 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
Experiment Succeeded on jet: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta

script:
tests_file
test_name
test_type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very flexible indeed! Thanks for addressing my suggestion.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Oct 14, 2022
@MichaelLueken
Copy link
Collaborator

@christinaholtNOAA PR #417 has been merged. The SRW once again builds and runs on Hera. If you would like to update this PR and PR #412 with the latest develop, the Hera CI tests should work once again. Having said that, since you have successfully built and run this PR on Hera before the update to the stack (all tests, not just fundamental), I'm fine with merging this work once the current Jenkins CI tests complete.

@christinaholtNOAA
Copy link
Collaborator Author

@MichaelLueken I'm ready for this one whenever you are. You want to push the button when the time comes?

@MichaelLueken
Copy link
Collaborator

@christinaholtNOAA Sure! The Jenkins CI pipeline is wrapping up the Cheyenne tests and I have manually submitted the Hera tests, which have passed. Once the Jenkins CI tests are complete, I will merge.

@MichaelLueken
Copy link
Collaborator

The tests are now complete, so I will merge these changes now.

@MichaelLueken MichaelLueken merged commit 325baf1 into ufs-community:develop Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants