minor updates for FV3 "the dycore" and run templates, parm/field_table/* updated for RTS #870

Merged: 24 commits, Oct 19, 2021

Contributor

@bensonr bensonr commented Oct 15, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

Description

This PR brings in RTS updates from @junwang-noaa to fully use the FIELD_TABLE macro. In addition, the field_tables in tests/parm/field_table/ have been modified to provide correct initialization values. Some of the run templates in tests/fv3_conf have been updated to reflect the FIELD_TABLE changes. Finally, minor bugfixes and updates for FV3 "the dycore" are included to coincide with the field_table updates.

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • CI: skipped due to DATM submodule-related changes in CI

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first?
If so add the "waiting for other repos" label and list the upstream PRs

@junwang-noaa
Collaborator

@bensonr would you please sync your ufs-weather-model branch with the latest develop branch and run RT on gaea? After that we will run RT on other platforms. Thanks

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: RT
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/759311817/20211018183012/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_27255
Test control_atm_aerosols 092 failed in check_result failed
Test control_atm_aerosols 092 failed in run_test failed
Test control_thompson_extdiag_debug 066 failed failed
Test control_thompson_extdiag_debug 066 failed in run_test failed
Test control_c384gdas_wav 091 failed failed
Test control_c384gdas_wav 091 failed in run_test failed
Please make changes and add the following label back:
hera-intel-RT

@climbfuji
Collaborator

/glade/scratch/dtcufsrt/autort/tests/auto/pr/759311817/20211018124512/ufs-weather-model

Not sure if this test timed out or not. It looks like the model ran to completion, but the post task was still doing something when the run aborted. Will recover the log file, rerun the failed tests and upload if successful.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/759311817/20211018183014/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_263947
Test cpld_control_c192_p7 005 failed failed
Test cpld_control_c192_p7 005 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

@climbfuji
Collaborator

> /glade/scratch/dtcufsrt/autort/tests/auto/pr/759311817/20211018124512/ufs-weather-model
>
> Not sure if this test timed out or not. It looks like the model ran to completion, but the post task was still doing something when the run aborted. Will recover the log file, rerun the failed tests and upload if successful.

I pushed the full regression test log for Cheyenne/GNU.

@junwang-noaa
Collaborator

junwang-noaa commented Oct 18, 2021 via email

@DeniseWorthen
Collaborator

The jet failure was a time-out. I will rerun.

Collaborator

@climbfuji climbfuji left a comment

Nice cleanup work. I note the following:

  1. We should do the same with diag_table: use version-controlled files from the ufs-weather-model repository for the regression tests.
  2. When a new input data set is created in future pull requests, remove all field_table files (and diag_table files, if (1) has been completed by then) from the new input data set.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: RT
Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/759311817/20211018183019/ufs-weather-model
Please manually delete: /lustre/f2/scratch/emc.nemspara/FV3_RT/rt_27605
Test cpld_decomp_p7 004 failed failed
Test cpld_decomp_p7 004 failed in run_test failed
Please make changes and add the following label back:
gaea-intel-RT

@climbfuji
Collaborator

> Automated RT Failure Notification
> Machine: gaea
> Compiler: intel
> Job: RT
> Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/759311817/20211018183019/ufs-weather-model
> Please manually delete: /lustre/f2/scratch/emc.nemspara/FV3_RT/rt_27605
> Test cpld_decomp_p7 004 failed failed
> Test cpld_decomp_p7 004 failed in run_test failed
> Please make changes and add the following label back:
> gaea-intel-RT

What happened here?

@DeniseWorthen
Collaborator

The Gaea failure was a time-out for that one test. I was going to re-run.

@bensonr
Contributor Author

bensonr commented Oct 19, 2021

gaea is "sick" and has been experiencing performance issues. ORNL is planning to apply some fixes this week with the hope of addressing the Lustre issues that have been randomly plaguing the system for over two months.

@climbfuji
Collaborator

climbfuji commented Oct 19, 2021

> gaea is "sick" and has been experiencing performance issues. ORNL is planning to apply some fixes this week with the hope of addressing the Lustre issues that have been randomly plaguing the system for over two months.

Time for gaea to get vaccinated ...

@climbfuji
Collaborator

The fv3atm submodule pointer is correct; will merge.

Labels: No Baseline Change
7 participants