dtc/develop: final PR before merging to develop or master (support for hera.gnu, change rt.sh logic for parsing rt config files) #28


@climbfuji climbfuji commented Mar 16, 2020

This PR:

  • adds support for hera.gnu to the gmake and cmake build systems; note: as a temporary measure, the hera.gnu config uses the UFS public release NCEPLIBS, which are incompatible with the soon-to-be-phased-out dynamic CCPP build
  • uses compiler-specific regression test baseline directories on hera (same as on cheyenne)
  • modifies rt.sh so that COMPILE and APPBUILD lines in rt config files are executed on all machines (similar to what is already done for RUN lines)
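The third item can be sketched as follows. This is a minimal illustration, not code copied from rt.sh: the function name `line_applies` is hypothetical, and it assumes a pipe-delimited config line with the machine list in the fourth field.

```shell
# Hypothetical helper (not taken from rt.sh): decide whether a config line
# applies to the current machine.
# Assumption: fields are separated by '|' and field 4 holds the machine list.
line_applies() {
  line=$1
  machine_id=$2
  # Extract field 4 and strip all blanks.
  machines=$(printf '%s\n' "$line" | cut -d'|' -f4 | tr -d ' ')
  # An empty machine field means the line is executed on all machines.
  if [ -z "$machines" ]; then
    return 0
  fi
  # Otherwise the line applies only if the machine ID appears in the list.
  case "$machines" in
    *"$machine_id"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Under these assumptions, `line_applies "$line" hera.gnu` succeeds for any COMPILE, APPBUILD, or RUN line whose machine field is blank, and fails for a line restricted to a different machine.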

Associated PRs:

NOAA-EMC/NEMS#43
NCAR/ccpp-physics#409
NCAR/fv3atm#31
#28

For regression testing info, see below.

…odify rt.sh so that COMPILE and APPBUILD lines without a machine name are executed on all machines (same as RUN lines), change rt_gnu.conf accordingly

climbfuji commented Mar 16, 2020

First, the ccpp-physics changes from @shansun6 and the Thompson kdt-removal were merged, as well as the fv3atm changes from @shansun6 and the bugfix to the Thompson radar reflectivity reset logic. Then, regression tests were run against the existing baseline on hera.intel. All tests passed except the following:

fv3_gfdlmprad_noahmp
fv3_gfdlmprad_noahmp
fv3_ccpp_gfdlmprad_noahmp
fv3_ccpp_gfdlmprad_noahmp

Example for IPD PROD (the same pattern holds for IPD REPRO, CCPP REPRO, and CCPP PROD; notably, the new IPD REPRO results match the CCPP REPRO results):

bash-4.2$ cat log_hera.intel/rt_008_fv3_gfdlmprad_noahmp.log

baseline dir = /scratch1/BMC/gmtb/ufs-weather-model/RT/NEMSfv3gfs/develop-20200317/INTEL/fv3_gfdlmprad_noahmp
working dir  = /scratch1/BMC/gmtb/Dom.Heinzeller/FV3_RT/rt_271115/fv3_gfdlmprad_noahmp
Checking test 008 fv3_gfdlmprad_noahmp results ....
 Comparing atmos_4xdaily.tile1.nc .........OK
 Comparing atmos_4xdaily.tile2.nc .........OK
 Comparing atmos_4xdaily.tile3.nc .........OK
 Comparing atmos_4xdaily.tile4.nc .........OK
 Comparing atmos_4xdaily.tile5.nc .........OK
 Comparing atmos_4xdaily.tile6.nc .........OK
 Comparing phyf000.nemsio .........OK
 Comparing phyf024.nemsio .........OK
 Comparing dynf000.nemsio .........OK
 Comparing dynf024.nemsio .........OK
 Comparing RESTART/coupler.res .........OK
 Comparing RESTART/fv_core.res.nc .........OK
 Comparing RESTART/fv_core.res.tile1.nc .........OK
 Comparing RESTART/fv_core.res.tile2.nc .........OK
 Comparing RESTART/fv_core.res.tile3.nc .........OK
 Comparing RESTART/fv_core.res.tile4.nc .........OK
 Comparing RESTART/fv_core.res.tile5.nc .........OK
 Comparing RESTART/fv_core.res.tile6.nc .........OK
 Comparing RESTART/fv_srf_wnd.res.tile1.nc .........OK
 Comparing RESTART/fv_srf_wnd.res.tile2.nc .........OK
 Comparing RESTART/fv_srf_wnd.res.tile3.nc .........OK
 Comparing RESTART/fv_srf_wnd.res.tile4.nc .........OK
 Comparing RESTART/fv_srf_wnd.res.tile5.nc .........OK
 Comparing RESTART/fv_srf_wnd.res.tile6.nc .........OK
 Comparing RESTART/fv_tracer.res.tile1.nc .........OK
 Comparing RESTART/fv_tracer.res.tile2.nc .........OK
 Comparing RESTART/fv_tracer.res.tile3.nc .........OK
 Comparing RESTART/fv_tracer.res.tile4.nc .........OK
 Comparing RESTART/fv_tracer.res.tile5.nc .........OK
 Comparing RESTART/fv_tracer.res.tile6.nc .........OK
 Comparing RESTART/sfc_data.tile1.nc ............NOT OK
 Comparing RESTART/sfc_data.tile2.nc ............NOT OK
 Comparing RESTART/sfc_data.tile3.nc ............NOT OK
 Comparing RESTART/sfc_data.tile4.nc ............NOT OK
 Comparing RESTART/sfc_data.tile5.nc ............NOT OK
 Comparing RESTART/sfc_data.tile6.nc ............NOT OK
 Comparing RESTART/phy_data.tile1.nc .........OK
 Comparing RESTART/phy_data.tile2.nc .........OK
 Comparing RESTART/phy_data.tile3.nc .........OK
 Comparing RESTART/phy_data.tile4.nc .........OK
 Comparing RESTART/phy_data.tile5.nc .........OK
 Comparing RESTART/phy_data.tile6.nc .........OK
Test 008 fv3_gfdlmprad_noahmp FAIL
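
To pull only the failing file comparisons out of a log like the one above, a small filter may help. This is a sketch: the function name `list_not_ok` is hypothetical, and it assumes each comparison line has the form ` Comparing <file> ....NOT OK`, as in this log.

```shell
# Hypothetical helper: print the files flagged "NOT OK" in a regression test log.
# Assumption: comparison lines look like " Comparing <file> ......NOT OK".
list_not_ok() {
  # -n suppresses default output; the trailing p prints only matching lines,
  # reduced to the file name captured between "Comparing " and the dots.
  sed -n 's/^ *Comparing \([^ ]*\) \.*NOT OK *$/\1/p' "$1"
}
```

For example, `list_not_ok log_hera.intel/rt_008_fv3_gfdlmprad_noahmp.log` would list only the `RESTART/sfc_data.tile*.nc` files for the log shown here.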

[Dom.Heinzeller@hfe11 FV3_RT]$ ncdump /scratch1/BMC/gmtb/ufs-weather-model/RT/NEMSfv3gfs/develop-20200317/INTEL/fv3_gfdlmprad_noahmp/RESTART/sfc_data.tile1.nc > ref_sfc_data.tile1.txt
[Dom.Heinzeller@hfe11 FV3_RT]$ ncdump /scratch1/BMC/gmtb/Dom.Heinzeller/FV3_RT/rt_271115/fv3_gfdlmprad_noahmp/RESTART/sfc_data.tile1.nc > tst_sfc_data.tile1.txt
[Dom.Heinzeller@hfe11 FV3_RT]$ diff -W150 -y ref_sfc_data.tile1.txt tst_sfc_data.tile1.txt |less

...
        double snowxy(Time, yaxis_1, xaxis_1) ;                                         double snowxy(Time, yaxis_1, xaxis_1) ;
                snowxy:long_name = "snowxy" ;                                                   snowxy:long_name = "snowxy" ;
                snowxy:units = "none" ;                                                         snowxy:units = "none" ;
                snowxy:checksum = "5284299D9242C102" ;                    |                     snowxy:checksum = "95C3D0E421161808" ;
...

All NoahMP-specific quantities *xy have this checksum difference, but the actual values printed in the ncdump output further down are identical.

This is the case for tiles 1, 2, 4, and 5, but not for tiles 3 and 6 (the poles). For those two tiles, the NoahMP-specific surface variables actually differ, but all other variables (2D surface fields and 3D atmospheric variables) are identical after 24 h of integration, including their checksums.
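
One way to separate checksum-only differences from real value differences is to drop the checksum attribute lines before diffing the two ncdump text files. This is a sketch rather than the procedure used for this PR: the function name `diff_ignoring_checksums` is hypothetical, and it assumes checksum attributes appear on lines containing `:checksum = `, as in the excerpt above.

```shell
# Hypothetical helper: diff two ncdump text files while ignoring the
# per-variable ':checksum = "..."' attribute lines.
diff_ignoring_checksums() {
  ref_filtered=$(mktemp)
  tst_filtered=$(mktemp)
  # grep -v exits non-zero when nothing is left to print; tolerate that.
  grep -v ':checksum = ' "$1" > "$ref_filtered" || true
  grep -v ':checksum = ' "$2" > "$tst_filtered" || true
  # Capture diff's exit status: 0 means identical apart from checksums.
  diff "$ref_filtered" "$tst_filtered" && status=0 || status=$?
  rm -f "$ref_filtered" "$tst_filtered"
  return "$status"
}
```

With the files created above, `diff_ignoring_checksums ref_sfc_data.tile1.txt tst_sfc_data.tile1.txt` would print nothing and return 0 if only the checksums differ, and would print the differing lines otherwise.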

It was decided to accept these changes and proceed with the PR. Consequently, a new baseline was created before running the regression tests against it (see below).

@climbfuji climbfuji force-pushed the final_pr_before_merging_to_develop_or_master_20200313 branch from 734a7ae to befa3e8 Compare March 16, 2020 14:59
@climbfuji climbfuji marked this pull request as ready for review March 17, 2020 02:33
@climbfuji

Regression testing on hera.intel against the newly created baseline: all tests pass.

Regression test logs for creating the new baseline:

RegressionTests_hera.intel.log
Compile_hera.intel.log
rt_full_create.log

Regression test logs for verifying against it:

Compile_hera.intel.log
RegressionTests_hera.intel.log
rt_full_verify.log

@climbfuji

Regression test logs for hera.gnu: there is a problem with all GFDLMP tests, which hang indefinitely. The control (Zhao-Carr MP) runs seem to pass (both are in DEBUG mode, and the EMC version of the DEBUG mode tests doesn't check outputs). In any case, I created a new, empty baseline for these two tests. Most notably, all regression test compile commands in rt_gnu.conf execute without problems on hera and the runs kick off, which means that the skeleton is in place for setting up a Tier-1 platform with GNU.

Create:

Compile_hera.gnu.log
RegressionTests_hera.gnu.log
rt_gnu_create.log

Verify:

Compile_hera.gnu.log
RegressionTests_hera.gnu.log
rt_gnu_verify.log

@climbfuji

@shansun6 @SMoorthi-emc @junwang-noaa @DusanJovic-NOAA FYI: I have left the levr bugfix out of the last updates to NCAR dtc/develop before creating the PR to EMC develop / NCAR master. We need to address the levr issue in parallel so as not to delay the commit any further. See the associated PRs listed above and the description in this PR (see also the hera.gnu part).


@grantfirl grantfirl left a comment

I don't understand why it's OK that the GFDLMP tests hang indefinitely with GNU compilers. I'm guessing this will be addressed in a followup PR at some point?

@climbfuji

> I don't understand why it's OK that the GFDLMP tests hang indefinitely with GNU compilers. I'm guessing this will be addressed in a followup PR at some point?

Yes, absolutely. This PR only provides the capability to compile and launch the tests on hera. We need to have a Tier-1 platform with GNU compiler support in the near future to catch problems right away.

@climbfuji climbfuji merged commit b02442b into NCAR:dtc/develop Mar 17, 2020
SamuelTrahanNOAA pushed a commit to SamuelTrahanNOAA/ufs-weather-model that referenced this pull request Sep 24, 2020
Option to roll back Thompson MP to WRFv3.8.1 (RAPv5/HRRRv4), add stochastic perturbations code