bug fixes: kchunk3d ignored, hailwat uninitialized in dycore, tile_num wrong for nests #2201

Merged
merged 18 commits into ufs-community:develop on Apr 15, 2024

Conversation

@SamuelTrahanNOAA SamuelTrahanNOAA commented Mar 21, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

Fixes these bugs:

  1. Fix from @DusanJovic-NOAA for a bug in which the kchunk3d setting in model_configure was ignored. This caused an abort due to a negative index in an MPI call on some platforms. This may have been due to a 32-bit integer wraparound, but we cannot confirm that.
  2. A hailwat variable was uninitialized in the FV3 dynamical core. Now it is set to the hailwat tracer index.
  3. The tile_num sent to CCPP in FV3 was wrong for the nest because it was the index of the tile in the mosaic (index 1) instead of the "global tile number" (index 7). This is corrected by having the dynamical core pass the "global tile number" up to the model.
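
To illustrate the suspected (and, as noted above, unconfirmed) 32-bit wraparound in item 1, here is a minimal Python sketch. It assumes that kchunk3d sets the vertical chunk size used by the write component and that an element count is computed as the product of the chunk dimensions; the dimension values and the helper name `as_int32` are hypothetical and are not taken from model_configure or from the FV3 source.

```python
import ctypes

def as_int32(value: int) -> int:
    """Return `value` as a signed 32-bit integer would hold it (two's-complement wraparound)."""
    return ctypes.c_int32(value).value

# Hypothetical horizontal chunk dimensions (assumption, for illustration only).
nx, ny = 3072, 3072

# If the vertical chunk setting is ignored and a large default is used instead,
# the element count can exceed 2**31 - 1 and turn negative in 32-bit storage.
count_ignored = as_int32(nx * ny * 256)

# With a small vertical chunk honored, the count stays within range.
count_honored = as_int32(nx * ny * 1)

print(count_ignored, count_honored)
```

A negative count of this kind is the sort of value an MPI call would reject, which is consistent with the abort described above, but it is only one possible explanation.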

No answers should change.

Commit Message:

* UFSWM - None.
  * FV3 - Write component will use kchunk3d. Model init sends the right tile number to CCPP.
    * atmos_cubed_sphere - Initialize the hailwat variable. Pass global_tile index to model.
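
The difference between the tile index within a mosaic and the "global tile number" can be sketched in Python as follows. This is only an illustration under stated assumptions: tiles are assumed to be numbered consecutively across grids, with six tiles in the global cubed-sphere grid and one tile in the nest. The function name `global_tile_number` and its arguments are hypothetical and do not correspond to routines in FV3 or atmos_cubed_sphere.

```python
from typing import Sequence

def global_tile_number(grid_index: int, tile_in_mosaic: int,
                       tiles_per_grid: Sequence[int]) -> int:
    """Map a 1-based tile index within one grid's mosaic to a 1-based global tile number.

    grid_index is 0-based: 0 is the parent (global) grid, 1 is the first nest.
    Assumes tiles are numbered consecutively across grids (an assumption of this sketch).
    """
    tiles_before = sum(tiles_per_grid[:grid_index])
    return tiles_before + tile_in_mosaic

# Hypothetical layout: a six-tile global cubed sphere plus a single-tile nest.
tiles_per_grid = [6, 1]

# The nest is tile 1 within its own mosaic, but global tile 7.
print(global_tile_number(1, 1, tiles_per_grid))
```

Under these assumptions, the nest's only tile has mosaic index 1 and global tile number 7, which matches the values quoted in the description.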

Priority:

  • Critical

Git Tracking

UFSWM:

Issues:

Note: Although #2227 is an issue in this repository, the bug is in FV3.

Sub component Pull Requests:

UFSWM Blocking Dependencies:


Changes

Regression Test Changes (Please commit test_changes.list):

  • No Baseline Changes.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@SamuelTrahanNOAA SamuelTrahanNOAA changed the title from "nesting bug fixes for uninitialized variable in fv3 and incorrect tile number in fv3atm" to "bug fixes: kchunk3d ignored, hailwat uninitialized in dycore, tile_num wrong for nests" on Apr 11, 2024
@BrianCurtis-NOAA
Collaborator

@SamuelTrahanNOAA EPIC wants to go with this PR next. Could you run the full suite on Hera and commit the test_changes.list please?

@SamuelTrahanNOAA
Collaborator Author

I am rerunning regression tests now. 259 of 299 tests have completed and none have failed. I disabled job resubmission, so this means the tests are passing on the first try.

@SamuelTrahanNOAA
Collaborator Author

Could someone please request reviews from these individuals?

@DusanJovic-NOAA @zhanglikate @kayeekayee @spanNOAA @ChristianBoyer-NOAA

They have been involved in testing the fix for the critical kchunk3d bug.

@BrianCurtis-NOAA
Collaborator

> Could someone please request reviews from these individuals?
>
> @DusanJovic-NOAA @zhanglikate @kayeekayee @spanNOAA @ChristianBoyer-NOAA
>
> They have been involved in testing the fix for the critical kchunk3d bug

Only Dusan seems to be allowed as a requested reviewer, but I believe the others can still give a review.

@SamuelTrahanNOAA
Collaborator Author

SamuelTrahanNOAA commented Apr 11, 2024

Regression tests passed. No baseline changes.

EDIT: Regression tests passed on Hera. I didn't run them anywhere else.

@BrianCurtis-NOAA BrianCurtis-NOAA added the No Baseline Change, Ready for Commit Queue, and Priority: High labels Apr 11, 2024

@zhanglikate zhanglikate left a comment

I finished my testing, running for more than 84 hrs with the April 2 version of the global workflow, and it is working well.

@SamuelTrahanNOAA
Collaborator Author

I've merged develop. Those changes were all CICE, so they should not affect this PR's changes nor the bug people are encountering. Hence, I am not rerunning regression tests unless someone asks me to do that. Code managers will run regression tests in the ordinary testing process.

@BrianCurtis-NOAA
Collaborator

> I've merged develop. Those changes were all CICE, so they should not affect this PR's changes nor the bug people are encountering. Hence, I am not rerunning regression tests unless someone asks me to do that. Code managers will run regression tests in the ordinary testing process.

So everyone knows: the full RT suite does not need to be rerun unless code changes related to the bug or feature being fixed or added are made. Merging develop is not in that category, so the suite does not need to be rerun.

@SamuelTrahanNOAA
Collaborator Author

I've retested my nested global case inside the global-workflow and it has passed the failure point. (I had already tested it outside the workflow.) This triad of fixes still works for me. I look forward to seeing them in the develop branch.

@jkbk2004
Collaborator

We are going to start working on this PR today. @FernandoAndrade-NOAA @BrianCurtis-NOAA FYI

@SamuelTrahanNOAA
Collaborator Author

Jet hasn't finished. Did something go wrong over there?

I've had lots of little technical issues while running on Jet since the Rocky upgrade.

@FernandoAndrade-NOAA
Collaborator

> Jet hasn't finished. Did something go wrong over there?
>
> I've had lots of little technical issues while running on Jet since the Rocky upgrade.

It was just a little slow yesterday. It looks like it passed; I'll push it up shortly.

@zach1221
Collaborator

zach1221 commented Apr 12, 2024

We can proceed with the merging process. I'll follow up on the cubed-sphere PR.

@SamuelTrahanNOAA
Collaborator Author

The cubed-sphere PR has been merged. I updated the FV3 PR to point to the authoritative .gitmodules and cubed sphere.

You can proceed to merging the FV3 PR.

jkbk2004 previously approved these changes Apr 13, 2024
@BrianCurtis-NOAA
Collaborator

@SamuelTrahanNOAA FV3 merged. Hash: NOAA-EMC/fv3atm@37e7d48

@SamuelTrahanNOAA
Collaborator Author

I have reverted .gitmodules and pointed FV3 to the head of the authoritative develop branch.

This PR is ready for final review and merge.

@jkbk2004 jkbk2004 requested a review from zach1221 April 15, 2024 12:32
@zach1221 zach1221 merged commit 281b32f into ufs-community:develop Apr 15, 2024
2 checks passed
zhanglikate added a commit to zhanglikate/ufs-weather-model that referenced this pull request May 3, 2024
commit f234a3e
Author: Ufuk Turunçoğlu <turuncu@ucar.edu>
Date:   Tue Apr 30 11:35:25 2024 -0600

    Fix for land component model (ufs-community#2191)

    * UFSWM - fix fully coupled land component configuration
      * NOAHMP - get fixed information from surface file

commit 04bbc15
Author: jiandewang <jiande.wang@noaa.gov>
Date:   Thu Apr 25 14:52:00 2024 -0400

    update MOM6 to its main repo. 20240401 commit (ufs-community#2241)

    * UFSWM -
      * MOM6 - update MOM6 to its main repo. 20240401 commit (NCAR-candidate-20240319)

commit b6c576d
Author: Daniel Sarmiento <42810219+dpsarmie@users.noreply.github.com>
Date:   Tue Apr 23 12:24:22 2024 -0400

    Merged global namelist (ufs-community#2173)

    * UFSWM - global_control.nml_IN has been added as the new regression test namelist template for all global regression tests. The namelist now uses pointers (i.e. @[abc]) for variables and default values have been added to the default_vars.sh script. A new section in default_vars.sh has been added (export_tiled) to account for tiled RTs that pulls the correct parameter files using the ATMRES variable.
    Regression tests have been modified to account for these changes. Tests that were not compatible with the GFSv17_p8 core have been disabled for now. They will be turned on as they are updated from GFSv16 to GFSv17.

commit 5d2ca19
Author: WenMeng-NOAA <48260754+WenMeng-NOAA@users.noreply.github.com>
Date:   Fri Apr 19 13:59:12 2024 -0400

    Update upp submodule (ufs-community#2213)

    * UFSWM - Update inline post
      * FV3 - Update upp submodule for inline post

commit 47c0099
Author: Brian Curtis <64433609+BrianCurtis-NOAA@users.noreply.github.com>
Date:   Wed Apr 17 15:59:48 2024 -0400

    Add bash linting to CI. Cleanup .sh scripts a bit. Address .sh bugs. Adds -v Verbose option. (ufs-community#2218)  Remove nowarn Intel compiler flag (ufs-community#2225)

    * UFSWM
    - Add bash linting to CI:
      - uses superlinter to check for consistent bash code writing
    - Cleans up .sh scripts to comply with superlinter
    - Cleans up .sh scripts to be more consistent, easier to read.
    - Add's -v verbose option if debugging outputs needed, otherwise simplifies rt.sh run echo's.
    - Addresses smaller bugs
      - quota/timeout search logic adjusted.
      - check for dirs existing (DISKNM, STMP, PTMP) before starting.
      - adjustments/cleanup to ecflow/rocoto sections
      - rt.sh will attempt to start ecflow, and only stop ecflow if it started from rt.sh.
      - fix for issue where run_dir will not delete properly.
    * FV3: Address compiler warnings
      * atmos_cubed_sphere: Address compiler warnings.

commit 4f32a4b
Author: Rick Grubin <152905742+rickgrubin-tomorrow@users.noreply.github.com>
Date:   Mon Apr 15 07:21:08 2024 -0600

    Document ATMW / ATMAERO / HAFS WM configurations (ufs-community#2160)

    * UFSWM
      * doc/Userguide
        * source
          * conf.py
          * Configurations.rst
          * FAQ.rst
          * InputsOutputs.rst
          * Introduction.rst

commit ac4445d
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Apr 15 08:59:42 2024 -0400

    Bump idna from 3.6 to 3.7 in /doc/UsersGuide (ufs-community#2234)

    *doc/UserGuide
       *requirements.txt - updates idna version from 3.6 to 3.7

commit 281b32f
Author: Samuel Trahan (NOAA contractor) <39415369+SamuelTrahanNOAA@users.noreply.github.com>
Date:   Mon Apr 15 08:38:01 2024 -0400

    bug fixes: kchunk3d ignored, hailwat uninitialized in dycore, tile_num wrong for nests (ufs-community#2201)

    * UFSWM - None.
      * FV3 - Write component will use kchunk3d. Model init sends the right tile number to CCPP.
        * atmos_cubed_sphere - Initialize the hailwat variable. Pass global_tile index to model.

commit 8a5f711
Author: Denise Worthen <denise.worthen@noaa.gov>
Date:   Thu Apr 11 13:32:26 2024 -0400

    Add PIO namelist control for CICE (ufs-community#2145)

    Update to CICE-Consortium/CICE aca8357. Adds implementation of namelist PIO options for CICE

commit 45c8b2a
Author: JONG KIM <jong.kim@noaa.gov>
Date:   Thu Apr 4 19:49:13 2024 -0400

    Hotfix/cubed sphere hash fix: HAILCAST diagnostic code (units issue) (ufs-community#2223)

    cubed_sphere hash update: f060e85 for a bug- fix in the HAILCAST diagnostic code (units issue)

commit 26e6db6
Author: Denise Worthen <denise.worthen@noaa.gov>
Date:   Wed Apr 3 19:57:08 2024 -0400

    Enable cpl_scalars export from ATM and NoahMP for use by CMEPS (ufs-community#2175)

      * CMEPS - allow additional dimension in cpl_scalars for CSG and regional ATM domains for use in mediator history files
      * CMEPS - fix mapping mask for lnd->atm
      * FV3 - add export of cpl_scalars
      * NOAHMP - add export of cpl_scalars

commit 1411b90
Author: Dusan Jovic <48258889+DusanJovic-NOAA@users.noreply.github.com>
Date:   Mon Apr 1 18:04:44 2024 -0400

    Update module_write_netcdf to avoid hangs in RRFS runs (ufs-community#2193)

    * UFSWM - Update module_write_netcdf to avoid hangs in RRFS runs
      * FV3 - Update module_write_netcdf to avoid hangs in RRFS runs

commit 87c27b9
Author: Matthew Masarik <86749872+MatthewMasarik-NOAA@users.noreply.github.com>
Date:   Fri Mar 29 15:23:42 2024 -0400

    WW3 feature:  Langmuir turbulence parameterization (ufs-community#2195)

      * WW3 - Langmuir turbulence parameterization

commit c54e986
Author: Samuel Trahan (NOAA contractor) <39415369+SamuelTrahanNOAA@users.noreply.github.com>
Date:   Wed Mar 27 16:11:03 2024 -0400

    regression test system bug fixes, eliminate MOM6 warnings (ufs-community#2197), add xr_cnvcld flag to FV3 (ufs-community#2185) (ufs-community#2202)

    * UFSWM - atparse.bash: correctly handle input that doesn't end with an end-of-line character. Fix some bugs in Rocoto support and clean up rt.sh.
      * FV3 - namelist flag xr_cnvcld to control if suspended grid-mean convective cloud condensate should be included in cloud fraction and optical depth calculation in radiation in the GFS suite
        * ccpp - physics-level changes to implement new namelist variable
      * MOM6 - update MOM6 code to eliminate all compiler warnings
Labels
No Baseline Change, Priority: High, Ready for Commit Queue
Development

Successfully merging this pull request may close these issues.

  • MPI_Type_contiguous Encounters Invalid Count
  • extremely verbose write statement in FV3
7 participants