Remove redundant surface variables and update to send correct pointers to JEDI in fv3atm #747

@climbfuji (Collaborator) commented Aug 13, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Description

This PR only updates the submodule pointers for fv3atm and ccpp-physics for the following code changes:

  • Remove redundant surface variables to reduce memory footprint and to simplify the code (from @mzhangw)
  • Send/export correct pointers to JEDI (from @mark-a-potts)

No new input data required, no baseline changes.

Issue(s) addressed

Fixes NOAA-EMC/fv3atm#370.

Testing

Preliminary testing

The code changes in #363 do not affect the ufs-weather-model runs. The code changes in #366 were tested on orion.intel against the existing baseline by @mzhangw; all tests passed.

Final regression testing

  • hera.intel
  • hera.gnu
  • orion.intel
  • [-] cheyenne.intel - skipped due to the PBS outage and rt.sh being killed at 1:00 am after all tests except one had passed
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • CI - cce9caf

Dependencies

@climbfuji
Collaborator Author

There seems to have been a major outage of the PBS queueing system on Cheyenne; I see lots of these errors:

...
13 min. TEST 003 control_c48 is running,  status: R jobid 9911071
Connection refused
qstat: cannot connect to server chadmin2.ib0.cheyenne.ucar.edu (errno=111)
Connection refused
qstat: cannot connect to server chadmin2.ib0.cheyenne.ucar.edu (errno=111)
Connection refused
qstat: cannot connect to server chadmin2.ib0.cheyenne.ucar.edu (errno=111)
14 min. TEST 003 control_c48 is finished,  status: - jobid 9911071

Will try auto-rt again for cheyenne.intel and cheyenne.gnu, hoping that the system has recovered.

@climbfuji
Collaborator Author

Re: Cheyenne

UNSCHEDULED OUTAGE NOTIFICATION

The Cheyenne cluster's PBS server is currently experiencing problems. All PBS commands including qsub and qstat are unavailable.  CISL is aware of the problem and working to restore PBS as soon as possible.

START: Mon Aug 16 2021 11:05 AM MDT
END: Unknown

SERVICES AFFECTED
CISL Status

Will wait until outage is resolved.

@climbfuji
Collaborator Author

NOTIFICATION UPDATE

CISL has determined the root cause of the problem that caused Cheyenne's PBS server to fail earlier today. The server has been restored and jobs that were queued before the server crash are now running. Submitting new jobs via qsub is not yet possible but is expected to be available later this afternoon. Users will be notified when PBS is fully restored.

Posted by Mick Coady
____________________

@climbfuji
Collaborator Author

Cheyenne/Intel: same issue as before, this time with test cpld_restart_bmark_v16:

...
143 min. TEST 011 cpld_restart_bmark_v16 is waiting in a queue,  status: Q jobid 9922793
144 min. TEST 011 cpld_restart_bmark_v16 is waiting in a queue,  status: Q jobid 9922793
145 min. TEST 011 cpld_restart_bmark_v16 is waiting in a queue,  status: Q jobid 9922793
146 min. TEST 011 cpld_restart_bmark_v16 is waiting in a queue,  status: Q jobid 9922793
run_test.sh terminated PID=67521
++ [[ pbs = \p\b\s ]]
++ echo 'run_util.sh: interrupt_job qsub_id = 9922793'
run_util.sh: interrupt_job qsub_id = 9922793

The run was terminated at exactly 1:00 am.

@climbfuji
Collaborator Author

fv3atm hash is correct (6bad820), ready to merge

@climbfuji climbfuji added the Ready for Commit Queue label Aug 17, 2021
@BrianCurtis-NOAA
Collaborator

I've double-checked the hashes. They're good to go!

@climbfuji
Collaborator Author

I've double-checked the hashes. They're good to go!

Thanks, Brian. I'll merge it myself ...

@climbfuji climbfuji merged commit e52ee1b into ufs-community:develop Aug 17, 2021
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
* Add github actions for python unittests.

* Include all python script in ush

* Skip defining QUILTING params when it is set to False

* Update py_workflow

* Update unittest for set_extrn_mdl_params.

* Updates from develop.

Co-authored-by: Daniel Shawul <dshawul@yahoo.com>
Labels

  • No Baseline Change
  • Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in the PR template have been checked.
  • Waiting for Reviews: The PR is waiting for reviews from associated component PRs.
Development

Successfully merging this pull request may close these issues.

Remove several redundant variables for surface composite physics
4 participants