
Extend mid-forecast restart detection and resumption for coupled #2273

Closed
WalterKolczynski-NOAA opened this issue Jan 29, 2024 · 0 comments · Fixed by #2510
Labels: feature New feature or request
What new functionality do you need?

Ability to detect when restart files for a coupled forecast already exist and resume the forecast from that point. This already exists for atmos, but needs to be extended to other components (and re-validated for the atmosphere).

What are the requirements for the new functionality?

Forecast job detects when there are existing restart files for the current cycle and resumes from that point.

Acceptance Criteria

A forecast that is aborted and then rerun detects restart files from the failed attempt and resumes from that point without rerunning already completed forecast hours.

Suggest a solution (optional)

Update forecast_det.sh to also detect restarts for the coupled components when they are turned on.

This can be done one of two ways: either rename and extend the existing FV3_det() to handle detection for all components, or flesh out the skeleton *_det() functions for each component and combine their results, resuming from the lowest restart hour available across all of them.
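The second option above can be sketched as follows. This is a hypothetical illustration, not the actual global-workflow code: the stand-in `*_det()` functions and the `rerun_hour` variable are assumptions; each real `*_det()` would inspect its component's restart files and report the latest usable restart hour (0 if none).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: combine per-component restart detection and
# resume from the lowest restart hour common to all active components.

FV3_det()  { echo 12; }  # stand-in: atmosphere restart found at f012
MOM6_det() { echo 9;  }  # stand-in: ocean restart found at f009
CICE_det() { echo 9;  }  # stand-in: ice restart found at f009

rerun_hour=$(( $(FV3_det) ))
for det in MOM6_det CICE_det; do
  hour=$(( $(${det}) ))
  (( hour < rerun_hour )) && rerun_hour=${hour}
done

echo "Resuming coupled forecast from hour ${rerun_hour}"
```

Taking the minimum is what makes the resumption safe: every component is guaranteed to have a restart at or after that hour, so no component is asked to start from a state it never wrote.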

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the feature New feature or request label Jan 29, 2024
WalterKolczynski-NOAA added a commit that referenced this issue Apr 23, 2024
This PR:
- enables the forecast executable to restart from a previous failure.
- saves restarts during the run in a new `DATA` structure. The current `DATA` structure:
  ![current `DATA`](https://github.com/NOAA-EMC/global-workflow/assets/11394126/03383e2f-b7f8-43e0-8b78-c8f37a79ab84)
  is being replaced by:
  ![Screenshot 2024-04-19 at 12 55 44 PM](https://github.com/NOAA-EMC/global-workflow/assets/11394126/8ab6e6df-bbdb-43cf-b0dc-8e066f537ee7)
  where the colored boxes are described as:
  ![Screenshot 2024-04-19 at 12 56 14 PM](https://github.com/NOAA-EMC/global-workflow/assets/11394126/30b20e50-6cc8-4433-988a-02d5b484e7b5)
- saves model output from `MOM6` and `CICE` within `MOM6_OUTPUT/` and `CICE_OUTPUT/` sub-directories. This keeps the run directory clean and makes component output easy to identify.
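The output tidying described in the last bullet amounts to moving component files into their own sub-directories. A minimal sketch, assuming illustrative output file names (the real MOM6 and CICE file names differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep the run directory clean by moving MOM6 and
# CICE output into sub-directories. File names here are assumptions.
set -euo pipefail

DATA=$(mktemp -d)   # stand-in for the forecast run directory
cd "${DATA}"

# Pretend the coupled model has written some output
touch ocn.mom6.nc iceh.2024-01-01.nc atm.fhr006.nc

mkdir -p MOM6_OUTPUT CICE_OUTPUT
mv ./*mom6*.nc MOM6_OUTPUT/
mv ./iceh*.nc  CICE_OUTPUT/

ls   # only atmosphere output and the two sub-directories remain on top
```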

This PR also:
- replaces links with copies. This makes the `DATA` directory self-contained, so it can be used to diagnose issues after a failure. This is an NCO EE2 requirement and addresses part of an outstanding bugzilla.

In the process of enabling the restart capability, functionality that does not depend on the outcome of `forecast_det.sh` is moved from `forecast_postdet.sh` to `forecast_predet.sh`. `forecast_det.sh` determines where the initial conditions will come from: `COM` in the case of a clean run, or `DATArestart` in the case of a `RERUN`.
This should make it easier to separate **static** configuration and data (fix files, etc.) from **runtime** configuration (namelists, etc.) and data (initial conditions).
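The decision `forecast_det.sh` makes can be sketched as below. The directory layout and restart-file pattern are illustrative assumptions, not the workflow's actual naming; only the `DATArestart`/`COM`/`RERUN` names come from the PR text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resume from DATArestart if restart files from a
# failed attempt exist, otherwise start cleanly from COM.
set -euo pipefail

DATArestart=$(mktemp -d)   # stand-in for the saved-restart directory

detect_restart() {
  # Echo YES if any coupler restart file exists under DATArestart
  if compgen -G "${DATArestart}/RESTART/*.coupler.res" > /dev/null; then
    echo "YES"
  else
    echo "NO"
  fi
}

RERUN=$(detect_restart)
echo "Before failure artifacts: RERUN=${RERUN}"

# Simulate restart files left behind by an aborted forecast
mkdir -p "${DATArestart}/RESTART"
touch "${DATArestart}/RESTART/20240101.060000.coupler.res"

RERUN=$(detect_restart)
echo "After failure artifacts:  RERUN=${RERUN}"
```

With `RERUN=YES`, the post-detection step would stage initial conditions from `DATArestart` instead of `COM`, which is what lets the rerun skip already completed forecast hours.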

Additionally, this PR:
- adds 3 utility shell scripts in `test/`:
  - `nccmp.sh` - compare netCDF files using `nccmp`
  - `g2cmp.sh` - compare grib2 files using `wgrib2`
  - `f90nmlcmp.sh` - compare Fortran90 namelist files using `f90nml` (requires modulefiles to load the `py-f90nml` module on RDHPCS platforms)

  They are not used in the workflow, but are useful for users to compare files.

Resolves #2273

Co-authored-by: Walter Kolczynski - NOAA <Walter.Kolczynski@noaa.gov>