Reorganize COM and refactor to use templates (#1421)
Reorganizes the entire COM directory into a more hierarchical structure and uses centrally-defined templates to define COM paths.

## Hierarchical Structure

To organize output better and avoid having 30000+ files in a single directory, all of the component COM directories are divided into a number of subdirectories, one for each type of output.

Sample directory trees:

### Cycled atmosphere only

```
gdas.20211222/00
├── analysis
│   └── atmos
│       └── gsidiags
│           ├── dir.0000
│           ├── dir.0001
│           ├── (Additional dir.* directories omitted for brevity)
│           └── dir.0083
├── model_data
│   └── atmos
│       ├── history
│       ├── master
│       └── restart
├── obs
└── products
    └── atmos
        ├── cyclone
        │   └── tracks
        └── grib2
            ├── 0p25
            ├── 0p50
            └── 1p00

101 directories
```

```
enkfgdas.20211222/00
├── earc00
├── ensstat
│   ├── analysis
│   │   └── atmos
│   │       └── gsidiags
│   │           ├── dir.0000
│   │           ├── dir.0001
│   │           ├── (Additional dir.* omitted for brevity)
│   │           └── dir.0039
│   └── model_data
│       └── atmos
│           └── history
├── mem001
│   ├── analysis
│   │   └── atmos
│   └── model_data
│       └── atmos
│           ├── history
│           ├── master
│           └── restart
└── mem002
    ├── analysis
    │   └── atmos
    └── model_data
        └── atmos
            ├── history
            ├── master
            └── restart

64 directories
```

```
gfs.20211222/00
├── analysis
│   └── atmos
├── model_data
│   └── atmos
│       ├── history
│       ├── master
│       └── restart
├── obs
└── products
    └── atmos
        ├── bufr
        ├── cyclone
        │   ├── genesis_vital
        │   └── tracks
        ├── gempak
        │   ├── 0p25
        │   ├── 0p50
        │   ├── 1p00
        │   ├── 35km_atl
        │   ├── 35km_pac
        │   └── 40km
        ├── grib2
        │   ├── 0p25
        │   ├── 0p50
        │   └── 1p00
        └── wmo

26 directories
```

```
enkfgfs.20211222/00
├── earc00
├── ensstat
│   ├── analysis
│   │   └── atmos
│   │       └── gsidiags
│   │           ├── dir.0000
│   │           ├── dir.0001
│   │           ├── (Additional dir.* directories removed for brevity)
│   │           └── dir.0039
│   └── model_data
│       └── atmos
│           └── history
├── mem001
│   ├── analysis
│   │   └── atmos
│   └── model_data
│       └── atmos
│           ├── history
│           ├── master
│           └── restart
└── mem002
    ├── analysis
    │   └── atmos
    └── model_data
        └── atmos
            ├── history
            ├── master
            └── restart

64 directories
```

### S2SWA coupled prototype (forecast-only)

```
gfs.20130401/00/
├── model_data
│   ├── atmos
│   │   ├── history
│   │   ├── input
│   │   ├── master
│   │   └── restart
│   ├── chem
│   │   └── history
│   ├── ice
│   │   ├── history
│   │   ├── input
│   │   └── restart
│   ├── med
│   │   └── restart
│   ├── ocean
│   │   ├── history
│   │   ├── input
│   │   └── restart
│   └── wave
│       ├── history
│       ├── prep
│       └── restart
└── products
    ├── atmos
    │   ├── cyclone
    │   │   ├── genesis_vital
    │   │   └── tracks
    │   ├── gempak
    │   │   ├── 0p25
    │   │   ├── 0p50
    │   │   ├── 1p00
    │   │   ├── 35km_atl
    │   │   ├── 35km_pac
    │   │   └── 40km
    │   ├── grib2
    │   │   ├── 0p25
    │   │   ├── 0p50
    │   │   └── 1p00
    │   └── wmo
    ├── ocean
    │   ├── 2D
    │   ├── 3D
    │   ├── grib
    │   │   ├── 0p25
    │   │   └── 0p50
    │   └── xsect
    └── wave
        ├── gempak
        ├── gridded
        ├── station
        └── wmo

51 directories
```

### Trees with files

- gdas: https://gist.github.com/WalterKolczynski-NOAA/f1de04901e2703fd24d38146d2669789
- gfs: https://gist.github.com/WalterKolczynski-NOAA/5d1b7c0a0f4b8cfff0be1ae54082316a
- enkfgdas: https://gist.github.com/WalterKolczynski-NOAA/860aaa804e3e70e191e7cae2ebb1055b
- enkfgfs: https://gist.github.com/WalterKolczynski-NOAA/130bfff4650ed8b07cf395079b65d318
- S2SWA P8: https://gist.github.com/WalterKolczynski-NOAA/6ae90c6eafb573878f60682ce47179db

## Templating

All of the COM paths have been replaced with new variables that are derived from a set of templates centrally defined in `config.com`. Variables in the templates are substituted at runtime with `envsubst` to generate the COM paths. To facilitate this, a new function, `generate_com` (see below), automatically generates the COM paths. Where possible, COM paths are defined at the j-job level and made read-only. However, many of the EnKF scripts loop over the ensemble members, which forces the definitions to be made at the exscript level instead (and to be mutable).

The arguments to `generate_com()` are the list of COM variables to generate, each optionally followed by a colon and the name of the template to use.
When no template is specified, the variable is generated from the `${varname}_TMPL` template. Two options are accepted, `-r` and `-x`, which mark the variable as read-only and for export, respectively (the same as with the `declare` builtin). It is best practice to define any additional variables needed by the template on the same line as the call, so they are not added to the calling script's scope.

Here are some examples used in the code:

Generate the path to the atmos analysis directory for the current cycle and `$RUN` (implicitly from the `$COM_ATMOS_ANALYSIS_TMPL` template) and mark it as read-only and for export:

```
YMD=${PDY} HH=${cyc} generate_com -rx COM_ATMOS_ANALYSIS
```

Generate the path to the atmos history directory for the previous cycle's gdas from the `$COM_ATMOS_HISTORY_TMPL` template and mark it as read-only and for export:

```
RUN=${GDUMP} YMD=${gPDY} HH=${gcyc} generate_com -rx \
    COM_ATMOS_HISTORY_PREV:COM_ATMOS_HISTORY_TMPL
```

Generate the path to the first ensemble member's history directory for the current cycle and `$RUN` and mark it for export:

```
MEMDIR='mem001' YMD=${PDY} HH=${cyc} generate_com -x COM_ATMOS_HISTORY
```

## Additional information

The staging of initial conditions in `setup_expy.py` has been updated to stage in the new locations. The source of the initial conditions can be in **either** the new hierarchical structure or the old flat structure; in both cases the script stages the files in the new structure. The destination paths are hard-coded here, so any changes made to the analysis, input, or restart templates will need to be mirrored in `setup_expy.py`.

### Stipulations

All changes in this PR are subject to approval by several stakeholders, including NCO. Sample COM trees above are subject to revision based on feedback (for instance, file X isn't really an obs file).

File name updates are not included in this PR. File names (primarily for coupled components) will be updated to comply with NCO standards in a future PR.

AWIPS jobs are now almost working (they do not work in current develop), but one last program still ends with an error.

Work on fit2obs is deferred, so that portion of the verify job does not work.

WAFS scripts are all external and have not yet been updated. WAFS is expected to be packaged separately going forward, so it will need to be updated like any other downstream package.

Some scripts that are not part of our normal development workflow have not yet been updated. I may be able to knock a few more off this list, but some just aren't available in development mode currently:

- All UFSDA app jobs (to be handled separately)
- With associated dev jobs (may still modify and test)
  - JGDAS_ATMOS_GLDAS
  - ~~JGLOBAL_WAVE_GEMPAK~~
  - ~~JGLOBAL_WAVE_POST_BNDPNT~~
  - ~~JGLOBAL_WAVE_POST_BNDPNTBLL~~
  - ~~JGLOBAL_WAVE_PRDGEN_BULLS~~
  - ~~JGLOBAL_WAVE_PRDGEN_GRIDDED~~
  - ~~JGLOBAL_WAVE_PREP~~
- With no associated dev job
  - JGDAS_ATMOS_GEMPAK_META_NCDC
  - JGFS_ATMOS_FBWIND
  - JGFS_ATMOS_FSU_GENESIS
  - JGFS_ATMOS_GEMPAK_META
  - JGFS_ATMOS_GEMPAK_NCDC_UPAPGIF
  - JGLOBAL_ATMOS_EMCSFC_SFC_PREP
  - JGLOBAL_ATMOS_POST_MANAGER
  - JGLOBAL_ATMOS_TROPCY_QC_RELOC

Plus all downstream scripts for the above.

There are also a few scripts that are not available to the development workflow but that I have already made a good-faith effort at updating:

- JGDAS_ATMOS_GEMPAK
- JGFS_ATMOS_PGRB2_SPEC_NPOESS

## Related Issues

- Closes #761
- Fixes #978
- Fixes #999
- Fixes #1207
- Partially addresses #198
- Partially addresses #289
- Partially addresses #293
- Partially addresses #1299
- Partially addresses #1326
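## Illustrative sketch of the templating mechanism

To make the Templating section concrete, here is a minimal sketch of how a function like `generate_com` could work. It is not the implementation from this PR: the function name `generate_com_sketch`, the `COM_BASE` variable, and the template value are assumptions made up for illustration (the template is loosely modeled on the `gdas.20211222/00/analysis/atmos` sample tree). It assumes bash 4.2 or later and that `envsubst` (from GNU gettext) is installed.

```bash
#!/usr/bin/env bash

# Hypothetical base directory; must be exported so envsubst can see it.
export COM_BASE="/path/to/com"

# Hypothetical template. Single quotes keep ${...} literal so that the
# substitution happens later, when the path is generated.
# shellcheck disable=SC2016
COM_ATMOS_ANALYSIS_TMPL='${COM_BASE}/${RUN}.${YMD}/${HH}/analysis/atmos'

generate_com_sketch() {
    # -g makes declare act on the global scope instead of creating a local.
    local opts="-g"
    local OPTIND=1 option
    while getopts "rx" option; do
        case "${option}" in
            r|x) opts="${opts}${option}" ;;   # pass -r / -x through to declare
            *)   return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    local input com_var template value
    for input in "$@"; do
        com_var="${input%%:*}"                # text before the first colon
        if [[ "${input}" == *:* ]]; then
            template="${input#*:}"            # explicit template name
        else
            template="${com_var}_TMPL"        # default template name
        fi
        if [[ -z "${!template:-}" ]]; then
            echo "template ${template} is not defined" >&2
            return 2
        fi
        # envsubst replaces ${VAR} references with values from the environment.
        value=$(printf '%s' "${!template}" | envsubst)
        # shellcheck disable=SC2086
        declare ${opts} "${com_var}=${value}"
    done
}

# Variables assigned on the same line are visible to envsubst during the
# call but are not left behind in the calling scope.
RUN=gdas YMD=20211222 HH=00 generate_com_sketch -x COM_ATMOS_ANALYSIS
echo "${COM_ATMOS_ANALYSIS}"   # prints /path/to/com/gdas.20211222/00/analysis/atmos
```

The same-line assignments work because bash places them in the temporary environment of the function call, so child processes such as `envsubst` see them while the caller's variables stay untouched.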