Triggering GFS post_anl job #333

Closed
WenMeng-NOAA opened this issue Jun 12, 2021 · 7 comments · Fixed by #323

Comments

@WenMeng-NOAA
Collaborator

On 05/25/2021, Russ Treadon reported:

Lin and Wen,

The atmos_post_anl job is submitted by the atmos_post_manager when logf000.txt is present. This trigger is not always reliable.

The atmos_post_anl job processes output from job atmos_analysis_calc. This job creates loganl.txt. The presence of this file indicates that it is safe to run atmos_post_anl. Job atmos_analysis submits the forecast job. The forecast job creates logf000.txt. Depending on queue wait times, it is possible for the forecast job to create logf000.txt before atmos_analysis_calc creates loganl.txt.

Logic in exglobal_atmos_pmgr.sh should be updated to trigger the atmos_post_anl job when loganl.txt, not logf000.txt, is present. As currently written, logf000.txt triggers both atmos_post_anl and atmos_post_f000:

    if [ -s ${COMIN}/${RUN}.${cycle}.logf${fhr}.txt -o  -s ${COMIN}/${RUN}.${cycle}.logf${fhr3}.txt ]
    then
      if [ $fhr -eq 0 ]
      then
    #   ecflow_client --event release_${RUN}_postanl
        ecflow_client --event release_postanl
      fi
    # ecflow_client --event release_${RUN}_post${fhr}
      ecflow_client --event release_post${fhr3}

This issue also exists in operations.
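
The proposed change can be sketched as follows. This is a minimal illustration, not the code from PR #323: `release_if_ready`, `check_hour_zero`, and the `ECF_CLIENT` override are hypothetical names, and the sketch assumes that loganl.txt carries the same `${RUN}.${cycle}.` prefix as the logf files and that the hour-0 event is named `release_post000`.

```shell
# Sketch only: these helper names are hypothetical and do not appear in
# exglobal_atmos_pmgr.sh.

# Command used to set ecFlow events; set ECF_CLIENT=echo for a dry run.
ECF_CLIENT=${ECF_CLIENT:-ecflow_client}

# release_if_ready FILE EVENT
# Sets EVENT only when FILE exists and is non-empty; returns 1 otherwise.
release_if_ready() {
  local file=$1 event=$2
  if [ -s "$file" ]; then
    $ECF_CLIENT --event "$event"
    return 0
  fi
  return 1
}

# check_hour_zero COMIN RUN CYCLE
# One polling pass for hour 0: each post job waits on the log file written
# by the job whose output it actually consumes.
check_hour_zero() {
  local comin=$1 run=$2 cycle=$3
  # atmos_post_anl reads atmos_analysis_calc output, signalled by loganl.txt
  # (file name prefix is an assumption).
  release_if_ready "${comin}/${run}.${cycle}.loganl.txt" release_postanl
  # atmos_post_f000 reads forecast output, signalled by logf000.txt.
  release_if_ready "${comin}/${run}.${cycle}.logf000.txt" release_post000
  return 0
}
```

Setting `ECF_CLIENT=echo` before calling `check_hour_zero "$COMIN" "$RUN" "$cycle"` prints the events instead of sending them, which makes it easy to confirm that logf000.txt alone no longer releases the post_anl event.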

@WenMeng-NOAA
Collaborator Author

On 05/25/2021, Gan Lin indicated:

I agree. Observed several such issues in my ecflow run.

pwd
/gpfs/dell3/ptmp/Lin.Gan/ecfr/com/output/prod/today/OLD
grep "model files not found" post
gdas_atmos_post_anl_06.o62177601:0.354 + echo 'model files not found, exitting'
gdas_atmos_post_anl_06.o62177601:model files not found, exitting
gdas_atmos_post_anl_12.o62098955:0.308 + echo 'model files not found, exitting'
gdas_atmos_post_anl_12.o62098955:model files not found, exitting
gdas_atmos_post_anl_12.o62482497:0.307 + echo 'model files not found, exitting'
gdas_atmos_post_anl_12.o62482497:model files not found, exitting
...

I put a temporary workaround in the post_manager.ecf file to wait for 60 seconds. This workaround is not perfect.
Fixing exglobal_atmos_pmgr.sh will address the root cause.
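
For comparison, a fixed 60-second pause can be too short if atmos_analysis_calc is still waiting in the queue, and longer than needed otherwise. A bounded poll on the file itself is one alternative stopgap. The sketch below is an assumption for illustration, not the workaround actually placed in post_manager.ecf, and `wait_for_file` is a hypothetical helper.

```shell
# Sketch only: wait_for_file is a hypothetical helper, not part of the
# GFS workflow scripts.

# wait_for_file FILE TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Polls until FILE exists and is non-empty. Returns 0 once it does, or 1
# if TIMEOUT_SECONDS elapse first. INTERVAL_SECONDS defaults to 5.
wait_for_file() {
  local file=$1 timeout=$2 interval=${3:-5}
  local waited=0
  while [ ! -s "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

A caller could run `wait_for_file "$COMIN/$RUN.$cycle.loganl.txt" 600 10` and abort on a non-zero return; the path layout and the timeout values here are placeholders. This still only masks the race, so the trigger fix in exglobal_atmos_pmgr.sh remains the real remedy.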

@WenMeng-NOAA
Collaborator Author

On 05/25/2021, Lin proposed a fix and tested it successfully with ecflow on WCOSS, so that Wen could pick it up locally and commit it to the UPP branch "release/gfs_v16".

Hi Wen,

Please see /gpfs/dell2/emc/modeling/noscrub/Lin.Gan/ecflow-dell/ecf_gfsv16_low/scripts/exglobal_atmos_pmgr.sh.
I tested this fix with ecflow on mars; log file /gpfs/dell3/ptmp/Lin.Gan/ecfr/com/output/prod/today/gfs_atmos_post_manager_00.o62690449

@WenMeng-NOAA
Collaborator Author

On 05/25/2021, Wen picked up Lin's fix and submitted PR #323. Both Lin and Kate reviewed the changes, and Lin tested them with ecflow.

@WenMeng-NOAA
Collaborator Author

On 05/25/2021, Wen merged PR #323 into the release/gfs_v16 branch for the upcoming GFS V16 upgrade at NCO.

WenMeng-NOAA linked a pull request on Jun 12, 2021 that will close this issue
@WenMeng-NOAA
Collaborator Author

The fix for this issue should be implemented in the upcoming GFS V16 upgrade.

@WenMeng-NOAA
Collaborator Author

On 06/14/2021, for the sake of GFS V16 operational stability, it was decided to remove this post_anl fix from the GFS V16.1.2 implementation. This change was made with UPP PR #330. We might implement this fix during GFS V17 development in the future.

@WenMeng-NOAA
Collaborator Author

@HuiyaChuang-NOAA Please let me know if you have more comments on this issue. Thanks!

EricJames-NOAA pushed a commit to EricJames-NOAA/UPP that referenced this issue Dec 14, 2022
## DESCRIPTION OF CHANGES: 
* Remove the workflow variable EMC_GRID_NAME.  Henceforth, PREDEF_GRID_NAME is the only variable that can be used to set the name of the predefined grid to use.
* Make appropriate change of variable name (EMC_GRID_NAME --> PREDEF_GRID_NAME) in the WE2E test configuration files.
* Change anywhere the "conus" and "conus_c96" grids are specified to "EMC_CONUS_3km" and "EMC_CONUS_coarse", respectively.
* Rename WE2E test configuration files with names containing the strings "conus" and "conus_c96" by replacing these strings with "EMC_CONUS_3km" and "EMC_CONUS_coarse", respectively.
* Update the list of WE2E test names (tests/baselines_list.txt).
* Bug fixes not directly related to grids:
  * In config.nco.sh, remove settings of QUEUE_DEFAULT, QUEUE_HPSS, and QUEUE_FCST since these are now set automatically (due to another PR).
  * In the template file FV3LAM_wflow.xml, add the ensemble member name after RUN_FCST_TN in the dependency of the run_post metatask.

## TESTS CONDUCTED: 
Since this change only affects runs in NCO mode, the following NCO-mode WE2E tests were rerun on hera, all successfully:
```
nco_EMC_CONUS_3km                                       SUCCESS
nco_EMC_CONUS_coarse                                    SUCCESS
nco_EMC_CONUS_coarse__suite_FV3_GFS_2017_gfdlmp         SUCCESS
nco_RRFS_CONUS_25km_HRRRX_RAPX                          SUCCESS
nco_RRFS_CONUS_3km_FV3GFS_FV3GFS                        SUCCESS
nco_RRFS_CONUS_3km_HRRRX_RAPX                           SUCCESS
nco_ensemble                                            SUCCESS
```