-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix time offset issue on ICS with GFS nemsio and netcdf files and add new archive file name on HPSS #457
Fix time offset issue on ICS with GFS nemsio and netcdf files and add new archive file name on HPSS #457
Conversation
@christinaholtNOAA @danielabdi-noaa, if you have any better idea, please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
if cla.external_model == "FV3GFS" and cla.ics_or_lbcs == "LBCS": | ||
del file_templates['nemsio']['fcst'][1] | ||
del file_templates['netcdf']['fcst'][1] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need to add a similar entry to the "nemsio" format for this to work. Changing the format to nemsio
for the new test case you added I get this error:
Traceback (most recent call last):
File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 1012, in <module>
main(sys.argv[1:])
File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 814, in main
file_templates = get_file_templates(
File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 296, in get_file_templates
del file_templates['nemsio']['fcst'][1]
IndexError: list assignment index out of range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is happening because the gfs_file_names
anchor is shared for all sources, so when it tries hpss and doesn't succeed it deletes the sfc entry. Then when it tries aws, the entry is already gone. I think you can solve this by not using anchors in the yaml file (bad) or by making a deep copy of the dictionary before deleting the entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielabdi-noaa, nemsio is not available for the date (08/2022). netcdf and grib2 are only available. Do we need to add any other conditions for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielabdi-noaa, I've added if-statements to check the availability of netcdf
and nemsio
for the cycle date in jobs/JREGIONAL_GET_EXTRN_MDL_FILES
. If nemsio (or netcdf) is not available, the get_extrn_ics/lbcs
will fail with an error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo I still think we need a safeguard against multiple deletions of the sfc
file -- best would be to find another way that does not delete entries in the data_locations
dictionary. If hpss
does not have the file for some reason (maybe it is down), and we want to try aws
next, it will fail because the sfc
entry has been deleted by HPSS. The invalid nemsio date run i did (although wrong) shows what could happen if we can't find the file on hpss but could be available on aws. I added this before the del
if statement to make it try aws
successfully when it could not find it in hpss
file_templates = deepcopy(file_templates)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielabdi-noaa, yes, I agree with you. Can you help me? I've sent an invitation to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo No big deal, i've pushed the change i think will avoid multiple deletions of the sfc
file. It is not an ideal solution so lets wait for @christinaholtNOAA .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo These changes look good to me. I have built your branch and submitted the new nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16
WE2E test on Hera. The new error message when setting a file format that has past the maximum CDATE is a great addition! I will go ahead and launch the Jenkins tests for this work now. Once @christinaholtNOAA has had the opportunity to review this work, I will approve.
@MichaelLueken , Thank you! |
Tests passed on Here. It looks good to me! |
@chan-hoo The |
@danielabdi-noaa, updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo With the exception of the MET_verification test on Cheyenne (which your changes wouldn't affect), all of the Jenkins tests have successfully passed.
I would like to note that the new nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16
test fails on Cheyenne. It appears as though the test data needs to be staged on that machine. Since the data isn't available, the test will fail. Before this test can be moved to either the fundamental or comprehensive test sets, the new ICs and LBCs will need to be brought over to Cheyenne.
Pending no additional feedback from @christinaholtNOAA, I will merge this PR before 5 pm EST. |
Actually, I just now saw this. Give me just a little while to take a look? |
@christinaholtNOAA No rush. I'll wait until you have approved the PR before merging. |
@danielabdi-noaa It looks like there is an issue with the metplus installation on Cheyenne:
I'll run a test using @natalie-perlin's Cheyenne HPC-stack location and see if this error persists. |
@MichaelLueken Thanks for looking into the issue. There are some more issues to address with EPIC modulefiles which I've summarized in this issue #458 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chan-hoo I'm sorry for the late response. These emails didn't immediately grab my attention.
I left a couple of comments below with some concerns and suggestions.
ush/retrieve_data.py
Outdated
|
||
# Remove sfc files from fcst in file_names of FV3GFS for LBCs | ||
# sfc files needed in fcst when time_offset is not zero. | ||
if cla.external_model == "FV3GFS" and cla.ics_or_lbcs == "LBCS": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be if "lbcs" and "fcst"? That way we don't tightly couple the tool to specifics of the models we use as inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christinaholtNOAA, I am afraid that it will cause another unexpected errors with other models. For example, 'GDAS', 'GSMGFS', 'RAP', 'HRRR', and 'NAM' have only one file name for 'fcst'. In these cases, we will have the same index error. For 'GEFS', we should not remove the second array from the file names. We just want to remove the 'sfc' file from the list. 'FV3GFS' only has this issue. What do you think of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is still pretty brittle. It assumes that there must be at least 2 and that the 2nd one is the surface file. I think that if we want to protect that assumption, we need a functional test.
I also think it's probably safer to do something like delete the entry if "sfc" is in the name of the file. That type of logic may get us the additional functionality we'd need to use offset GDAS netcdf/nemsio as ICS and expand the logic to "lbcs" and "fcst" instead of FV3GFS-specific logic. In general that says "if we're looking for lbcs from forecasts and a surface file is listed, don't try to get it".
As best I can tell, sfc files only show up for GDAS, FV3GFS, and GSMGFS.
I'd prefer to see something like this:
if cla.ics_or_lbcs == "LBCS":
for format in ['netcdf', 'nemsio']:
for i, tmpl in enumerate(file_templates.get('format', {}).get('fcst', [])):
if "sfc" in tmpl:
del file_templates[format]['fcst'][i]
It reduces the assumptions about order and the length of the list provided. It assures that we're only removing surface files, and not atm files accidentally. It also helps us if we want to expand our experiments run with offset GDAS files as ICs -- we can just add the sfc file to the GDAS fcst entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christinaholtNOAA, I tested your script, but sfc file was not removed from the list for LBCs: DEBUG: Looking for files like ['gfs.t{hh}z.atmf{fcst_hr:03d}.nemsio', 'gfs.t{hh}z.sfcf{fcst_hr:03d}.nemsio']
. Any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a typo there: 'format'
=> format
. I am testing it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies...I was just writing directly in the text box here and didn't do a full test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christinaholtNOAA , the tests were completed successfully!! The part has been replaced with your script. Thank you!
ush/machine/wcoss2.yaml
Outdated
@@ -34,8 +34,8 @@ task_run_fcst: | |||
FIXlut: /lfs/h2/emc/lam/noscrub/UFS_SRW_App/develop/fix/fix_lut | |||
data: | |||
GSMGFS: compath.py ${envir}/gsmgfs/${gsmgfs_ver}/gsmgfs.${PDY} | |||
FV3GFS: compath.py ${envir}/gfs/${gfs_ver}/gfs.${PDY} | |||
FV3GFS: compath.py ${envir}/gfs/${gfs_ver}/gfs.${PDY}/${cyc}/atmos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use ${hh} here instead of ${cyc}? This is a bit misleading because this ${cyc} is not necessarily aligned with the cyc
set for the RRFS config.
Then, there's no need to rename cyc in the ex-script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree. it has been replaced with 'hh'.
@@ -89,7 +89,7 @@ yyyymmdd=${yyyymmddhh:0:8} | |||
mm=${yyyymmddhh:4:2} | |||
dd=${yyyymmddhh:6:2} | |||
hh=${yyyymmddhh:8:2} | |||
|
|||
cyc=${hh} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially a problem. It resets an NCO required variable. This is the GFS cyc that we're looking for, which is a namespace collision with the existing cyc
for RRFS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
@christinaholtNOAA, no problem at all. I understand. We are receiving so many emails from SRW :) I've replaced 'cyc' with 'hh'. However, I'd like to hear from you about the if-statement condition as I mentioned above. |
@chan-hoo I have gone ahead and relaunched the Jenkins test for Cheyenne with the GNU compiler. All tests successfully passed. Once @christinaholtNOAA has approved, I will go ahead and merge this PR. |
DESCRIPTION OF CHANGES:
file_names
ofnemsio/fcst
andnetcdf/fcst
inparm/data_locations.yml
.make_ics
task readsgfs.t12z.atmanl.nemsio(netcdf)
andgfs.t12z.sfcanl.nemsio(netcdf)
make_ics
task readsgfs.t06z.atmf006.nemsio(netcdf)
andgfs.t06z.sfcf006.nemsio(netcdf)
. The flaganl
is switched tofcst
.gfs.t{hh}z.sfcf{fcst_hr:03d}
is missing infcst
ofFV3GFS
.gfs.t{hh}z.sfcf{fcst_hr:03d}
should be removed formake_lbcs
to avoid any unnecessary work with the files. Therefore, the flagics_or_lbcs
is added toush/retrieve_data.py
.parm/data_locations.yml
._prod_
): com_gfs_prod_gfs.{yyyymmdd}_{hh}.gfs_pgrb2.tar_v16.2_
): com_gfs_v16.2_gfs.{yyyymmdd}_{hh}.gfs_pgrb2.tarType of change
TESTS CONDUCTED:
New WE2E test:
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_GFS_v16
WE2E tests:
community_ensemble_2mems_stoch
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km
hera.intel
orion.intel
cheyenne.intel
cheyenne.gnu
gaea.intel
jet.intel
wcoss2.intel
NOAA Cloud (indicate which platform)
Jenkins
fundamental test suite
comprehensive tests (specify which if a subset was used)
ISSUE:
Fixes issue mentioned in #456
CHECKLIST
CONTRIBUTORS