Updates the UFS hash and global-workflow for CICE fixes #2505
Conversation
@DeniseWorthen |
@aerorahul would you like me to run the C768 test for this? |
I think we will run the usual CI tests to ensure the model hash update does not break the baseline functionality. Does that work? |
@aerorahul that sounds like a great plan. As long as the existing CI tests pass, I see no reason not to merge this PR and address any additional issues separately. @DeniseWorthen is the best person to look at the CICE updates, but if she is unavailable, I'd be happy to double check what I can in regard to that. Thank you @HenryWinterbottom-NOAA for making this update! |
@aerorahul The changes to the cice_in parsing look correct. There is an issue running on WCOSS2, since there seems to be some problem w/ using pio/pnetcdf, so you currently can't use
I did a set of tests w/ the PIO feature and found pnetcdf2 gave the best performance, and at high task counts, switching the rearrangers to |
@HenryWinterbottom-NOAA is there a reason you chose ufs-community/ufs-weather-model@8a5f711? The bug fix that issue #2490 needs is in the next commit. My C768 test failed with the error that we are trying to fix, which the next commit should resolve.
Actually, @JessicaMeixner-NOAA, could you replace |
@HenryWinterbottom-NOAA I see no reason not to move to the latest develop branch of ufs-weather-model. We should at minimum move to ufs-community/ufs-weather-model@281b32f, which includes the bug fixes that are required for fixing the C768 issues. I'll update to the latest, rebuild and test C768, but as @aerorahul said, if the low-resolution passes tests we should not delay this PR for the C768 issues. We could even merge this PR as is, but it would not close issue 2490. |
@HenryWinterbottom-NOAA I used ufs-weather-model commit hash: 47c00995706380f9827a7e98bd242ffc6e4a7f01 I think it's safe to update to this commit hash and move forward with this PR. |
@JessicaMeixner-NOAA Thank you for following-up. I am testing against |
The commit hash I pointed you to is the latest commit on the develop branch (although another commit might be coming in the next hour or so). It is from Wed Apr 17: ufs-community/ufs-weather-model@47c0099 |
OK, once the ocean/ice post apps complete, I will make the hash update and hopefully we can get this completed today. Thank you for helping out with this. |
8f9b53b to 84e44ca
Yes. It appears to be a WCOSS2 thing. If someone else can verify it, that would be great. |
@HenryWinterbottom-NOAA @JessicaMeixner-NOAA has offered to help test on Hera w/ hdf5 to reproduce the issue seen on WCOSS2. |
@HenryWinterbottom-NOAA I have a C48 S2SW forecast test submitted on hera. I'll be interested to see what issues you're running into and will keep you up to date on my progress too. |
Thank you, @JessicaMeixner-NOAA. I am trying one more test to check whether I botched something earlier while working with @aerorahul. I will also keep you updated. Thank you for your help. |
@HenryWinterbottom-NOAA I'll be interested to see what tests you were running and failing with. So far my C48_S2SW test is running. It's about halfway through the forecast. COM/EXP dirs are here: /scratch1/NCEPDEV/climate/Jessica.Meixner/updatemodelgw/test01/s2s01 |
@JessicaMeixner-NOAA I am cross checking that I have the correct UFSWM hash. The one I tested against earlier today was different. I see there were more recent pushes to I will update this thread as I learn more. |
Here's my clone if that's of any help: /scratch1/NCEPDEV/climate/Jessica.Meixner/updatemodelgw/test01/global-workflow I just checked out your latest branch from like 2 hours ago, it was up to date with develop. |
And this is working for you, correct? If so, that suggests that I botched something. |
@HenryWinterbottom-NOAA sounds like maybe the merge unintentionally changed the UFS hash on your end. Maybe a fresh clone will help? My test has finished the forecast job and some of the post is finishing up now, so it looks to be successful. |
@JessicaMeixner-NOAA OK, that confirms the branch isn't broken (a good thing). I am not sure what happened earlier, but I am already doing as you are suggesting and so far, so good. Thank you for taking the time to help us (me) to debug. It is appreciated. |
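For anyone retracing the hash cross-check discussed above, a minimal sketch of how a checked-out submodule commit could be compared with an expected hash is given below. The helper names `hash_matches` and `submodule_at_hash` are hypothetical and are not part of global-workflow; only `git -C <dir> rev-parse HEAD` is a standard git call.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not global-workflow code.

# hash_matches EXPECTED ACTUAL
# Succeeds when ACTUAL (a full commit hash) begins with EXPECTED,
# so EXPECTED may be either a full or an abbreviated hash.
hash_matches() {
  local expected="$1" actual="$2"
  [ -n "${expected}" ] || return 1
  case "${actual}" in
    "${expected}"*) return 0 ;;
    *) return 1 ;;
  esac
}

# submodule_at_hash DIR EXPECTED
# Reads HEAD of the git checkout in DIR and compares it with EXPECTED.
# Returns 2 if DIR is not a usable git checkout.
submodule_at_hash() {
  local dir="$1" expected="$2" actual
  actual=$(git -C "${dir}" rev-parse HEAD) || return 2
  hash_matches "${expected}" "${actual}"
}
```

From the top of a global-workflow clone, something like `submodule_at_hash sorc/ufs_model.fd 47c0099 || echo "unexpected UFS hash"` would flag a checkout that drifted after a merge.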
@HenryWinterbottom-NOAA @aerorahul @DeniseWorthen I went ahead and ran another run with HDF5 as the option on hera to see if we could narrow this down to machine or HDF5, and got the following error /scratch1/NCEPDEV/climate/Jessica.Meixner/updatemodelgw/test01/s2s02hdf5/COMROOT/s2s02hdf5/logs/2021032312/gfsfcst.log.0
I believe this means that HDF5 has issues with linking in CICE and this is not necessarily a machine-specific issue. What I do not know is if this is the way CICE uses HDF5, or just an HDF5 issue. If it's just the way CICE uses HDF5, is there any way to change that so linking files is okay? If not, I think our solutions to being able to update the model are:
|
@JessicaMeixner-NOAA I can make the changes in my branch so that it can be tested WRT the symlink versus copy. Would this be worth testing? |
@HenryWinterbottom-NOAA it's my understanding that @aerorahul might be somewhere in the process of that update, so perhaps coordinating that with him would be a good next step. I'd be curious to hear @DeniseWorthen's thoughts on whether we should create an issue for the CICE+HDF5+linking to see if there's anything that can be done on the CICE side? |
@JessicaMeixner-NOAA The post-processing jobs can look for model output in An alternative is to employ a Addressing these will take some thinking, tinkering, trial and error. It will also need a discussion with NCO on what will be acceptable to ops, given that we already have open tickets to replace links with copies. This is just a status update to keep everyone in the loop. Please let me know if there are any questions or anything needs clarification. |
Thanks for summarizing. This is correct. New libraries on WCOSS2 will allow us to continue linking, but that does not resolve the NCO bugzilla requiring replacing links with copies. The second bullet does address it, but raises some other concerns (noted in this comment). |
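To make the link-versus-copy trade-off in this thread concrete, here is a minimal sketch of a staging helper that either symlinks or copies a finished output file into a destination directory. The function name `stage_output` and its `link`/`copy` modes are assumptions for illustration; this is not how global-workflow currently stages CICE output.

```shell
#!/usr/bin/env bash
# Sketch only: stage_output is a hypothetical helper, not workflow code.

# stage_output SRC DEST [MODE]
# MODE=link : create a symbolic link at DEST that points to SRC.
# MODE=copy : create an independent copy at DEST (default).
stage_output() {
  local src="$1" dest="$2" mode="${3:-copy}"
  case "${mode}" in
    link) ln -sf "${src}" "${dest}" ;;
    copy) cp -p "${src}" "${dest}" ;;
    *)
      echo "stage_output: unknown mode '${mode}'" >&2
      return 1
      ;;
  esac
}
```

With `copy`, the model only ever writes to a regular file in its run directory, which sidesteps writing through a link, at the cost of extra I/O and disk space after the file is complete.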
@aerorahul Should we icebox this PR UFN? |
I've asked on ufs-community/ufs-weather-model#2232 when new libraries might be ready. |
I've tested a sandbox using linked output directories on Hera and it worked for both HDF5 and pnetcdf2. This is the HDF5 case:
I'm not clear on why this is failing on WCOSS2. It doesn't appear to me to have anything to do w/ the CICE PIO implementation. |
@DeniseWorthen
|
Closing. A new issue will be drafted and subsequent PR will be created. |
Description
This PR addresses issue #2490. The following is accomplished:
- The sorc/ufs_model.fd submodule is updated to the develop hash as of 17 April 2024;
- The global-workflow ush/parsing_namelists_CICE.sh is updated to include the new FORTRAN 90 namelist variables.
Resolves Update ufs-weather-model hash in g-w #2490
Refs https://github.com/ufs-community/ufs-weather-model
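As a rough illustration of the kind of namelist update described above, the sketch below writes a small Fortran namelist fragment from shell variables using a heredoc. The function name and the shell variables `CICE_HIST_FORMAT` and `CICE_HIST_REARRANGER` are hypothetical, and the namelist entries `history_format` and `history_rearranger` are assumed from the PIO options mentioned in the conversation; the PR's actual variable list is not shown on this page.

```shell
#!/usr/bin/env bash
# Sketch only: names below are assumptions, not the PR's actual changes.

# write_cice_io_nml FILE
# Writes an illustrative setup_nml fragment to FILE.
write_cice_io_nml() {
  local file="$1"
  # Hypothetical shell variables with illustrative defaults.
  local fmt="${CICE_HIST_FORMAT:-pnetcdf2}"
  local rearr="${CICE_HIST_REARRANGER:-box}"
  cat > "${file}" << EOF
&setup_nml
  history_format = '${fmt}'
  history_rearranger = '${rearr}'
/
EOF
}
```

Setting `CICE_HIST_FORMAT=hdf5` before calling the function would switch the written format, which is the kind of toggle used in the Hera tests discussed in the conversation.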
Type of change
Change characteristics
How has this been tested?
The new ufs_model.fd hash has been tested using the global-workflow C48 ATM only and S2SW CI configurations. The relevant rocotostat information is as follows:
ATM
S2SW
The CICE post-processing also completed accordingly:
Checklist