Post fails for low-resolution experiments #1060
Comments
Limits the number of MPI tasks for post to the resolution of the forecast. UPP seems to fail if it is given more ranks than the resolution. Fixes #1060
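The cap described in this commit can be sketched roughly as below. This is only an illustration under assumptions: the variable names `CASE` and `npe_post` and the starting value of 112 are hypothetical and are not taken from the actual change.

```shell
#!/bin/sh
# Sketch only: CASE and npe_post are assumed names, not taken from the fix itself.
CASE="C96"        # resolution label of the forecast
npe_post=112      # requested MPI tasks for post (illustrative value)

# Strip the leading "C" to get the numeric resolution (C96 -> 96).
res="${CASE#C}"

# Cap the task count at the resolution so UPP is not given more ranks than it accepts.
if [ "${npe_post}" -gt "${res}" ]; then
  npe_post="${res}"
fi

echo "npe_post=${npe_post}"
```

With these illustrative values the post job would request 96 tasks instead of 112.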
I am not convinced this is fixed. @WalterKolczynski-NOAA would it be possible to confirm this is working as you intended it to be by an XML generation test only? Otherwise, I think the post scripts have to be modified to call |
Yeah, I've seen it too. I have a fix I just need to submit it. In |
@WalterKolczynski-NOAA thanks, just wanted to make sure I wasn't going crazy/had a bad setup.
I think the fix did work, but then subsequent updates forced a different solution.
Taking this on.
- Create mpmd_opt variable and set it to "--multi-prog".
- Replace instances of "--multi-prog" in launcher commands with new mpmd_opt variable.
Refs NOAA-EMC#1060

- Create new mpmd_opt variable (="--multi-prog") and replace instances of "--multi-prog" in launcher commands with mpmd_opt variable.
- Update launcher commands that were missing "-n $npe" flags to now include npe # flag.
Refs NOAA-EMC#1060
- Increase tasks from 20 to 40 for the lowest resolutions for the eobs jobs (C96 and C48).
- The C96 eobs job was hitting the walltime with only 20 tasks.
Refs NOAA-EMC#1060
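A resource setting of this kind might look roughly like the following. It is a sketch under assumptions: the variable name `npe_eobs` and the `case` layout are hypothetical, and only the two resolutions named in the commit are covered.

```shell
#!/bin/sh
# Sketch only: npe_eobs and the case layout are assumed, not copied from config.resources.
CASE="C96"

case "${CASE}" in
  C96|C48)
    npe_eobs=40   # raised from 20; C96 was hitting the walltime with 20 tasks
    ;;
  *)
    npe_eobs=""   # other resolutions are outside the scope of this sketch
    ;;
esac

echo "npe_eobs=${npe_eobs}"
```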
- Some launcher commands were missing the "-n $npe" flag.
- Add "-n" and task # variable in launcher commands if missing.
Refs NOAA-EMC#1060

* Update multi-prog in HERA.env and ORION.env
* Update launcher commands in HERA.env and ORION.env
* Adjust C96 & C48 eobs resources in config.resources
Refs #1060
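The launcher changes in these commits can be pictured with a sketch like the one below. The `mpmd_opt` value comes from the commit messages; everything else (the `srun` launcher, `npe_post`, `cmdfile`, and the `run_cmd*` names) is assumed for illustration and may not match the env files. The commands are only assembled and printed, not executed.

```shell
#!/bin/sh
# Sketch only: launcher, npe_post, cmdfile, and run_cmd* are hypothetical names.
launcher="srun"           # assumed Slurm launcher
mpmd_opt="--multi-prog"   # value given in the commit messages
npe_post=96               # illustrative task count
cmdfile="mpmd.cmd"        # hypothetical MPMD command file

# Plain launch: always pass an explicit task count with -n.
run_cmd="${launcher} -n ${npe_post}"

# MPMD launch: reuse the same task count and add the option via the variable
# instead of hard-coding "--multi-prog" in each command.
run_cmd_mpmd="${launcher} -n ${npe_post} ${mpmd_opt} ${cmdfile}"

echo "${run_cmd}"
echo "${run_cmd_mpmd}"
```

Keeping the option in one variable means a platform whose launcher uses a different MPMD flag only needs to change `mpmd_opt`.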
@CoryMartin-NOAA This should now be fixed. Let me know if you encounter further issues with the low-res post jobs.
Thank you @KateFriedman-NOAA will do!!
- When the mpmd variable in the R&D env files was renamed to mpmd_opt the wave_mpmd setting in JGLOBAL_WAVE_INIT was not updated to match and thus broke the job when tested.
- Update the wave_mpmd setting in JGLOBAL_WAVE_INIT to use the wave_mpmd setting defined in the env files instead of the old mpmd variable.
Refs NOAA-EMC#1060

- Make matching changes to Jet and S4 env files to set mpmd_opt and use it in launcher commands in place of prior mpmd variable.
Refs NOAA-EMC#1060
Expected behavior
Post should run for any resolution.
Current behavior
When running C96, post fails with a "too many MPI tasks, max is 96 stopping" message. Presumably UPP is limiting the number of ranks to the resolution (holdover from spectral?).
Machines affected
Discovered on Orion, but presumably on every machine.
To Reproduce
The gdaspost and/or gfspost tasks fail in the first full cycle. See the outpost_gfs_${CDATE}_postcntrl_gfs_anl.xml file in ${DATA} for the error message.
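One way to locate the message in that file is a simple `grep`, sketched below. To keep the example self-contained it writes a stand-in file into a temporary directory; the `CDATE` value and the file contents here are placeholders, and in a real run `DATA` and `CDATE` would come from the failed job.

```shell
#!/bin/sh
# Sketch only: the temp directory, CDATE value, and file contents are stand-ins.
DATA="$(mktemp -d)"
CDATE="2022010100"
logfile="${DATA}/outpost_gfs_${CDATE}_postcntrl_gfs_anl.xml"

# Stand-in for the real file, containing the message quoted in this issue.
printf '%s\n' "too many MPI tasks, max is 96 stopping" > "${logfile}"

# Search for the error text and report the matching line number.
match="$(grep -n "too many MPI tasks" "${logfile}")"
echo "${match}"

rm -rf "${DATA}"
```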