Install and test unified environment on supported HPCs #478
@climbfuji I can install this in the role.epic space on Orion, Jet, and Cheyenne to start. Hera may have to wait for an EPIC-owned installation because our nems account is at capacity. Would you mind sharing the install recipe?
Thanks for volunteering. I think we need to agree on the directory structure and naming conventions first, then create an install recipe that we can more or less copy and paste or automate with Jenkins. I wonder if this can wait until Thursday when we have our spack-stack meeting. Also, we need to update all site configs to have the compilers configured correctly. That can be a separate PR that goes in first. For example, we have this for Orion (https://github.com/NOAA-EMC/spack-stack/blob/develop/configs/sites/orion/packages.yaml):

    packages:
      all:
        compiler:: [***@***.***, ***@***.***, ***@***.***]
        providers:
          mpi:: [***@***.***, ***@***.***, ***@***.***]

but what we want is

    packages:
      all:
        compiler:: [***@***.***, ***@***.***]
        #compiler:: [***@***.***]
        providers:
          mpi:: [***@***.***, ***@***.***]
          #mpi:: [***@***.***]

and then our instructions/automation needs to take care of swapping between Intel-latest+GNU and Intel-18 for the global workflow. Also, most sites do not have an Intel 18 configuration. We need to add this for sites where users run the global workflow. This is only a small number of sites; all others are ok with just Intel-whatever-is-there-already+GNU-whatever-is-there-already.
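The compiler and MPI specs above are masked in the source. For context, a hypothetical unmasked version of such a preference block might look like the following; all version numbers here are illustrative placeholders, not the actual Orion specs:

    packages:
      all:
        # Default stack: latest Intel plus GNU (versions are made up for illustration)
        compiler:: [intel@2022.0.2, gcc@10.2.0]
        # Swapped in for global workflow builds:
        #compiler:: [intel@18.0.5]
        providers:
          mpi:: [intel-oneapi-mpi@2021.5.1, openmpi@4.0.4]
          #mpi:: [intel-mpi@2018.0.4]

With both sets of lines present in the site config, swapping stacks amounts to toggling which lines are commented out.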
Thanks for this information. Totally happy to wait until Thursday's meeting to discuss things before beginning the installs. Since these site configs need to be updated and some sites need Intel 18, there is plenty of prep to do. Are we using spack to install ***@***.*** on sites where the GW will be run that do not yet have it?
No, the global workflow runs on a few HPCs that all have Intel 18.
10-4
Once the GSI is able to move off of Intel 18 and onto the same Intel version as the other GFS components, we shouldn't need Intel 18 anywhere anymore. Hoping this happens soon!
@KateFriedman-NOAA @climbfuji speaking of: "Dear RDHPCS users, we plan to deprecate the software modules intel/18.0.5.274 and impi/2018.0.4 from Hera. You are receiving this email because you have loaded the module from either your login profile or your batch jobs during the past year. Deprecating a software module means: [...] If you believe this module should remain supported (un-deprecated), please start a help ticket to request reversing this change within 5 work days. Otherwise, no response is needed. https://rdhpcs-common-docs.rdhpcs.noaa.gov/wiki/index.php/Help_Requests. Thank you very much! RDHPCS User Support Group"
A similar email went out for Intel 18 and wgrib2/2.0.8 on Jet...
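For anyone auditing their own usage of the modules named in such a notice, Lmod can show whether a deprecated module still resolves. A minimal check, assuming Lmod on Hera:

    # Does the deprecated module still exist on the system?
    module spider intel/18.0.5.274
    # Which modules are loaded in the current shell?
    module list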
I'm working on #333 on Hera (testing the unified environment with esmf@8.4.1 and mapl@2.35.2), and I've run into the following:
    [error output not preserved]
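The exact commands are not preserved in the thread; a generic sketch of this kind of test using plain spack commands, where the environment name unified-env is a placeholder:

    # Activate an existing environment and request the updated packages
    spack env activate unified-env
    spack add esmf@8.4.1 mapl@2.35.2
    # Re-resolve the full dependency graph; version conflicts surface here
    spack concretize --force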
@AlexanderRichert-NOAA To get around the network access problem you should be able to transfer Hera's [...]. Regarding nco: do you not have write access? I can check whether I can delete the cached nco files. hdf5+threadsafe: not sure it's a good idea to remove +threadsafe and hope for the best; someone must have put it in for a reason. But if you know for sure that cdo only ever gets used without OpenMP parallelism, then it may be OK. Let's make sure first that hdf5+threadsafe is really the problem.
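What exactly is being transferred is lost above; one plausible reading is a local source mirror, which spack supports directly. A sketch, with paths made up for illustration:

    # On a machine with network access, with the environment active:
    # download the source tarballs the environment needs into a mirror directory
    spack mirror create -d /tmp/spack-sources --all
    # After copying the directory to the target machine, register it there:
    spack mirror add local-sources file:///path/to/spack-sources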
@AlexanderRichert-NOAA I removed the link and the source file behind it.
Well, rats, cdo does use OpenMP... and yet in hpc-stack, hdf5 is built without thread safety. @KateFriedman-NOAA do you know whether cdo could be run without OpenMP for global workflow? If so, then I could probably ease the thread safety requirement for cdo by adding "+openmp" to the [...]
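Assuming the spack cdo package exposes an openmp variant, the trade-off could be checked before changing any requirements. A sketch:

    # List cdo's variants and their defaults
    spack info cdo
    # Trial concretization: does cdo without OpenMP accept a non-threadsafe hdf5?
    spack spec cdo~openmp ^hdf5~threadsafe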
I do not know, unfortunately. I don't know much about cdo.
Done, finally. See #503.
Is your feature request related to a problem? Please describe.
We need to install and test the unified environment on all supported HPCs. A good starting point is the list of preconfigured and configurable (generic) platforms in https://spack-stack.readthedocs.io/en/latest/Platforms.html.
Describe the solution you'd like
Left over from previous PR #454:
- ncl from global-workflow-env (also affects macos site config)

See epic #503 for a list of final installations and successful tests. Consider this issue completed when all the required boxes are ticked in the epic.
Preliminary testing done beforehand:
Additional context
n/a