enable matrix_ncep on orion #441
Comments
@jessica, the metis/parmetis path on orion points to the builds compiled using hpc-stack:
They're built with hpc-intel/2019.5, but ufs-weather-model uses hpc-intel/2018.4 (see: https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/ufs_orion.intel#L16-L18), so I am currently switching to that Intel version, unless there's a reason we should deviate from it?
@JessicaMeixner-NOAA I just removed the builds made with intel/2019 and recompiled them with the same hpc-stack version, using module load hpc/1.1.0. The path did not change:
Thanks @aliabdolali, the PDLIB tests now seem to be passing. Current issues are:
FYI @ricampos
I can get past the segfaults I was having by adding:
Thanks, Jessica. I will leave myself a note to remember to add this line.
Okay, at this point I have a branch that runs everything on orion except the netcdf output with partitions; those tests still fail.
@aliabdolali @ricampos should I go ahead and make a PR with the updates as of now, or wait until we have a fix for the netcdf issues on orion?
@JessicaMeixner-NOAA Thanks, please go ahead and make the PR. If needed, please open an issue for this problem.
Hi Jessica, I found the problem on Orion. When ww3_ounf is compiled with netcdf/4.7.4, the program crashes during partition writing with the message "NetCDF: Name contains illegal characters", as you saw. It partially writes the file (without partitions) and then stops, so the problematic netcdf file is still created.
From now on I will always use module load netcdf/4.7.2 in my jobscripts.
There was an issue when running the regtests on hera. I thought I had solved that problem, but I guess not, so no pull request yet for this branch. @ricampos, while it's great that netcdf/4.7.2 solves the problem, that's not an hpc-stack module, which is what we want to use. Let's make a new issue for just the netcdf problem on orion, using the hpc-stack modules instead. If needed, we can create a simple test case to post in an issue on hpc-stack itself.
Understood.
It works with netcdf/4.7.4 on hera. I'll make a new issue; let's continue this conversation there.
ok
We should be able to run matrix_ncep on orion. The first issue was to change mpirun to srun, which is the command we should be using on hera too for slurm. Still debugging issues on orion, which include:
-- I think the parmetis library needs to be rebuilt now that we're using hpc-stack modules on orion (@aliabdolali can you help with this?)
-- the oasis tests fail; see the question in issue #440
This work is being done on: https://github.com/JessicaMeixner-NOAA/WW3/tree/orion
When completed, the hope is to be able to use the hpc-stack modules on orion and run the WW3 regression tests on orion as well as hera.
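The mpirun-to-srun change mentioned above could be handled along these lines. This is only a sketch under stated assumptions: the `pick_launcher` helper, the `NPROCS` variable, and the task count are hypothetical and are not taken from the WW3 regression-test scripts, and `ww3_shel` is used purely as an illustrative target.

```shell
#!/bin/sh
# Hypothetical helper (not from the WW3 scripts): choose an MPI launcher.
# Prefers srun when Slurm's client tools are on PATH, otherwise mpirun.
pick_launcher() {
    if command -v srun >/dev/null 2>&1; then
        echo "srun"
    else
        echo "mpirun"
    fi
}

LAUNCHER=$(pick_launcher)
NPROCS=${NPROCS:-4}   # assumed task count, for illustration only

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "would run: $LAUNCHER -n $NPROCS ./ww3_shel"
```

Detecting the launcher at run time is one way to keep a single script working on both orion and hera; hard-coding srun per machine would be an equally reasonable choice.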