Add support for regression tests on NCEP RDHPC orion machine #468
Conflicts: regtests/bin/matrix_divider_p.sh regtests/bin/run_test
Conflicts: .gitignore regtests/bin/matrix_ncep
Tests went well on Orion, except for the ones affected by the NetCDF library issue on Orion when using partitions (see: #451).
If the ww3_tp2.16/./work_MPI_OMPH test is run in develop and in this branch, each from a fresh clone and with only that test run, the output is identical (the same holds when running it twice from this branch). However, when multiple tests are run in this branch, the results are neither reproducible nor identical to the develop branch. Typically this type of error means w3_new has an error. I'm continuing to investigate, but it seems to be unrelated to this branch; for whatever reason we see it with srun (which we should be using) and not with mpirun.
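For anyone repeating the check described above, a bit-for-bit (b4b) comparison of two work directories can be sketched as follows. This is a minimal illustration, not the comparison script used by the regression suite; the function names `file_digest` and `differing_files` are hypothetical, and it assumes that every file in the directory is expected to match (in practice, logs containing timestamps would probably need to be excluded).

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(run_a: Path, run_b: Path) -> list[str]:
    """List relative paths that exist in only one run or differ in content."""
    files_a = {p.relative_to(run_a).as_posix() for p in run_a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(run_b).as_posix() for p in run_b.rglob("*") if p.is_file()}

    # Files present in only one of the two runs count as differences.
    diffs = set(files_a ^ files_b)

    # Files present in both runs are compared by content hash.
    for rel in files_a & files_b:
        if file_digest(run_a / rel) != file_digest(run_b / rel):
            diffs.add(rel)
    return sorted(diffs)
```

An empty list from `differing_files(dir_from_develop, dir_from_branch)` would mean the two runs are b4b under these assumptions; both directory arguments are placeholders for wherever the two work directories live.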
Another test I tried was to use the tp2.10 OMPH switch file for the tp2.16 test, since the two tests run back to back, which did not help either. However, tp2.10 is a known non-b4b test (#321) and tp2.16 is very similar, just on a different grid (Arctic), so perhaps we were just lucky before to get reproducibility there? @aliabdolali I've looked into this fairly extensively at this point. I'm not sure why srun hasn't been used all along, and since every other test is the same except this one, which is very similar to a known non-b4b test, I think our options at this point are
@JessicaMeixner-NOAA I think we should move forward; srun is more efficient, especially with OMP options. Given that, I will rerun the tests and merge afterward.
The tests ran successfully, with the known non-b4b tests and ww3_tp2.16:
********************* non-identical cases ****************************
mww3_test_03/./work_PR2_UNO_MPI_e (1 files differ)
The test ww3_tp2.16/./work_MPI_OMPH has been added to the list of non-identical cases, with a note in #321.
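As a rough aid for checking a summary like the one quoted above against the list of known non-b4b cases, the filtering step could look like the sketch below. It assumes the summary lists one case per line in the form `<case> (N files differ)`, as in the quoted output; the names `parse_non_identical` and `unexpected_cases` are hypothetical, and the known list has to be supplied by the caller (for example, from the cases tracked in #321).

```python
import re

# Matches lines such as "mww3_test_03/./work_PR2_UNO_MPI_e (1 files differ)".
# The layout is assumed from the quoted summary, not from a documented format.
CASE_LINE = re.compile(r"^\s*(\S+)\s+\((\d+) files? differ\)\s*$")


def parse_non_identical(summary: str) -> dict[str, int]:
    """Map each non-identical case name to its number of differing files."""
    cases = {}
    for line in summary.splitlines():
        match = CASE_LINE.match(line)
        if match:
            cases[match.group(1)] = int(match.group(2))
    return cases


def unexpected_cases(summary: str, known: set[str]) -> list[str]:
    """Return the non-identical cases that are not on the known non-b4b list."""
    return sorted(case for case in parse_non_identical(summary) if case not in known)
```

Header lines such as the starred banner do not match the pattern and are skipped, so the whole summary text can be passed in unchanged.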
Pull Request Summary
Adds support for NCEP's RDHPCS resource Orion and updates Hera to use "srun" as recommended.
Description
This PR allows the regression tests to be run on Orion. No comparison is done on Orion, and there are still a few errors due to a NetCDF issue on Orion when using partitions (see: #451).
Issue(s) addressed
Check list
Is your feature branch up to date with the authoritative repository (NOAA/develop)? yes
Make sure you have checked the checklist for a developer submitting to develop and updating version number - no version number update
Please list appropriate labels code managers should add for this PR:
bug, documentation, enhancement, new feature, ..
Reviewers: @aliabdolali
Commit Message
Testing
matrixDiff.txt
matrixCompFull.txt
matrixCompSummary.txt
Please indicate the expected changes in the outputs (excluding the known list of non-identical tests). Just the typical non-b4b tests:
none