-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set ACCNR in rt.sh, not detect_machine.sh. Add some comments explaini… #1016
Set ACCNR in rt.sh, not detect_machine.sh. Add some comments explaini… #1016
Conversation
…ng where the variable should not be set or used.
To make sure nobody misses it, I'm going to repeat this warning from the PR description: Warning: If a user or automated test system was setting ACCNR to an invalid value, then rt.sh will use that value after this PR, and will fail. Make certain your scripts are setting ACCNR properly, or not setting it at all, before testing this! |
Also, an answer-changing PR should be merged into this branch as soon as possible so the system can be fully tested. I only tested enough to ensure the ACCNR is passed correctly. |
I will pull this PR into #1009, which is the current one in the commit queue. |
@MinsukJi-NOAA Your branch breaks Jet support, so I'll need something else to test against.
|
I cannot see anything in your branch that would break Jet support, but somehow it fails to compile for me, while the community/develop succeeds. |
No, cancel that, the develop just failed too. A slightly older version of develop succeeds. I haven't tracked down what broke jet support yet... I think I need someone else to try, to see if this is something specific to my account. |
@MinsukJi-NOAA Never mind. Now everything is compiling properly. That may have been a temporary system issue. |
I tried to run the develop on jet, and got this error:
|
Are you setting ACCNR to anything? |
@MinsukJi-NOAA The develop branch does not run on Jet. You need to use my updates (that's why I did the PR). |
No. I guess this error will be fixed by your code changes in this PR? |
Yes. My changes correct the code so it will use h-nems by default on Jet, or the value of ACCNR if you set one. On develop, you had to set ACCNR or it would break. |
It's good to merge this PR into #1009 ? |
@MinsukJi-NOAA I am testing #1009 and #1016 together right now on Jet. I will let you know when I trust the merge. /lfs4/HFIP/hfv3gfs/s/pr-1009/tests Edit: I had typed the wrong rt_ directory path. |
Just hardcode ACCNR in rt.sh for testing if develop in principle compiles on jet ...
… On Jan 21, 2022, at 9:12 AM, Minsuk Ji ***@***.***> wrote:
It's good to merge this PR into #1009 <#1009> ?
—
Reply to this email directly, view it on GitHub <#1016 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB5C2RPLQA63I3NVXHK2GATUXGAVXANCNFSM5MOBYZ6Q>.
Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you are subscribed to this thread.
|
@climbfuji That should still work of course, but with this PR everything will work as originally intended. |
At least I can confirm that there is no problem with finding netcdf, i.e. I don’t have the issue that you reported initially.
… On Jan 21, 2022, at 9:16 AM, Samuel Trahan (NOAA contractor) ***@***.***> wrote:
@climbfuji <https://github.com/climbfuji> That should still work of course, but with this PR everything will work as originally intended.
—
Reply to this email directly, view it on GitHub <#1016 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB5C2RMNSHD6AQ674W22ARTUXGBHHANCNFSM5MOBYZ6Q>.
Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.
|
I confirm that there is no issue finding netcdf on jet. |
I have no trouble finding NetCDF now either. All of the build jobs are succeeding. |
@MinsukJi-NOAA The Jet tests are going well, so I started Hera intel and gnu. Right now, all three workflows are generating baselines (rt.sh -c) |
Hera finished generating baselines. It is validating against them now. /scratch1/NCEPDEV/stmp4/Samuel.Trahan/FV3_RT/REGRESSION_TEST_INTEL Jet is still generating baselines. |
Hera gnu and intel tests have passed. Jet has generated its baseline, and is verifying against that now. /lfs4/HFIP/hfv3gfs/s/rt/t/Samuel.Trahan/FV3_RT/REGRESSION_TEST_INTEL |
@MinsukJi-NOAA I can confidently say the #1016 and #1009 work together now. However, this PR has not been reviewed yet, and is not scheduled for merge until January 25. |
Thanks for testing @SamuelTrahanNOAA . Since the changes in this PR are small and help with Regression Testing, this PR can go in with #1009. |
@MinsukJi-NOAA Someone should review this before it is merged with anything else. |
The Jet tests passed. Though, it doesn't matter much at this point since @MinsukJi-NOAA is merging these changes in his PR #1009. That'll trigger a rerun of the regression test system on all platforms, including Jet. |
These changes were merged in #1009 so I'm closing this PR. |
PR Checklist
This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.
This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR
An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes in contained in the PR. The Issue and the dependent sub-component PR
are specified below.
Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.Instructions: All subsequent sections of text should be filled in as appropriate.
The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsiblity to keep the PR up-to-date with the develop branch of ufs-weather-model.
Description
The rt.sh ignores the ACCNR value set by the user in the environment because the big per-platform "if/elif/fi" hard-codes it to per-platform values (
ACCNR=GFS-DEV
) instead of allowing a user-specified default (ACCNR=${ACCNR:-GFS-DEV}
). It has to do this because the detect_machine.sh sets an incorrect default value (nems) so ACCNR is always set by the time it reaches the rt.sh per-platform if/elif/fi block. The fix is to remove the setting from detect_machine.sh and allow a user-specified default in rt.sh. The final default of ACCNR=${ACCNR:-nems} happens after that block to ensure the original behavior of a "nems" default when appropriate.Warning: If a user or automated test system was setting ACCNR to an invalid value, then rt.sh will use that value after this PR, and will fail. Make certain your scripts are setting ACCNR properly, or not setting it at all, before testing this!
Issue(s) addressed
The first attempt at fixing this (issue #979) is in PR #981, which caused a bug #1014 and didn't fix issues mentioned in #1015.
Resolves #1014 and resolves #1015
Testing
How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)
To avoid wasting CPU time, I have run minimal testing, sufficient to ensure the ACCNR is passed properly. To fully trust these changes, we need to run baseline generation and regression testing for all platforms. This PR should be combined with an answer-changing PR to avoid wasting CPU time. Also note that this may break the github-driven automated testing if it set ACCNR incorrectly.
Dependencies
None.