Bugfix/negative peak frequency #741


benoitp-cmc
Contributor

@benoitp-cmc benoitp-cmc commented Jul 21, 2022

Pull Request Summary

Fix peak frequency calculation in extreme cases. Changes in w3iogomd, ww3_outp and ww3_ounp.

Description

In some cases where there is very little energy in the spectrum, the peak frequency can occur at the min or max frequency. Currently this leads to faulty extrapolation, and in some cases we see a negative peak frequency. This PR ensures that the peak frequency is kept within the min/max frequency range.

Edit: The FP range is inclusive of min frequency and exclusive of max frequency. With this PR, the calculation of peak frequency is consistent between gridded and point output except for the use of HSMIN in the point output #311.
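
As a rough illustration of the behaviour described above (not the WW3 implementation), the Python sketch below picks the most energetic bin, refines it with a three-point parabolic fit only when both neighbours exist, and clamps the result to the frequency range. The function name `peak_frequency`, the parabolic fit, and the choice to return the bin frequency when the peak sits in the lowest bin are assumptions made for the example; `None` stands in for UNDEF.

```python
from typing import Optional, Sequence


def peak_frequency(freqs: Sequence[float], energy: Sequence[float]) -> Optional[float]:
    """Return a peak frequency inside [freqs[0], freqs[-1]], or None for UNDEF."""
    n = len(freqs)
    if n < 3 or n != len(energy):
        raise ValueError("need at least three bins and matching lengths")

    # Index of the most energetic bin (the first one wins on ties).
    k = max(range(n), key=lambda i: energy[i])

    # No energy at all, or peak in the top bin: leave the value undefined.
    if energy[k] <= 0.0 or k == n - 1:
        return None

    # Peak in the lowest bin: there is no lower neighbour to fit through,
    # so return the bin frequency instead of extrapolating (assumption).
    if k == 0:
        return freqs[0]

    # Vertex of the parabola through the peak bin and its two neighbours.
    x1, x2, x3 = freqs[k - 1], freqs[k], freqs[k + 1]
    y1, y2, y3 = energy[k - 1], energy[k], energy[k + 1]
    a = (x2 - x1) * (y2 - y3)
    b = (x2 - x3) * (y2 - y1)
    den = a - b
    if den == 0.0:
        return x2  # flat top: fall back to the bin frequency
    fp = x2 - 0.5 * ((x2 - x1) * a - (x2 - x3) * b) / den

    # Final safeguard: never leave the resolved frequency range.
    return min(max(fp, freqs[0]), freqs[-1])
```

With this shape, a spectrum whose energy is concentrated in the first bin yields the lowest frequency rather than an extrapolated (possibly negative) value, and a spectrum with no energy stays undefined.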

While editing ww3_outp, I added an IS2 ifdef around some variable definitions to prevent a crash.

Please also include the following information:

Issue(s) addressed

Commit Message

Fix peak frequency calculation in extreme cases.

Check list

Testing

  • How were these changes tested? Running some cases where the problem occurs and looking at output
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)? No
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc I don't see where an IS2 ifdef is added?

@benoitp-cmc
Contributor Author

benoitp-cmc commented Jul 25, 2022

@JessicaMeixner-NOAA it turned out a bit strange in the rebase. I had added !/IS2 in 5bce5e6 then when doing interactive rebasing I swapped !/IS2 with the ifdef in the "main" commit: c42cb85 (in ww3_outp at line 1444).

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc I've run the full set of regression tests and we have a lot of changes:

matrixCompFull.txt
matrixCompSummary.txt

Is this expected?

Can you provide some additional information about how this PR was tested?

@benoitp-cmc
Contributor Author

@JessicaMeixner-NOAA thank you.

It affects cases at both ends of the frequency range.

  • With the patch, the highest frequency is now possible, and it shows up in sheltered areas with no swell, like in Indonesia. This is a small difference from before, which is why it did not raise red flags and was not discussed at length.
  • The difference at the low-frequency end happens rarely, but when it does it's very noticeable.

We've been running this patch in operations for about a year now. Before implementing it I had a few runs where I could reproduce the low-frequency bug, and where I added extra output at points in the neighbourhood to see exactly what was going on and whether the result made sense. I would have to dig those runs back up.

@JessicaMeixner-NOAA
Collaborator

The issue here is that we're changing so many regression test answers and based on your explanation of the code change, I'd expect a rare change, not a lot of changes. So either a lot of the regression tests are triggering these rare cases or there's an unintentional change that needs to be diagnosed further.

@benoitp-cmc
Contributor Author

benoitp-cmc commented Jul 27, 2022

  • out_grd: 224 out of 1290 are different (42 of those are expected I think, also some grib, netcdf, outf, .dp, .fp, and tab)

This does not surprise me much. The high frequency difference is not rare, though each time I expect it to affect only a few points and to be small. We probably would need to create a new regression test to catch the low frequency case.

  • out_pnt: 13 out of 706 are different.
  • mod_def: 6 out of 1525 are different
  • 20190830.030000.restart.glo_15m differ (binary)

I don't understand these. The changes are only in w3outg, ww3_outp and ww3_ounp. Does any of this feed back into something upstream? Never mind, I see those are expected to be different.

@ukmo-ccbunney
Collaborator

@benoitp-cmc I have run the regression tests and I see lots of differences, as Jessica did (in the out_grd files, etc.), as might be expected.

However, I notice that in some cases where fp was previously set to an UNDEF value (due to zero wave height at a calm initial condition), it is now being set to the highest frequency. Is this by design?

You can see this behaviour for example in the netCDF outputs for ww3_tp2.8/work_PR3_UQ.
In the develop branch, the peak frequency is all UNDEF at the first timestep, but is 0.71Hz in this branch.

@benoitp-cmc
Contributor Author

benoitp-cmc commented Jul 29, 2022

@ukmo-ccbunney There is a check IF ( EC(JSEA) .GT. 0 ), so if there is absolutely no energy, fp should stay as UNDEF. That's what I see in my test on a cold start. That said, there would be a lot fewer UNDEF values than before.

I guess if IKP0=NK, then fp wouldn't really be the maximum resolved frequency since we have a whole high frequency tail beyond it. In that context it would make sense to leave the UNDEF. That would also have a lot less impact on the regression tests.

I'll amend the code and I'll try my hand at running regression tests, at least ww3_tp2.8/work_PR3_UQ.

@benoitp-cmc
Contributor Author

benoitp-cmc commented Aug 8, 2022

Thanks @ukmo-ccbunney that was a good catch.

By forbidding the highest frequency (as before), the bulk of what was UNDEF before is UNDEF again. There are differences though because the lowest frequency (IKP0=1) is now allowed.

Looking at ww3_tp2.8/work_PR3_UQ was very informative for this issue. Between the long swells coming in and the very low energy in the middle, the problem with the low frequency actually shows up.

  • At 7UTC at JSEA=7277 (48.7125 N 5.473 W) we had 0.0314 which is smaller than the lowest frequency (0.0373). With this PR, we now get 0.0383.
  • In general, if you plot fp, with this PR it looks more like what you would expect of a long swell coming in

Before PR: [figure: plot of fp before the PR]

After PR: [figure: plot of fp after the PR]

Both plots use a log scale. Sorry for the inconsistent appearance: the minimum of the "before" plot is lower (0.031) than that of the "after" plot (0.038). The differences are all in the rows near -5.5E.

Rerunning regtests now.

@benoitp-cmc
Contributor Author

benoitp-cmc commented Aug 12, 2022

Regression test differences

I've gone through many of the regression tests. The table below summarizes my observations. The peak frequencies that were out of bounds are marked in bold.

| Test | Freq min | Freq max allowed | Before min | Before max | After min | After max | Comments |
|---|---|---|---|---|---|---|---|
| ww3_tp2.5 PR1 | 0.0368 | 0.0425 | **-5.1** | **3.7** | 0.037 | 0.042 | Great example of why this fix is necessary, was quite bad before |
| ww3_tp2.5 PR2_UNO | | | **-39** | | | | |
| ww3_tp2.3 | 0.0465 | 0.1686 | | | | | Small changes in PD where IKP0 went from 2 to 1. FP not output by default, looks much better |
| ww3_tp2.8 | 0.0373 | 0.6834 | **0.023** | 0.068 | 0.037 | 0.068 | Looks good |
| ww3_tic1.4 IC0IS2_1000 | 0.0800 | 0.1125 | **0.05** | **0.4** | 0.08 | 0.1 | Looks better |
| ww3_tic1.4 IC1IS2_1000 | 0.0800 | 0.1125 | 0.1 | 0.1 | 0.08 | 0.1 | Now allow lowest frequency |
| ww3_tic1.4 IC2IS2_IC2b | 0.0500 | 0.1994 | | | | | Didn't check |
| ww3_tic1.4 IC2IS2_IC2d | 0.0500 | 0.1994 | **0.043** | **0.26** | 0.05 | 0.12 | Better; before: weird jump in FP; after: smooth |
| ww3_tp1.6 PR1 | 0.1863 | 0.6752 | | **0.7428** | | 0.4448 | "1D" with current, 2+m waves with most energy in the highest frequencies. Max from tab60 |
| ww3_ts1 NL5 | 0.2000 | 0.2795 | **0.19** | 0.26 | 0.2 | 0.26 | Before, inconsistent between tab49 and ounf |
| ww3_tp2.21 a | 0.0350 | 0.9320 | 0.0375 | | 0.035 | | Now allow lowest frequency |
| mww3_test_07 PR3_UQ | 0.0373 | 0.6834 | **-0.17** | 0.08 | 0.037 | 0.08 | Now allow lowest frequency. Better but still not pretty in the very low energy zone |
| ww3_tic2.3 | 0.0412 | 0.3520 | **0.0393** | **0.42** | | | Now allow lowest frequency. Better |
| ww3_tp2.2 PR1 | 0.0400 | 0.0818 | | **0.09** | | 0.0804 | Now forbids highest frequency for point output |
| mww3_test_01 | | | | | | | out_grd differs. Shel includes FP/DP but outf/ounf do not |
| mww3_test_05 | | | | | | | out_grd differs. Shel includes FP/DP but outf/ounf do not |
| ww3_tp2.4 | | | | | | | out_grd differs. Shel includes FP/DP but outf/ounf do not |
| ww3_ufs1.3 | | | | | | | out_grd, nc and grib2 differ; not unexpected. |
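
The out-of-bounds check behind this summary can be reproduced with a few lines of Python. This is only a sketch: it uses the rows that report all six values, and the tolerance `TOL` is an assumption added to absorb the rounding of the reported values (for example 0.037 against a lowest frequency of 0.0373). The names `Row` and `out_of_bounds` are illustrative.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    test: str
    f_min: float
    f_max: float
    before_min: float
    before_max: float
    after_min: float
    after_max: float


# Rows copied from the summary above (only those with all six values).
ROWS = [
    Row("ww3_tp2.5 PR1", 0.0368, 0.0425, -5.1, 3.7, 0.037, 0.042),
    Row("ww3_tp2.8", 0.0373, 0.6834, 0.023, 0.068, 0.037, 0.068),
    Row("ww3_tic1.4 IC0IS2_1000", 0.0800, 0.1125, 0.05, 0.4, 0.08, 0.1),
    Row("ww3_tic1.4 IC1IS2_1000", 0.0800, 0.1125, 0.1, 0.1, 0.08, 0.1),
    Row("ww3_tic1.4 IC2IS2_IC2d", 0.0500, 0.1994, 0.043, 0.26, 0.05, 0.12),
    Row("ww3_ts1 NL5", 0.2000, 0.2795, 0.19, 0.26, 0.2, 0.26),
    Row("mww3_test_07 PR3_UQ", 0.0373, 0.6834, -0.17, 0.08, 0.037, 0.08),
]

# Assumed allowance for values reported with few digits.
TOL = 5e-4


def out_of_bounds(row: Row) -> List[str]:
    """Names of the reported extremes that fall outside [f_min, f_max]."""
    flagged = []
    for name in ("before_min", "before_max", "after_min", "after_max"):
        value = getattr(row, name)
        if value < row.f_min - TOL or value > row.f_max + TOL:
            flagged.append(name)
    return flagged


for row in ROWS:
    print(f"{row.test}: {out_of_bounds(row) or 'all within range'}")
```

For these rows, only "before" values are flagged; every "after" value falls inside the allowed range.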

Peak direction difference in ww3_tp2.3/work_PR1

I forced ww3_tp2.3/work_PR1 to output peak frequency to see what was going on.

  • Difference on left
  • After PR in middle
  • Before PR on right

[Figure: peak direction]

[Figure: peak frequency]

The differences in peak direction are small. They stem from picking IKP0=1 instead of IKP0=2. The peak frequency differences are rather large; the PR improves things a good deal here.

Results of regression tests

matrixCompFull.txt
matrixCompSummary.txt
http://hpfx.collab.science.gc.ca/~bpo001/WW3/issue_304/matrixDiff.txt.gz

Conclusion

  • This PR does change a lot of regression tests, for the better.
  • This PR gets gridded and point output to have the same behaviour, except for point output using an HSMIN threshold.

The ww3_tp1.6/work_PR1 test is the only one that bugs me. At the point sampled in tab60, the highest frequency has the most energy and there are waves above 2 m.

  • We could choose to allow the highest frequency as FP. The ww3_tp1.6/work_PR1 test would look nicer
  • I would prefer if we kept it to not allow the highest frequency as FP. That way we avoid changing many cases with very low energy like ww3_tp2.8/work_PR3_UQ as discussed in comments on July 29.

@aliabdolali
Contributor

@benoitp-cmc Thanks for doing this great work and for providing extensive analysis on the comments provided by @ukmo-ccbunney @JessicaMeixner-NOAA. I am going to check at my end and will come back to you.

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc I have taken over the review and testing of this PR. All the regression tests pass, but as there are many changes I'm going through carefully to ensure that they are only the intended changes and nothing unexpected. I suspect this might take me a few days, but I hope to get this merged next week. Thank you so much for your patience and updates in response to comments.

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc just a quick update to let you know I'm still going through this PR and checking all of the output for the differences that I'm seeing. Thanks again for your patience as this is taking longer than expected.

@benoitp-cmc
Contributor Author

@JessicaMeixner-NOAA I feel your pain, it is long and tedious to go through the reg tests.

I'm happy as long as it's resolved before the next major release.

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc thank you so much for your updates and your patience through the review process.

I can confirm that I get the same output as you and these code changes are good to go.

More details:
The regression tests that change are:

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_01/./work_PR2_UNO                     (1 files differ)
mww3_test_01/./work_PR2_UQ                     (1 files differ)
mww3_test_01/./work_PR2_UNO_MPI                     (1 files differ)
mww3_test_01/./work_PR1                     (1 files differ)
mww3_test_01/./work_PR2_UQ_MPI                     (1 files differ)
mww3_test_01/./work_PR3_UQ                     (1 files differ)
mww3_test_01/./work_PR3_UNO                     (1 files differ)
mww3_test_01/./work_PR1_MPI                     (1 files differ)
mww3_test_01/./work_PR3_UQ_MPI                     (1 files differ)
mww3_test_01/./work_PR3_UNO_MPI                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (15 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (21 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (12 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (12 files differ)
mww3_test_05/./work_ST6_PR1_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR1_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR1_OMP                     (2 files differ)
mww3_test_05/./work_ST1_PR1_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UQ_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR1_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UQ_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UQ_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR1_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UQ_MPI                     (2 files differ)
mww3_test_07/./work_PR3_UQ                     (12 files differ)
ww3_ta1/./work_UPD0F_U                     (0 files differ)
ww3_tic1.4/./work_IC0IS2_1000                     (3 files differ)
ww3_tic1.4/./work_IC1IS2_1000                     (2 files differ)
ww3_tic1.4/./work_IC2IS2_IC2d                     (3 files differ)
ww3_tic1.4/./work_IC2IS2_IC2b                     (3 files differ)
ww3_tic2.3/./work_IC2IS2creep                     (1 files differ)
ww3_tic2.3/./work_IC2IS2dissip                     (1 files differ)
ww3_tic2.3/./work_IC2IS2scat                     (1 files differ)
ww3_tp1.6/./work_PR1                     (1 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)
ww3_tp2.17/./work_mb                     (1 files differ)
ww3_tp2.17/./work_b                     (1 files differ)
ww3_tp2.17/./work_mc                     (1 files differ)
ww3_tp2.17/./work_a                     (1 files differ)
ww3_tp2.17/./work_c                     (1 files differ)
ww3_tp2.17/./work_ma                     (1 files differ)
ww3_tp2.17/./work_mc1                     (1 files differ)
ww3_tp2.17/./work_ma1                     (1 files differ)
ww3_tp2.2/./work_PR2_UNO                     (1 files differ)
ww3_tp2.2/./work_PR2_UQ                     (1 files differ)
ww3_tp2.2/./work_PR2_UNO_MPI                     (1 files differ)
ww3_tp2.2/./work_PR2_UQ_MPI                     (1 files differ)
ww3_tp2.2/./work_PR3_UQ                     (1 files differ)
ww3_tp2.2/./work_PR3_UNO                     (1 files differ)
ww3_tp2.2/./work_PR3_UQ_MPI                     (1 files differ)
ww3_tp2.2/./work_PR3_UNO_MPI                     (1 files differ)
ww3_tp2.21/./work_a                     (2 files differ)
ww3_tp2.21/./work_ma                     (2 files differ)
ww3_tp2.3/./work_PR3_UNO                     (5 files differ)
ww3_tp2.3/./work_PR1_MPI                     (5 files differ)
ww3_tp2.3/./work_PR3_UQ_MPI                     (5 files differ)
ww3_tp2.3/./work_PR3_UNO_MPI                     (5 files differ)
ww3_tp2.4/./work_PR2_UNO                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ                     (1 files differ)
ww3_tp2.4/./work_PR1_curv                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR1_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UNO_MPI                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR1                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ_MPI                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO_curv                     (1 files differ)
ww3_tp2.4/./work_PR1_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UNO_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UNO_curv                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ_MPI                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ_curv                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ_curv                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO_MPI                     (1 files differ)
ww3_tp2.5/./work_PR2_UNO                     (14 files differ)
ww3_tp2.5/./work_PR2_UQ                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UNO_MPI                     (38 files differ)
ww3_tp2.5/./work_REF_PR1_MPI                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UNO                     (38 files differ)
ww3_tp2.5/./work_REF_PR1                     (38 files differ)
ww3_tp2.5/./work_PR2_UNO_MPI                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UQ                     (38 files differ)
ww3_tp2.5/./work_PR1                     (14 files differ)
ww3_tp2.5/./work_PR2_UQ_MPI                     (14 files differ)
ww3_tp2.5/./work_PR3_UQ                     (14 files differ)
ww3_tp2.5/./work_PR3_UNO                     (14 files differ)
ww3_tp2.5/./work_PR1_MPI                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UNO                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UQ                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UNO_MPI                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UQ_MPI                     (38 files differ)
ww3_tp2.5/./work_PR3_UQ_MPI                     (14 files differ)
ww3_tp2.5/./work_PR3_UNO_MPI                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UQ_MPI                     (38 files differ)
ww3_tp2.8/./work_PR3_UQ                     (7 files differ)
ww3_ts1/./work_NL5                     (3 files differ)
ww3_ufs1.3/./work_a                     (1 files differ)

The matrix output (diff was too large to include):
matrixCompFull.txt
matrixCompSummary.txt

Looking through the diffs, they are all related to peak frequency, and as @benoitp-cmc has indicated earlier in the thread these are 'good' differences.

To make sure all of the regression tests were as expected, I ended up running both this branch and the develop branch with additional output, and also separated the netCDF files by variable to help confirm that the changes were what was expected. The branch with extra output for the develop branch can be found here: https://github.com/JessicaMeixner-NOAA/WW3/tree/extraoutput for anyone who is interested, and the output of the comparison can be seen here:

matrixCompFull.txt
matrixCompSummary.txt

I'm running one additional test so I can report here whether this will change answers for another system. I anticipate merging this PR later today, though. Thanks again @benoitp-cmc

@benoitp-cmc
Contributor Author

@JessicaMeixner-NOAA thanks so much for taking the time to go through this. Great idea to split the output.

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc my one additional test was our coupled model regression tests here at EMC. The debug test there is failing on:

195: forrtl: severe (408): fort: (3): Subscript #1 of the array EBD has value 0 which is less than the lower bound of 1
195:
195: Image              PC                Routine            Line        Source
195: fv3.exe            0000000012091A5F  Unknown               Unknown  Unknown
195: fv3.exe            00000000034866D5  w3iogomd_mp_w3out        2347  w3iogomd.F90
195: libiomp5.so        00002B93654C3BB3  __kmp_invoke_micr     Unknown  Unknown
195: libiomp5.so        00002B936543FFAC  __kmp_fork_call       Unknown  Unknown
195: libiomp5.so        00002B9365401CB5  __kmpc_fork_call      Unknown  Unknown
195: fv3.exe            000000000342AE0A  w3iogomd_mp_w3out        2320  w3iogomd.F90
195: fv3.exe            0000000003815DDB  w3wavemd_mp_w3wav        3059  w3wavemd.F90

I'm going to see if I can recreate this issue in one of the standalone WW3 regression tests, which will be easier to debug. I'm hoping this is a pretty quick fix, but if it's okay with you I'm going to try to figure out this issue before merging. The other option would be to merge this and then immediately put in a fix, but I think it might be best to just try to fix this now? Thoughts?

@benoitp-cmc
Contributor Author

benoitp-cmc commented Sep 20, 2022

@JessicaMeixner-NOAA agreed, this should be fixed before merging.

Strange. I don't see anywhere where the first index of EBD could be 0. We do allow IKP0=1 now (before was 2), but as far as I can tell all the EBD are protected.

@JessicaMeixner-NOAA
Collaborator

I started the set of matrix regtests with debug compile on and am also working on just looking at our coupled test to see if I can't figure it out from there. I'll keep you up to date with what I find.

@JessicaMeixner-NOAA
Collaborator

Okay, so if I run with debug on, here are some of the regtests that give me an error:
Running now options: run_test -b slurm -o all -S -T -i i_lowres_multi -w work_lowres -m grdset_a -f -p srun -n 24 ../model mww3_test_08
Running now options: run_test -b slurm -o all -S -T -s ST1_PR1_MPI -w work_ST1_PR1_MPI -f -p srun -n 24 ../model ww3_ts3

I'm thinking it's when we call EBD(IK-1, *) such as:
https://github.com/eccc-waves/WW3/blob/Bugfix/negative_peak_frequency/model/src/w3iogomd.F90#L2161-L2190
which I'm thinking just needs an update like you have here:
[image]

Here's so far the line numbers where I get errors from:
2163, 2147, 2155, 1507, 1498 in w3iogomd.F90 (note this is with the develop branch merged into your branch, which I don't think changes the line numbers).

@benoitp-cmc Let me know what your thoughts are, I can help with code updates, additional tests, etc.

@benoitp-cmc
Contributor Author

I see, I had added the .AND. IK .GT. 1 check to avoid this kind of error. But it does not help (systematically) since Fortran does not short-circuit conditionals. Sorry about that. I am really glad you caught it.

That explains 2163, 2147, 2155. I have no idea why 1507, 1498 would give errors, that would be unrelated.

I did not want to touch FP1. Since the allowable range of IKP1 is 2 to NK-1, there is no risk of the calculation degenerating. (We could simplify the calculation by using what I did for EL and EH for FP0 and forget the ILOW. We should also add a check on IKP1 .NE. 0).

I am not sure how to do this the most cleanly.

  • The minimal change would be to first check IF (IKP1(JSEA).EQ.0 .AND. IK .GT. 1) before checking the rest with a separate conditional.
  • We could repeat the do loop on frequency, once from NK-1 to 1 for IKP0 and once from NK-1 to 2 for FP1. That would allow us to remove the .AND. IK .GT. 1 check, the IKP1 .EQ. 0 check and break out of the loop. But it would require doing CALL INIT_GET_ISEA twice.
  • Other?
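
The short-circuit pitfall and the first option above can be sketched in Python. This is an illustration only, not the code in w3iogomd: the class `OneBased` mimics a 1-based array with bounds checking (as in a debug build), and the comparison inside the functions is a made-up stand-in for the real condition. Python's `and` does short-circuit, so the sketch uses `&`, which always evaluates both operands, to mimic what a Fortran compiler is allowed to do with .AND.

```python
class OneBased:
    """Minimal 1-based array that rejects out-of-range subscripts."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, i):
        if not 1 <= i <= len(self._values):
            raise IndexError(f"subscript {i} outside 1..{len(self._values)}")
        return self._values[i - 1]


def unguarded(eb, ik):
    # Both operands are evaluated, so eb[ik - 1] is read even when ik == 1.
    return (ik > 1) & (eb[ik - 1] < eb[ik])


def guarded(eb, ik):
    # Test the index first, in its own conditional, then touch the array.
    if ik > 1:
        return eb[ik - 1] < eb[ik]
    return False


eb = OneBased([3.0, 1.0, 2.0])
print(guarded(eb, 1))    # safe: the array is never indexed with 0
print(guarded(eb, 3))
try:
    unguarded(eb, 1)
except IndexError as err:
    print("unguarded failed:", err)
```

Splitting the test into a separate, outer conditional is the pattern that stays safe regardless of how the compiler orders the evaluation.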

@JessicaMeixner-NOAA
Collaborator

I'd say let's try fixing the ones we know are related to this and then see if the other errors go away (they could have come from MPI tasks other than the first one erroring out, for example, or from completely unrelated errors as you mentioned).

Admittedly, I was not thinking about the fact that these could be in other loops for different variables when I was just searching for this issue. My first thought is to just go with how you did other parts of the code you updated here, but if you think the other way is cleaner, then I have no objections. Calls to INIT_GET_ISEA aren't the best, but depending on how many if-statements you add it could be about equal.

@benoitp-cmc
Contributor Author

As far as I can tell, the FP1 calculated in w3iogomd is only used to calculate THP1. And neither of these are used anywhere else. If someone wants peak frequency of wind wave they can get it for partition 0.

@JessicaMeixner-NOAA I would suggest we remove FP1 and THP1 entirely.

We would not touch w3src2md. In it a variable called FP1 is used but it's local and its calculation looks quite different.

@JessicaMeixner-NOAA
Collaborator

JessicaMeixner-NOAA commented Sep 22, 2022

@benoitp-cmc I've been looking in the code and I agree: the only place I see FP1 used is where it's defined locally in w3src2, and it does not look like FP1 and THP1 are used anywhere else. In fact I see:

!    Old parameters not yet in new structure ...
!
!      FP1       R.A.  Public   Wind sea peak frequency. (parked in 2)
!      THP1      R.A.  Public   Wind sea peak direction. (parked in 2)

in W3ADATMD.

Removing these variables should be in its own PR so we can confirm there's no unintended effects. Would you have time to do that or do you need me to make that PR? In parallel, I will delete those lines and run the debug tests to confirm that solves the issue I encountered with this PR.

I made an issue with this update: #803

@benoitp-cmc
Contributor Author

Thanks. Makes sense to have the removal in a separate pull request. I can take a stab at it tomorrow.

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc sounds good. Let me know what I can do to help. In the meantime, I put in the issue a potential problem with removal but I think we can work around it.

@benoitp-cmc
Contributor Author

@JessicaMeixner-NOAA when will you merge this? There might be some merge conflicts with the removal of FP1 and THP1.

@JessicaMeixner-NOAA
Collaborator

I don't want to merge this until we can fix the debug issues. So we'd either need to do a temporary fix here and then remove FP1/THP1, or remove FP1/THP1 first and then merge all of that here. If there are merge issues, I'd just need to ask you to merge the develop branch into your branch for this PR, instead of me merging things locally when I'm running tests, which is what I've been doing.

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc as expected, this branch now has conflicts that need to be resolved after PR #807. Can you merge the develop branch into this branch? Then I'll start testing this again, including the debug testing.

@JessicaMeixner-NOAA
Collaborator

Thanks @benoitp-cmc I'll start running tests!

@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc just wanted to let you know that the debug test that previously was failing is now passing. Still re-running the full set of tests which might run into our machine having a 2 day maintenance, but things are looking good so far!

@MatthewMasarik-NOAA
Collaborator

Hi @benoitp-cmc , I wanted to give a short update. The machine maintenance finished yesterday, though @JessicaMeixner-NOAA is out of office today. She will be back Monday and we will work to finalize this PR then. Thank you for your patience.

@JessicaMeixner-NOAA
Collaborator

Same tests are different:

mww3_test_01/./work_PR2_UNO                     (1 files differ)
mww3_test_01/./work_PR2_UQ                     (1 files differ)
mww3_test_01/./work_PR2_UNO_MPI                     (1 files differ)
mww3_test_01/./work_PR1                     (1 files differ)
mww3_test_01/./work_PR2_UQ_MPI                     (1 files differ)
mww3_test_01/./work_PR3_UQ                     (1 files differ)
mww3_test_01/./work_PR3_UNO                     (1 files differ)
mww3_test_01/./work_PR1_MPI                     (1 files differ)
mww3_test_01/./work_PR3_UQ_MPI                     (1 files differ)
mww3_test_01/./work_PR3_UNO_MPI                     (1 files differ)
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (17 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (12 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (17 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (16 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)
mww3_test_05/./work_ST6_PR1_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR1_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR1_OMP                     (2 files differ)
mww3_test_05/./work_ST1_PR1_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UQ_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR1_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UQ_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UNO_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UQ_MPI                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UNO_MPI                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST6_PR2_UNO_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR1_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR2_UQ_MPI_OMPH                     (2 files differ)
mww3_test_05/./work_ST1_PR3_UQ_OMP                     (2 files differ)
mww3_test_05/./work_ST6_PR3_UQ_MPI                     (2 files differ)
mww3_test_07/./work_PR3_UQ                     (12 files differ)
ww3_ta1/./work_UPD0F_U                     (0 files differ)
ww3_tic1.4/./work_IC0IS2_1000                     (3 files differ)
ww3_tic1.4/./work_IC1IS2_1000                     (2 files differ)
ww3_tic1.4/./work_IC2IS2_IC2d                     (3 files differ)
ww3_tic1.4/./work_IC2IS2_IC2b                     (3 files differ)
ww3_tic2.3/./work_IC2IS2creep                     (1 files differ)
ww3_tic2.3/./work_IC2IS2dissip                     (1 files differ)
ww3_tic2.3/./work_IC2IS2scat                     (1 files differ)
ww3_tp1.6/./work_PR1                     (1 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.14/./work_OASOCM                     (1 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)
ww3_tp2.17/./work_mb                     (1 files differ)
ww3_tp2.17/./work_b                     (1 files differ)
ww3_tp2.17/./work_mc                     (1 files differ)
ww3_tp2.17/./work_a                     (1 files differ)
ww3_tp2.17/./work_c                     (1 files differ)
ww3_tp2.17/./work_ma                     (1 files differ)
ww3_tp2.17/./work_mc1                     (1 files differ)
ww3_tp2.17/./work_ma1                     (1 files differ)
ww3_tp2.2/./work_PR2_UNO                     (1 files differ)
ww3_tp2.2/./work_PR2_UQ                     (1 files differ)
ww3_tp2.2/./work_PR2_UNO_MPI                     (1 files differ)
ww3_tp2.2/./work_PR2_UQ_MPI                     (1 files differ)
ww3_tp2.2/./work_PR3_UQ                     (1 files differ)
ww3_tp2.2/./work_PR3_UNO                     (1 files differ)
ww3_tp2.2/./work_PR3_UQ_MPI                     (1 files differ)
ww3_tp2.2/./work_PR3_UNO_MPI                     (1 files differ)
ww3_tp2.21/./work_a                     (2 files differ)
ww3_tp2.21/./work_ma                     (2 files differ)
ww3_tp2.3/./work_PR2_UNO                     (5 files differ)
ww3_tp2.3/./work_PR2_UQ                     (5 files differ)
ww3_tp2.3/./work_PR2_UNO_MPI                     (5 files differ)
ww3_tp2.3/./work_PR1                     (5 files differ)
ww3_tp2.3/./work_PR2_UQ_MPI                     (5 files differ)
ww3_tp2.3/./work_PR3_UQ                     (5 files differ)
ww3_tp2.3/./work_PR3_UNO                     (5 files differ)
ww3_tp2.3/./work_PR1_MPI                     (5 files differ)
ww3_tp2.3/./work_PR3_UQ_MPI                     (5 files differ)
ww3_tp2.3/./work_PR3_UNO_MPI                     (5 files differ)
ww3_tp2.4/./work_PR2_UNO                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ                     (1 files differ)
ww3_tp2.4/./work_PR1_curv                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR1_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UNO_MPI                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR1                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ_MPI                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO_curv                     (1 files differ)
ww3_tp2.4/./work_PR1_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UNO_curv_MPI                     (1 files differ)
ww3_tp2.4/./work_PR2_UNO_curv                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ_MPI                     (1 files differ)
ww3_tp2.4/./work_PR3_UQ_curv                     (1 files differ)
ww3_tp2.4/./work_PR2_UQ_curv                     (1 files differ)
ww3_tp2.4/./work_PR3_UNO_MPI                     (1 files differ)
ww3_tp2.5/./work_PR2_UNO                     (14 files differ)
ww3_tp2.5/./work_PR2_UQ                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UNO_MPI                     (38 files differ)
ww3_tp2.5/./work_REF_PR1_MPI                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UNO                     (38 files differ)
ww3_tp2.5/./work_REF_PR1                     (38 files differ)
ww3_tp2.5/./work_PR2_UNO_MPI                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UQ                     (38 files differ)
ww3_tp2.5/./work_PR1                     (14 files differ)
ww3_tp2.5/./work_PR2_UQ_MPI                     (14 files differ)
ww3_tp2.5/./work_PR3_UQ                     (14 files differ)
ww3_tp2.5/./work_PR3_UNO                     (14 files differ)
ww3_tp2.5/./work_PR1_MPI                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UNO                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UQ                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UNO_MPI                     (38 files differ)
ww3_tp2.5/./work_REF_PR3_UQ_MPI                     (38 files differ)
ww3_tp2.5/./work_PR3_UQ_MPI                     (14 files differ)
ww3_tp2.5/./work_PR3_UNO_MPI                     (14 files differ)
ww3_tp2.5/./work_REF_PR2_UQ_MPI                     (38 files differ)
ww3_tp2.8/./work_PR3_UQ                     (7 files differ)
ww3_ts1/./work_NL5                     (3 files differ)
ww3_ufs1.3/./work_a                     (1 files differ)

These differences are caused by the changes in FP. Full list of diffs (minus the diff file, which was too big):
matrixCompFull.txt
matrixCompSummary.txt

For ufs, this changes answers because of the FP changes; see the log:
RegressionTests_orion.intel.log.orig.txt
It does pass if you create a new baseline:
RegressionTests_orion.intel.log.txt

@JessicaMeixner-NOAA JessicaMeixner-NOAA merged commit 900d21d into NOAA-EMC:develop Oct 3, 2022
@JessicaMeixner-NOAA
Collaborator

@benoitp-cmc thank you so much for your great work and your patience with getting this merged!

@benoitp-cmc
Contributor Author

Thank you @JessicaMeixner-NOAA and everyone for reviewing and steering this in the right direction.

Successfully merging this pull request may close these issues.

Negative peak frequency