Feature #2887 categorical weights PR 1 of 2 #2967

JohnHalleyGotway · 2024-09-06T22:17:53Z

Expected Differences

This pull request changes the contingency table class hierarchy in MET from storing integer counts to storing sums of double-precision weights. I'm breaking this work across multiple PRs to make it easier to track the source of the differences.

The next PR for MET#2887 will actually apply the existing grid_weight_flag to the computation of contingency table-based outputs (CTS, NBRCTS, MCTS, and PSTD).
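
To illustrate the storage change described above, here is a minimal sketch of a 2x2 table whose cells hold sums of double-precision weights. The struct name, the `add()` helper, and the tolerance value are hypothetical and are not taken from the MET source; only the cell names and the idea of an `is_integer()` check come from this PR.

```cpp
#include <cmath>

// Hypothetical 2x2 table for illustration only; not MET's actual class.
struct WeightedTable2x2 {
    double fy_oy = 0.0, fy_on = 0.0, fn_oy = 0.0, fn_on = 0.0;

    // Add one forecast/observation pair. A weight of 1.0 reproduces
    // plain integer counting.
    void add(bool fcst_yes, bool obs_yes, double weight = 1.0) {
        if (fcst_yes) (obs_yes ? fy_oy : fy_on) += weight;
        else          (obs_yes ? fn_oy : fn_on) += weight;
    }

    double total() const { return fy_oy + fy_on + fn_oy + fn_on; }

    // True when every cell is within tol of a whole number, i.e. the
    // table could still be written out as integer counts.
    bool is_integer(double tol = 1.0e-5) const {
        return near_whole(fy_oy, tol) && near_whole(fy_on, tol) &&
               near_whole(fn_oy, tol) && near_whole(fn_on, tol);
    }

private:
    static bool near_whole(double v, double tol) {
        return std::fabs(v - std::round(v)) <= tol;
    }
};
```

With unit weights the table behaves like the old integer version; any fractional weight makes `is_integer()` return false.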

Do these changes introduce new tools, command line arguments, or configuration file options? [No]

If yes, please describe:
Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

If yes, please describe:

Pull Request Testing

Describe testing already performed for these changes:

Manually ran the regression tests via GHA. Along the way I discovered and fixed bugs in the existing CTS outputs, as described in MET#2958.
Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

Review the code changes.
Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]
None needed for this PR. The next PR will update the description of how the grid_weight_flag is used.
Do these changes include sufficient testing updates? [Yes]
No additional tests needed.
Will this PR result in changes to the MET test suite? [Yes]

If yes, describe the new output and/or changes to the existing output:

The hope was that this PR would result in zero differences. However, this GHA run flags differences in the ORSS, ORSS_NCL, and ORSS_NCU columns of the CTS line type in 5 output files:

diff/perc_thresh/grid_stat_PERC_THRESH_FBIAS_240000L_20120410_000000V_cts_OUTPUT.txt
diff/perc_thresh/grid_stat_PERC_THRESH_FBIAS_240000L_20120410_000000V_OUTPUT.stat
diff/grid_stat/grid_stat_GTG_latlon_060000L_20130827_180000V_OUTPUT.stat
diff/grid_stat/grid_stat_GTG_latlon_060000L_20130827_180000V_cts_OUTPUT.txt
diff/grid_stat/grid_stat_GEN_ENS_PROD_240000L_20120410_120000V_OUTPUT.stat

While the CTC counts match in the TRUTH and OUTPUT files, the ORSS statistic differs. Careful inspection reveals that the problem lies in the existing TRUTH output files. Here's the ORSS derivation code:

double TTContingencyTable::orss() const {
   double v, num, den;

   num = fy_oy() * fn_on() - fy_on() * fn_oy();
   den = fy_oy() * fn_on() + fy_on() * fn_oy();

For integer counts (such as those listed below), we get integer overflow when computing num and den, because the product fy_oy() * fn_on() exceeds the range of a 32-bit int.

DEVELOP
fy_oy() = 52582, fn_on() = 149802, fy_on() = 4586, fn_oy() = 6870,
num = -7.44552e+08, den = -6.8154e+08, orss = 1.09245

After switching from integers to doubles, the overflow problem is resolved:

MET#2887
fy_oy() = 52582, fn_on() = 149802, fy_on() = 4586, fn_oy() = 6870,
num = 7.84538e+09, den = 7.90839e+09, orss = 0.992032

So while these are real differences, the problem actually lies in the truth data.

Will this PR result in changes to existing METplus Use Cases? [Maybe]

If yes, create a new Update Truth METplus issue to describe them.
Only if the METplus Use Cases are subject to the integer overflow problem described above.
Do these changes introduce new SonarQube findings? [No]

If yes, please describe:

34 code smells are flagged in the "new code" for this PR. However the overall count of code smells of 19,715 for this PR is less than the overall count of 19,800 in the develop branch.

So it's an overall improvement. However, I will quickly review the 34 "new" ones on Mon 9/9/24 and resolve any easy ones I find.

I did resolve some of the 34 issues, bringing the overall number of code smells down to 19,679. These include some necessary changes from int to double in compute_ci.h/.cc. Note that I did NOT address findings like this:

Use the init-statement to declare "b" inside the if statement.

Since that is a C++17 feature.
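
For context, the finding refers to the C++17 "if statement with initializer" syntax. The sketch below keeps the C++11-compatible form in code and shows the C++17 form only in a comment. The function and its map lookup are a made-up example, not code from MET.

```cpp
#include <map>
#include <string>

// Hypothetical lookup helper used only to illustrate the finding.
int lookup_or_default(const std::map<std::string, int>& m,
                      const std::string& key, int fallback) {
    // C++11-compatible form: the iterator is declared before the if,
    // so it stays in scope for the rest of the function.
    auto it = m.find(key);
    if (it != m.end()) return it->second;

    // C++17 form suggested by the finding (not valid in C++11):
    //   if (auto it = m.find(key); it != m.end()) return it->second;
    return fallback;
}
```

Both forms behave the same; the C++17 version only narrows the scope of the variable.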

Please complete this pull request review by [Tues 9/10/24].

Pull Request Checklist

See the METplus Workflow for details.

Review the source issue metadata (required labels, projects, and milestone).
Complete the PR definition above.
Ensure the PR title matches the feature or bugfix branch name.
Define the PR metadata, as permissions allow.
Select: Reviewer(s) and Development issue
Select: Milestone as the version that will include these changes
Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
Close the linked issue and delete your feature or bugfix branch from GitHub.

jprestop

Thanks for your work on this task @JohnHalleyGotway. I have reviewed the code changes and the differences. I see the differences that you described with the ORSS, ORSS_NCL, and ORSS_NCU columns of the CTS line type in 5 output files. Thank you for explaining those. Thank you also for your work on the new SonarQube code smells and for being mindful not to apply the C++17 standard recommendations. All other tests passed. I approve this request.

JohnHalleyGotway · 2024-09-09T19:45:39Z

The latest GHA SonarQube run produces results with code smells reduced to 19,225 and it appears that the C++17 issues have been removed.

jprestop

Thanks for your work on removing the C++17 issues. It looks like I need to reapprove after the changes you made, so I am doing that now.

JohnHalleyGotway added 12 commits August 27, 2024 18:01

Per #2887, update NumArray::vals() to return a reference to the vector rather than a pointer to doubles.

09f802e

Per #2887, switch over the whole ContingencyTable class hierarchy from storing integer counts to storing double-precision weights.

38a5d4e

Add ContingencyTable::is_integer() member function to check whether the table contains all integers

5e1d81d

Per #2887, update parse_stat_line.cc to get it to compile after changing PCT to store thresholds in a std::vector.

32f3601

Per #2887, update PCTInfo::clear() logic.

5834afd

Merge remote-tracking branch 'origin/develop' into feature_2887_categorical_weights

f2a1fd0

Per #2887, update ctc_by_row() logic to create reproducible results with the develop branch.

8c629e6

Per #2887, update logic of define_prob_bins() to add a final >=1.0 threshold if needed. While ==0.1 works fine, I found that ==0.05 did not because the last >=1.0 threshold was missing, likely due to floating point precision issues. This change should fix that problem.

b98276a

Per #2887, update roc_auc() function to match the develop branch

dacd1d2

Per #2887, fix bug in computation of far()

a5403e2

Per #2887, replaced all ==0 integer equality checks with calls to is_eq() instead and fix a couple of equations to snuff out diffs in some CTS statistics.

2a68914

Merge remote-tracking branch 'origin/develop' into feature_2887_categorical_weights

1b8cd06
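
The commit log mentions a probability threshold list that lost its final >=1.0 entry for a bin width of 0.05. One common way to avoid that class of floating point problem is sketched below: derive each threshold from an integer index instead of accumulating the width, and append the final 1.0 explicitly. This is a hypothetical helper that assumes equal-width bins dividing 1.0 evenly; it is not MET's define_prob_bins().

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, not MET's define_prob_bins().
std::vector<double> make_prob_thresholds(double width) {
    std::vector<double> thresh;
    // Round the bin count so tiny errors in 1.0 / width cannot drop a bin.
    const long n = std::lround(1.0 / width);
    for (long i = 0; i < n; ++i) {
        thresh.push_back(i * width);   // one multiply, no accumulated error
    }
    thresh.push_back(1.0);             // final >=1.0 threshold, always exact
    return thresh;
}
```

With this approach a width of 0.05 yields 21 thresholds and a width of 0.1 yields 11, each ending at exactly 1.0.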

JohnHalleyGotway added this to the MET 12.0.0 milestone Sep 6, 2024

JohnHalleyGotway linked an issue Sep 6, 2024 that may be closed by this pull request

Enhance MET to calculate weighted contingency table counts and statistics #2887

Closed

28 tasks

JohnHalleyGotway requested a review from jprestop September 6, 2024 22:20

JohnHalleyGotway added 3 commits September 9, 2024 09:41

Per #2887, address some of the 34 SonarQube code smells flagged for this PR. Note that the compute_ci.h/.cc changes are necessary and good since we should be computing CI's using doubles instead of integer counts.

7bce8b8

Per #2887, update run_sonarqube.sh to specify the target CXX standard as 11. The hope is that that will limit the findings to only those features available in the C++11 standard.

f0395b0

Per #2887, update to SonarQube version 6.1.0.4477 released on 6/27/2024.

8e1ecef

jprestop previously approved these changes Sep 9, 2024

Per #2887, updating build_met_sonarqube.sh to specify --std=c++11 since c++17 is used by default

1144765

JohnHalleyGotway dismissed jprestop’s stale review via 1144765 September 9, 2024 18:55

jprestop approved these changes Sep 9, 2024

JohnHalleyGotway merged commit 93014d5 into develop Sep 10, 2024
38 of 39 checks passed

JohnHalleyGotway deleted the feature_2887_categorical_weights branch September 10, 2024 15:21

JohnHalleyGotway restored the feature_2887_categorical_weights branch September 10, 2024 17:56

JohnHalleyGotway mentioned this pull request Sep 10, 2024

Update Truth: For dtcenter/MET#2967 dtcenter/METplus#2688

Closed

11 tasks

JohnHalleyGotway mentioned this pull request Oct 3, 2024

Enhance MET to calculate weighted contingency table counts and statistics #2887

Closed

28 tasks
