Feature #2887 categorical weights PR 1 of 2 #2967

Merged
JohnHalleyGotway merged 16 commits into develop from feature_2887_categorical_weights
Sep 10, 2024

Conversation

JohnHalleyGotway
Collaborator

@JohnHalleyGotway JohnHalleyGotway commented Sep 6, 2024

Expected Differences

This pull request changes the contingency table class hierarchy in MET from storing integer counts to storing sums of double-precision weights. I'm breaking this work across multiple PRs to make it easier to track the source of the differences.

The next PR for MET#2887 will actually apply the existing grid_weight_flag to the computation of contingency table-based outputs (CTS, NBRCTS, MCTS, and PSTD).
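As a rough illustration of the idea, a table that accumulates double-precision weights reduces to plain counting when every weight is 1.0. The sketch below is not MET's class hierarchy: the struct name `WeightedTable2x2` and its `add()` method are hypothetical, and only the cell accessor names mirror those used in the ORSS code in this PR.

```cpp
// Hypothetical 2x2 contingency table that stores sums of weights.
// This is an illustrative sketch, not the MET implementation.
struct WeightedTable2x2 {
   // cell[0][*] = forecast yes, cell[1][*] = forecast no
   // cell[*][0] = observed yes, cell[*][1] = observed no
   double cell[2][2] = {{0.0, 0.0}, {0.0, 0.0}};

   // Add one matched pair. The default weight of 1.0 reproduces
   // the behavior of an integer count.
   void add(bool fcst_event, bool obs_event, double weight = 1.0) {
      cell[fcst_event ? 0 : 1][obs_event ? 0 : 1] += weight;
   }

   double fy_oy() const { return cell[0][0]; }
   double fy_on() const { return cell[0][1]; }
   double fn_oy() const { return cell[1][0]; }
   double fn_on() const { return cell[1][1]; }

   // Sum of all weights, the weighted analogue of the total count.
   double total() const {
      return fy_oy() + fy_on() + fn_oy() + fn_on();
   }
};
```

With unit weights the cells hold the same values the integer version would, which is why a change like this would be expected to leave existing output unchanged.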

  • Do these changes introduce new tools, command line arguments, or configuration file options? [No]

    If yes, please describe:

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

    If yes, please describe:

Pull Request Testing

  • Describe testing already performed for these changes:

    Manually ran the regression tests via GHA. Along the way I discovered and fixed bugs in the existing CTS outputs, as described in MET#2958.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

    Review the code changes.

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]
    None needed for this PR. The next PR will update the description of how the grid_weight_flag is used.

  • Do these changes include sufficient testing updates? [Yes]
    No additional tests needed.

  • Will this PR result in changes to the MET test suite? [Yes]

    If yes, describe the new output and/or changes to the existing output:

    The hope was that this PR would result in zero differences. However, this GHA run flags differences in the ORSS, ORSS_NCL, and ORSS_NCU columns of the CTS line type in 5 output files:

diff/perc_thresh/grid_stat_PERC_THRESH_FBIAS_240000L_20120410_000000V_cts_OUTPUT.txt
diff/perc_thresh/grid_stat_PERC_THRESH_FBIAS_240000L_20120410_000000V_OUTPUT.stat
diff/grid_stat/grid_stat_GTG_latlon_060000L_20130827_180000V_OUTPUT.stat
diff/grid_stat/grid_stat_GTG_latlon_060000L_20130827_180000V_cts_OUTPUT.txt
diff/grid_stat/grid_stat_GEN_ENS_PROD_240000L_20120410_120000V_OUTPUT.stat

While the CTC counts match in the TRUTH and OUTPUT files, the ORSS statistic differs. Careful inspection reveals that the problem lies in the existing TRUTH output files. Here's the ORSS derivation code:

double TTContingencyTable::orss() const {
   double v, num, den;

   num = fy_oy() * fn_on() - fy_on() * fn_oy();
   den = fy_oy() * fn_on() + fy_on() * fn_oy();

For integer counts (such as those listed below), the products overflow a 32-bit integer when computing num and den, before the results are ever stored as doubles.

DEVELOP
fy_oy() = 52582, fn_on() = 149802, fy_on() = 4586, fn_oy() = 6870,
num = -7.44552e+08, den = -6.8154e+08, orss = 1.09245

After switching from integers to doubles, the overflow problem is resolved:

MET#2887
fy_oy() = 52582, fn_on() = 149802, fy_on() = 4586, fn_oy() = 6870,
num = 7.84538e+09, den = 7.90839e+09, orss = 0.992032

So while these are real differences, the problem actually lies in the truth data.

  • Will this PR result in changes to existing METplus Use Cases? [Maybe]

    If yes, create a new Update Truth METplus issue to describe them.
    Only if the METplus Use Cases are subject to the integer overflow problem described above.

  • Do these changes introduce new SonarQube findings? [No]

    If yes, please describe:

34 code smells are flagged in the "new code" for this PR. However, the overall count of 19,715 code smells for this PR is lower than the overall count of 19,800 in the develop branch.

So it's an overall improvement. However, I will quickly review the 34 "new" ones on Mon 9/9/24 and resolve any easy ones I find.

I did resolve some of the 34 issues, bringing the overall number of code smells down to 19,679. These include some necessary changes from int to double in compute_ci.h/.cc. Note that I did NOT address findings like this:

Use the init-statement to declare "b" inside the if statement.

I left these alone because the init-statement is a C++17 feature.
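For readers unfamiliar with that finding, here is a minimal sketch of the two forms. The function `safe_ratio` is a made-up example, not code from MET; only the variable name "b" is borrowed from the finding text. The C++17 form is shown as a comment so that the block still compiles under C++11.

```cpp
// C++11-compatible form: "b" is declared ahead of the if statement
// and remains in scope for the rest of the function.
double safe_ratio(double num, double den) {
   const bool b = (den != 0.0);
   if(b) return num / den;
   return 0.0;
}

// C++17 form that the finding asks for. The init-statement limits
// the scope of "b" to the if statement itself:
//
//    if(const bool b = (den != 0.0); b) return num / den;
//    return 0.0;
```

Both forms behave the same here. The C++17 version only narrows the scope of the variable, so skipping the finding has no functional impact.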

  • Please complete this pull request review by [Tues 9/10/24].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Milestone as the version that will include these changes
    Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

…m storing integer counts to storing double-precision weights.
…ing PCT to store thresholds in a std::vector.
…reshold if needed. While ==0.1 works fine, I found that ==0.05 did not because the last >=1.0 threshold was missing, likely due to floating point precision issues. This change should fix that problem.
…eq() instead and fix a couple of equations to snuff out diffs in some CTS statistics.
@JohnHalleyGotway JohnHalleyGotway added this to the MET 12.0.0 milestone Sep 6, 2024
@JohnHalleyGotway JohnHalleyGotway linked an issue Sep 6, 2024 that may be closed by this pull request
28 tasks
…his PR. Note that the compute_ci.h/.cc changes are necessary and good since we should be computing CI's using doubles instead of integer counts.
… as 11. The hope is that that will limit the findings to only those features available in the C++11 standard.
jprestop
jprestop previously approved these changes Sep 9, 2024
Collaborator

@jprestop jprestop left a comment

Thanks for your work on this task @JohnHalleyGotway. I have reviewed the code changes and the differences. I see the differences that you described with the ORSS, ORSS_NCL, and ORSS_NCU columns of the CTS line type in 5 output files. Thank you for explaining those. Thank you also for your work on the new SonarQube code smells and for being mindful not to apply the C++17 standard recommendations. All other tests passed. I approve this request.

@JohnHalleyGotway
Collaborator Author

The latest GHA SonarQube run produces results with code smells reduced to 19,225 and it appears that the C++17 issues have been removed.

Collaborator

@jprestop jprestop left a comment

Thanks for your work on removing the C++17 issues. It looks like I need to reapprove after the changes you made, so I am doing that now.

@JohnHalleyGotway JohnHalleyGotway merged commit 93014d5 into develop Sep 10, 2024
38 of 39 checks passed
@JohnHalleyGotway JohnHalleyGotway deleted the feature_2887_categorical_weights branch September 10, 2024 15:21
@JohnHalleyGotway JohnHalleyGotway restored the feature_2887_categorical_weights branch September 10, 2024 17:56
Development

Successfully merging this pull request may close these issues.

Enhance MET to calculate weighted contingency table counts and statistics
2 participants