Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX ignore nan values when summing posteriors #291

Merged
merged 16 commits into from
Jul 15, 2024
Merged

FIX ignore nan values when summing posteriors #291

merged 16 commits into from
Jul 15, 2024

Conversation

PSSF23
Copy link
Member

@PSSF23 PSSF23 commented Jul 3, 2024

Close #290

  • Fix np.nan posteriors when honest_prior="ignore"
  • Set "ignore" as default prior
  • Add & optimize unit tests

Copy link
Member Author

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a simple and straightforward fix to me. I'll write unit tests about it, but first want to see if it cause other problems.

Copy link
Member Author

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is not checking the ignore setting properly: test_impute_posteriors. As it doesn't give an error when there's too many nan's.

The fix is also incomplete, as np.nansum gives zeros when all the tree posteriors are nan's. impute_missing is not working as it should. I'm forcing it to work on zero mask posteriors, as the other two leaf correction methods would not result in zeros.

Copy link

codecov bot commented Jul 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.53%. Comparing base (574a629) to head (c4573ae).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
- Coverage   78.55%   78.53%   -0.02%     
==========================================
  Files          24       24              
  Lines        2252     2250       -2     
  Branches      414      413       -1     
==========================================
- Hits         1769     1767       -2     
  Misses        352      352              
  Partials      131      131              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

Re CI failures:

FAILED sktree/stats/tests/test_forestht.py::test_small_dataset_dependent[0] - AssertionError
FAILED sktree/tests/test_extensions.py::test_predict_proba_per_tree[HonestForestClassifier-2] - AssertionError
FAILED sktree/tests/test_extensions.py::test_predict_proba_per_tree[HonestForestClassifier-3] - AssertionError

I would recommend:

  1. test_small_dataset_dependent, just adding the honest_prior='empirical' to make the test backwards compatible.
  2. test_predict_proba_per_tree, add an extra kwarg for HonestForestClassifier for the honest_prior='empirical'. You can also add a test for test_predict_proba_per_tree for HonestForestClassifier if honest_prior='ignore'.

sktree/tests/test_honest_forest.py Outdated Show resolved Hide resolved
sktree/tests/test_honest_forest.py Outdated Show resolved Hide resolved
sktree/tests/test_honest_forest.py Outdated Show resolved Hide resolved
sktree/ensemble/_honest_forest.py Outdated Show resolved Hide resolved
Copy link
Member Author

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam2392 you think this is mergeable?

doc/whats_new/v0.9.rst Outdated Show resolved Hide resolved
@adam2392 adam2392 merged commit fae7e5c into main Jul 15, 2024
30 of 31 checks passed
@adam2392
Copy link
Collaborator

Thanks for the PR @PSSF23

@adam2392 adam2392 deleted the ignore_leaf branch July 15, 2024 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG predict_proba() not compatible with ignore setting for empty leaf
2 participants