
[ENH, BUG] Test honest tree performance via quadratic simulation #164

Merged
24 commits merged on Nov 14, 2023

Conversation

adam2392 (Collaborator) commented Nov 2, 2023:

Fixes #157

Changes proposed in this pull request:

  • Fixes the API for calling n_estimators
  • Adds testing toward fixing the honest tree power performance via the quadratic simulation

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
assert_allclose(np.mean(honestsk_scores), np.mean(honest_scores))


def test_honest_forest_with_sklearn_trees_with_power():
adam2392 (Collaborator, author) commented:

Migrating the discussion from #152 (comment) here.

Can you, @PSSF23, and I discuss this to settle on a small example we can run?

This is my impression of how the power is computed.
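A minimal sketch of that impression, under stated assumptions: `quadratic_sim`, `auc_statistic`, and the fixed AUC cutoff below are illustrative stand-ins, not sktree's actual honest-forest pipeline, which would derive its rejection threshold from a permutation null at a chosen alpha.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def quadratic_sim(n, p, rng):
    """Quadratic simulation: only feature 0 carries signal, the rest are noise."""
    X = rng.standard_normal((n, p))
    y = (X[:, 0] ** 2 + 0.5 * rng.standard_normal(n) > 1.0).astype(int)
    return X, y


def auc_statistic(X, y, seed=0):
    """Test statistic: held-out AUC of a forest (stand-in for the honest forest)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=25, random_state=seed).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])


def empirical_power(n_reps=10, n=200, p=5, threshold=0.55, seed=0):
    """Power = fraction of repetitions whose statistic clears the null threshold.

    A fixed cutoff keeps the sketch short; the real test would compare each
    repetition's statistic against a permutation-derived null distribution.
    """
    rng = np.random.default_rng(seed)
    hits = sum(
        auc_statistic(*quadratic_sim(n, p, rng), seed=rep) > threshold
        for rep in range(n_reps)
    )
    return hits / n_reps
```

Repeating the simulate-fit-score loop and counting rejections is the whole power curve: sweep `n` (or effect size) and plot `empirical_power` at each point.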

PSSF23 (Member) commented:

The only difference here is that you are computing AUC and I'm computing MI. Otherwise, it looks right to me.

adam2392 (Collaborator, author) commented Nov 6, 2023:

It looks like MI and AUC produce the same scores up to 0.05 precision when given the same random state, which leads me to believe there's some issue with randomness when running the power-curve simulation, or another bug we're not thinking of.
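To illustrate why the two statistics track each other once the random state is pinned: both are monotone in how informative the predicted posteriors are. The `mi_statistic` below is a hypothetical plug-in estimator (H(Y) minus mean posterior entropy), not sktree's exact implementation, and the posterior probabilities are synthetic rather than forest outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def mi_statistic(y_true, y_proba, eps=1e-12):
    """Plug-in MI estimate: H(Y) minus the mean posterior entropy H(Y|X)."""
    _, counts = np.unique(y_true, return_counts=True)
    pk = counts / counts.sum()
    h_y = -np.sum(pk * np.log(pk))
    p = np.clip(y_proba, eps, 1.0)
    h_y_given_x = np.mean(-np.sum(p * np.log(p), axis=1))
    return h_y - h_y_given_x


rng = np.random.default_rng(42)  # fix the random state once, up front
y = rng.integers(0, 2, size=500)
# synthetic informative posterior: mass leans toward the true label
p1 = np.clip(0.5 + 0.35 * (2 * y - 1) + 0.15 * rng.standard_normal(500), 0.01, 0.99)
proba = np.column_stack([1.0 - p1, p1])

auc = roc_auc_score(y, proba[:, 1])  # above 0.5 for informative posteriors
mi = mi_statistic(y, proba)          # above 0 for informative posteriors
```

With the same seed both statistics are deterministic functions of the same posteriors, so if two runs disagree, the discrepancy has to come from unseeded randomness elsewhere in the simulation loop.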

PSSF23 (Member) left a comment:

I added another test on MI to see if the statistic method affects the test results.

codecov bot commented Nov 6, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison: base (030a064) 89.59% vs. head (c8f619a) 90.17%.

Files Patch % Lines
sktree/stats/forestht.py 73.91% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   89.59%   90.17%   +0.58%     
==========================================
  Files          46       46              
  Lines        3710     3767      +57     
==========================================
+ Hits         3324     3397      +73     
+ Misses        386      370      -16     


PSSF23 and others added 6 commits November 6, 2023 13:39
adam2392 and others added 3 commits November 6, 2023 16:03
PSSF23 (Member) left a comment:

@adam2392 I have doubts on these two lines:

https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1071

https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1133

Essentially, if we set sample_dataset_per_tree and permute_forest_fraction to False, these two lines could conflict, or the first line is unnecessary.

adam2392 (Collaborator, author) replied:

> @adam2392 I have doubts on these two lines:
>
> https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1071
>
> https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1133
>
> Essentially, if we set sample_dataset_per_tree and permute_forest_fraction to False, these two lines could conflict, or the first line is unnecessary.

The first line can be removed.

Or rather, more specifically, these lines: https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L818-L820

These lines shouldn't affect the issue described in this PR, though.
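A hypothetical reduction of the branching under discussion; the actual forestht.py control flow differs in detail, and `select_tree_indices` is an invented name used only to show why a per-tree branch becomes dead code when both options are off.

```python
import numpy as np


def select_tree_indices(sample_dataset_per_tree, permute_forest_fraction,
                        n_samples, n_trees, rng):
    """Return one sample-index array per tree (illustrative sketch only)."""
    if permute_forest_fraction:
        # each tree draws its own permuted subset
        return [rng.permutation(n_samples) for _ in range(n_trees)]
    if sample_dataset_per_tree:
        # each tree gets an independent resample
        return [rng.integers(0, n_samples, n_samples) for _ in range(n_trees)]
    # with both options off, every tree shares a single index array, so a
    # per-tree branch executed before this point would never take effect
    shared = np.arange(n_samples)
    return [shared for _ in range(n_trees)]
```

In this shape, any earlier line that computes per-tree samples unconditionally is either redundant (both options off) or risks disagreeing with the branch that runs later, which matches the "potential conflict or unnecessary line" reading above.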

3 participants