[ENH, BUG] Test honest tree performance via quadratic simulation #164
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
sktree/tests/test_honest_forest.py
assert_allclose(np.mean(honestsk_scores), np.mean(honest_scores))

def test_honest_forest_with_sklearn_trees_with_power():
Migrating discussion from #152 (comment) here.
Can you, @PSSF23, and I discuss this to get a small example we can run?
This is my impression of how the power is computed:
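A minimal sketch of that impression, for discussion. This is not the PR's actual code: `estimate_power`, `simulate_null`, and `simulate_alt` are hypothetical names, and the Monte Carlo scheme (reject when the alternative statistic exceeds the null's `1 - alpha` quantile) is my reading of the intended procedure.

```python
import numpy as np

def estimate_power(statistic_fn, simulate_null, simulate_alt,
                   n_repeats=100, alpha=0.05, seed=0):
    """Monte Carlo power: the fraction of alternative-hypothesis runs whose
    statistic exceeds the (1 - alpha) quantile of the null statistics."""
    rng = np.random.default_rng(seed)
    null_stats = np.array(
        [statistic_fn(simulate_null(rng)) for _ in range(n_repeats)]
    )
    alt_stats = np.array(
        [statistic_fn(simulate_alt(rng)) for _ in range(n_repeats)]
    )
    threshold = np.quantile(null_stats, 1 - alpha)
    return float(np.mean(alt_stats > threshold))

# Toy usage: mean statistic, unit-shift alternative -> power near 1
power = estimate_power(np.mean,
                       lambda rng: rng.normal(0.0, 1.0, 50),
                       lambda rng: rng.normal(1.0, 1.0, 50))
```

The key point to confirm is whether the real simulation shares a single seeded generator across null and alternative runs, as above, or reseeds per run.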
The only difference here is that you are computing AUC and I'm computing MI. Otherwise, this looks right to me.
It looks like both MI and AUC have the exact same scores up to 0.05 precision when set with the same random state. This leads me to believe there's some issue with randomness when running the power-curve simulation, or another bug that we're not thinking of.
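For context on why near-identical scores are suspicious: both statistics can be computed from the same forest posteriors, so a shared random state removes all randomness except the final reduction. A self-contained sketch (the `plug_in_mi` helper and the synthetic posteriors are my own, not the PR's code) showing AUC and a plug-in MI estimate derived from one set of posteriors:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def plug_in_mi(y_true, posteriors, eps=1e-12):
    """Plug-in estimate of I(X; Y) = H(Y) - H(Y|X) in nats, where H(Y|X)
    is approximated by the mean entropy of the predicted posteriors."""
    _, counts = np.unique(y_true, return_counts=True)
    priors = counts / counts.sum()
    h_y = -np.sum(priors * np.log(priors))
    p = np.clip(posteriors, eps, 1 - eps)
    h_y_given_x = -np.mean(np.sum(p * np.log(p), axis=1))
    return h_y - h_y_given_x

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
# Synthetic posteriors correlated with the labels (a stand-in for forest output)
p1 = np.clip(0.5 + 0.8 * (y - 0.5) + rng.normal(scale=0.3, size=200), 0.01, 0.99)
posteriors = np.column_stack([1.0 - p1, p1])

auc = roc_auc_score(y, p1)
mi = plug_in_mi(y, posteriors)
```

AUC and MI are different functionals of the posteriors, so they should not agree numerically run-to-run unless the underlying predictions are identical.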
I added another test on MI to see if the statistic method affects the test results.
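One way to isolate whether the randomness itself is the bug, independent of which statistic is used: run the same seeded simulation twice and require bit-identical scores. This sketch uses plain scikit-learn rather than the honest forest from this PR, and `run_once` plus the quadratic-simulation parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def run_once(seed):
    # Quadratic simulation: the label depends on X0 plus a squared term of X1.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    clf.fit(X[:150], y[:150])
    proba = clf.predict_proba(X[150:])[:, 1]
    return roc_auc_score(y[150:], proba)

# Identical seeds must give bit-identical scores; if they don't,
# some randomness is escaping the controlled random_state.
score_a = run_once(0)
score_b = run_once(0)
```

If the honest-forest version of this check fails while the plain-sklearn version passes, the leak is in how the honest forest consumes its random state.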
Codecov Report

@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   89.59%   90.17%   +0.58%
==========================================
  Files          46       46
  Lines        3710     3767      +57
==========================================
+ Hits         3324     3397      +73
+ Misses        386      370      -16
@adam2392 I have doubts about these two lines: essentially, if we set sample_dataset_per_tree and permute_forest_fraction to False, these two lines would result in potential conflicts, or the first line is unnecessary.
The first line can be removed. Or, more specifically, these two lines: https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L818-L820 These lines shouldn't affect the issue described in this PR, though.
Fixes #157
Changes proposed in this pull request:
n_estimators