
[ENH, BUG] Test honest tree performance via quadratic simulation #164

Merged
24 commits merged on Nov 14, 2023

Conversation

adam2392 (Collaborator) commented Nov 2, 2023:

Fixes #157

Changes proposed in this pull request:

  • Fixes the API for calling n_estimators
  • Adds testing toward fixing the honest tree power performance via the quadratic simulation

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
assert_allclose(np.mean(honestsk_scores), np.mean(honest_scores))


def test_honest_forest_with_sklearn_trees_with_power():
adam2392 (Collaborator, author) commented:

Migrating the discussion from #152 (comment) here.

Can you, @PSSF23, and I discuss this to settle on a small example we can run?

This is my impression of how the power is computed.
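A minimal sketch of that impression, under stated assumptions: `quadratic_sim`, `auc_statistic`, and the fixed AUC cutoff below are illustrative stand-ins, not sktree's actual honest-forest pipeline, which would derive its rejection threshold from a permutation null at a chosen alpha.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def quadratic_sim(n, p, rng):
    """Quadratic simulation: only feature 0 carries signal, the rest are noise."""
    X = rng.standard_normal((n, p))
    y = (X[:, 0] ** 2 + 0.5 * rng.standard_normal(n) > 1.0).astype(int)
    return X, y


def auc_statistic(X, y, seed=0):
    """Test statistic: held-out AUC of a forest (stand-in for the honest forest)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=25, random_state=seed).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])


def empirical_power(n_reps=10, n=200, p=5, threshold=0.55, seed=0):
    """Power = fraction of repetitions whose statistic clears the null threshold.

    A fixed cutoff keeps the sketch short; the real test would compare each
    repetition's statistic against a permutation-derived null distribution.
    """
    rng = np.random.default_rng(seed)
    hits = sum(
        auc_statistic(*quadratic_sim(n, p, rng), seed=rep) > threshold
        for rep in range(n_reps)
    )
    return hits / n_reps
```

Repeating the simulate-fit-score loop and counting rejections is the whole power curve: sweep `n` (or effect size) and plot `empirical_power` at each point.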

PSSF23 (Member) commented:

The only difference here is that you are computing AUC and I'm computing MI. Otherwise, it looks right to me.

adam2392 (Collaborator, author) commented Nov 6, 2023:

It looks like MI and AUC produce the same scores up to 0.05 precision when given the same random state, which leads me to believe there's some issue with randomness when running the power-curve simulation, or another bug we're not thinking of.
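To illustrate why the two statistics track each other once the random state is pinned: both are monotone in how informative the predicted posteriors are. The `mi_statistic` below is a hypothetical plug-in estimator (H(Y) minus mean posterior entropy), not sktree's exact implementation, and the posterior probabilities are synthetic rather than forest outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def mi_statistic(y_true, y_proba, eps=1e-12):
    """Plug-in MI estimate: H(Y) minus the mean posterior entropy H(Y|X)."""
    _, counts = np.unique(y_true, return_counts=True)
    pk = counts / counts.sum()
    h_y = -np.sum(pk * np.log(pk))
    p = np.clip(y_proba, eps, 1.0)
    h_y_given_x = np.mean(-np.sum(p * np.log(p), axis=1))
    return h_y - h_y_given_x


rng = np.random.default_rng(42)  # fix the random state once, up front
y = rng.integers(0, 2, size=500)
# synthetic informative posterior: mass leans toward the true label
p1 = np.clip(0.5 + 0.35 * (2 * y - 1) + 0.15 * rng.standard_normal(500), 0.01, 0.99)
proba = np.column_stack([1.0 - p1, p1])

auc = roc_auc_score(y, proba[:, 1])  # above 0.5 for informative posteriors
mi = mi_statistic(y, proba)          # above 0 for informative posteriors
```

With the same seed both statistics are deterministic functions of the same posteriors, so if two runs disagree, the discrepancy has to come from unseeded randomness elsewhere in the simulation loop.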

PSSF23 (Member) left a comment:

I added another test on MI to see if the statistic method affects the test results.

codecov bot commented Nov 6, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison: base (030a064) 89.59% vs. head (c8f619a) 90.17%.

Files Patch % Lines
sktree/stats/forestht.py 73.91% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   89.59%   90.17%   +0.58%     
==========================================
  Files          46       46              
  Lines        3710     3767      +57     
==========================================
+ Hits         3324     3397      +73     
+ Misses        386      370      -16     


PSSF23 and others added 6 commits November 6, 2023 13:39
adam2392 and others added 3 commits November 6, 2023 16:03
PSSF23 (Member) left a comment:

@adam2392 I have doubts on these two lines:

https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1071

https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1133

Essentially, if we set sample_dataset_per_tree and permute_forest_fraction to False, these two lines could conflict, or the first line is unnecessary.

adam2392 (Collaborator, author) replied:

> @adam2392 I have doubts on these two lines:
>
> https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1071
>
> https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L1133
>
> Essentially, if we set sample_dataset_per_tree and permute_forest_fraction to False, these two lines could conflict, or the first line is unnecessary.

The first line can be removed.

Or rather, more specifically, these lines: https://github.com/neurodata/scikit-tree/blob/e4728fa9923103c9e219825a46a83b7ba86b483e/sktree/stats/forestht.py#L818-L820

These lines shouldn't affect the issue described in this PR, though.
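A hypothetical reduction of the branching under discussion; the actual forestht.py control flow differs in detail, and `select_tree_indices` is an invented name used only to show why a per-tree branch becomes dead code when both options are off.

```python
import numpy as np


def select_tree_indices(sample_dataset_per_tree, permute_forest_fraction,
                        n_samples, n_trees, rng):
    """Return one sample-index array per tree (illustrative sketch only)."""
    if permute_forest_fraction:
        # each tree draws its own permuted subset
        return [rng.permutation(n_samples) for _ in range(n_trees)]
    if sample_dataset_per_tree:
        # each tree gets an independent resample
        return [rng.integers(0, n_samples, n_samples) for _ in range(n_trees)]
    # with both options off, every tree shares a single index array, so a
    # per-tree branch executed before this point would never take effect
    shared = np.arange(n_samples)
    return [shared for _ in range(n_trees)]
```

In this shape, any earlier line that computes per-tree samples unconditionally is either redundant (both options off) or risks disagreeing with the branch that runs later, which matches the "potential conflict or unnecessary line" reading above.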

3 participants