[ENH] Add option to permute per forest fraction #145

adam2392 · 2023-10-16T18:32:15Z

Discussed in Forest meeting today:

Changes proposed in this pull request:

Adds a permute_per_forest_fraction parameter that permutes the covariate_index a controlled number of times across the entire forest (rather than once per tree).
Helps reduce the very high RAM usage when using a large X, large forests, or many repeated jobs of FeatureImportance*Classifier.
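
As a rough illustration of the idea described above, the sketch below draws one permutation of the sample indices per block of trees instead of one per tree. The helper name `assign_permutations` and its arguments are hypothetical and are not taken from the scikit-tree code base. The assumption is that the fraction is the share of the forest that reuses a single permutation, so fewer distinct permuted index arrays have to be kept around.

```python
import random


def assign_permutations(n_samples, n_estimators, fraction, seed=0):
    """Return one permuted index list per tree, shared within blocks of trees.

    Hypothetical helper: `fraction` is assumed to be the share of the forest
    that reuses the same permutation of the sample indices.
    """
    if not (1.0 / n_estimators <= fraction <= 1.0):
        raise ValueError("fraction must lie in [1 / n_estimators, 1]")

    rng = random.Random(seed)
    # Number of consecutive trees that share a single permutation.
    trees_per_perm = max(1, int(round(fraction * n_estimators)))

    perms = []
    current = None
    for tree_idx in range(n_estimators):
        # Draw a fresh permutation only at the start of each block.
        if tree_idx % trees_per_perm == 0:
            current = list(range(n_samples))
            rng.shuffle(current)
        perms.append(current)  # same list object reused inside a block
    return perms


def count_distinct(perms):
    """Count how many distinct permutation objects were actually allocated."""
    return len({id(p) for p in perms})
```

Under these assumptions, a fraction of `1.0 / n_estimators` gives every tree its own permutation, while a fraction of `1.0` shares a single permutation across the whole forest.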

Before submitting

I've read and followed all steps in the Making a pull request
section of the CONTRIBUTING docs.
I've updated or added any relevant docstrings following the syntax described in the
Writing docstrings section of the CONTRIBUTING docs.
If this PR fixes a bug, I've added a test that will fail without my fix.
If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>

codecov · 2023-10-16T19:04:52Z

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (a055049) 88.86% compared to head (f3aa7d7) 89.06%.

❗ Current head f3aa7d7 differs from pull request most recent head 9d0f2db. Consider uploading reports for the commit 9d0f2db to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
+ Coverage   88.86%   89.06%   +0.20%     
==========================================
  Files          41       41              
  Lines        3439     3531      +92     
==========================================
+ Hits         3056     3145      +89     
- Misses        383      386       +3

Files	Coverage Δ
sktree/experimental/mutual_info.py	`20.51% <ø> (ø)`
sktree/stats/permutationforest.py	`77.39% <100.00%> (+0.19%)`	⬆️
sktree/stats/tests/test_coleman.py	`100.00% <ø> (ø)`
sktree/stats/tests/test_forestht.py	`98.46% <100.00%> (-1.09%)`	⬇️
sktree/stats/utils.py	`91.89% <100.00%> (ø)`
sktree/tree/_honest_tree.py	`99.43% <ø> (ø)`
sktree/tree/tests/test_all_trees.py	`100.00% <100.00%> (ø)`
sktree/tree/tests/test_tree.py	`100.00% <100.00%> (+0.48%)`	⬆️
sktree/stats/forestht.py	`95.54% <91.25%> (+0.23%)`	⬆️


adam2392 · 2023-10-18T01:48:33Z

I'll let #143 get merged first, so I can test out changes with this new functionality

PSSF23

The tests didn't pass because some indices were not stratified during testing. I also feel the samples variable is used for too many things, which could cause confusion. In the classifier's _statistic, samples could mean either the test indices or the non-NaN indices.
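
To make the stratification point concrete, here is a minimal sketch of a stratified index split that keeps every class represented in the test indices. The function `stratified_split` is a hypothetical stand-in written with the standard library only; it is not the code used in this PR, and in practice something like scikit-learn's `train_test_split` with its `stratify` argument would be the more usual choice.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Split sample indices into train/test sets, class by class.

    Hypothetical helper: each class contributes roughly `test_size` of its
    samples (and at least one) to the test set.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = max(1, int(round(test_size * len(idxs))))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

With an imbalanced label vector, a plain random split can leave a rare class out of the test indices entirely, whereas the per-class split above cannot.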

PSSF23 · 2023-10-24T13:23:33Z

Should we close this?

adam2392 · 2023-10-24T13:26:09Z

It's a simple addition that's backwards compatible so I think I can just finish adding it.

adam2392 · 2023-11-08T14:11:23Z

Is this mergeable? @PSSF23 to preserve the old behavior, just set permute_forest_fraction = 1.0 / n_estimators.

PSSF23 · 2023-11-08T14:20:31Z

@adam2392 We should keep the same behavior for the permute_per_tree parameter, so Sam would not need to change his code. The permute fraction should default to each tree when its value is None, similar to how sklearn handles the relationship between max_samples and bootstrap. For example, raise an error if the fraction is not None when permute_per_tree is False.
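
The suggestion above can be sketched as a small validation step. The function `resolve_permute_fraction` and its argument names are hypothetical and only illustrate the proposed relationship between the two parameters; the checks in the merged code may differ, particularly since later commits in this PR remove permute_per_tree altogether.

```python
def resolve_permute_fraction(permute_per_tree, fraction, n_estimators):
    """Validate the two settings together and return the effective fraction.

    Hypothetical helper mirroring the suggestion in the comment:
    - permuting per tree disabled: the fraction must be left as None;
    - permuting per tree enabled and fraction None: default to one tree
      per permutation, i.e. 1 / n_estimators.
    """
    if not permute_per_tree:
        if fraction is not None:
            raise ValueError(
                "fraction cannot be set when permute_per_tree is False; "
                "set permute_per_tree=True or leave fraction as None."
            )
        return None

    if fraction is None:
        # Default: every tree gets its own permutation.
        return 1.0 / n_estimators

    if not (1.0 / n_estimators <= fraction <= 1.0):
        raise ValueError("fraction must lie in [1 / n_estimators, 1]")
    return fraction
```

Failing fast on an inconsistent pair of arguments, rather than silently ignoring one of them, is the design choice the comment points to in sklearn.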

adam2392 · 2023-11-08T14:36:00Z

> @adam2392 We should keep the same behavior for the permute_per_tree parameter, so Sam would not need to change his code. The permute fraction should default to each tree when its value is None, similar to how sklearn handles the relationship between max_samples and bootstrap. For example, raise an error if the fraction is not None when permute_per_tree is False.

Currently, permute_per_tree default for Sam is False corresponding to no permutation per tree.

With this PR, permute_per_forest_fraction = None would have the same default. Is this what you mean?

PSSF23

@adam2392 Actually it might not be relevant to Sam at the current stage of implementation as he's only running statistics.

My original thought was to preserve any current use of permute_per_tree and treat permute_per_forest_fraction as an add-on to it. But as the new parameter fully covers the original use, I think it's fine to upgrade. Can you fix the error in the regressor test?

Add option to permute per forest fraction

a3a002d

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 added 10 commits October 16, 2023 16:03

Add sep parallel func for building and predicting

5277a5b

Signed-off-by: Adam Li <adam2392@gmail.com>

Finished adding

df3a1b1

Signed-off-by: Adam Li <adam2392@gmail.com>

Modify parallel building

d46f1ad

Signed-off-by: Adam Li <adam2392@gmail.com>

New submodule

17b01ac

Signed-off-by: Adam Li <adam2392@gmail.com>

Add additional pickle test

16122d3

Signed-off-by: Adam Li <adam2392@gmail.com>

Add changelog

7d42ac7

Signed-off-by: Adam Li <adam2392@gmail.com>

Remove unnecessary comments

4423377

Signed-off-by: Adam Li <adam2392@gmail.com>

Merge branch 'main' into might-params

1c8eedc

Remove extra LOC

5730b32

Signed-off-by: Adam Li <adam2392@gmail.com>

Merge branch 'might-params' of https://github.com/neurodata/scikit-tree…

cd99a11

… into might-params

adam2392 marked this pull request as ready for review October 17, 2023 18:36

adam2392 added 4 commits October 17, 2023 14:37

Merge branch 'main' into might-params

7fe487c

Fix pvalue

6f978cb

Signed-off-by: Adam Li <adam2392@gmail.com>

Lint

58d5365

Signed-off-by: Adam Li <adam2392@gmail.com>

STart work on permute fraction of forest

f5f282a

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 and others added 4 commits October 19, 2023 13:12

Merge branch 'main' into might-params

921eb2f

Merge branch 'might-params' of https://github.com/neurodata/scikit-tree…

261e359

… into might-params

Merging in main

0b167ba

Signed-off-by: Adam Li <adam2392@gmail.com>

FIX add stratifi

3674cc2

PSSF23 reviewed Oct 23, 2023

adam2392 added 5 commits October 24, 2023 10:37

Try stash

0f27d01

Signed-off-by: Adam Li <adam2392@gmail.com>

UPdate and address permute forest fraction

e894708

Signed-off-by: Adam Li <adam2392@gmail.com>

WIP

2887909

Signed-off-by: Adam Li <adam2392@gmail.com>

Adding ability to turn off train/test split

739c7be

Signed-off-by: Adam Li <adam2392@gmail.com>

Merging main

30e9e95

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 added 2 commits November 8, 2023 09:09

Fix type checK

36c5582

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix typing

1ff7a5c

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 requested review from SUKI-O, PSSF23 and sampan501 November 8, 2023 14:10

Fix ci

c9c22e9

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 added 2 commits November 8, 2023 09:49

Remove fluff

2a6cda3

Signed-off-by: Adam Li <adam2392@gmail.com>

Remove any mention of permute_per_tree

b183db1

Signed-off-by: Adam Li <adam2392@gmail.com>

PSSF23 approved these changes Nov 9, 2023

adam2392 added 5 commits November 9, 2023 11:47

Merge branch 'main' into might-params

8967d60

Fix slow test

59ae89b

Signed-off-by: Adam Li <adam2392@gmail.com>

Try to fix slow

2e1d53b

Signed-off-by: Adam Li <adam2392@gmail.com>

Update

f3aa7d7

Signed-off-by: Adam Li <adam2392@gmail.com>

Lint

9d0f2db

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 merged commit e4728fa into main Nov 9, 2023
23 checks passed

adam2392 deleted the might-params branch November 9, 2023 18:08
