Add quantile-regression forest examples #87

Closed
adam2392 opened this issue Jun 15, 2023 · 22 comments · Fixed by #147
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)


@adam2392
Collaborator

The sklearn fork now has the ability to predict quantiles. It would be great to demonstrate this capability within scikit-tree. The most important examples to replicate would be:

  1. https://zillow.github.io/quantile-forest/auto_examples/plot_quantile_vs_standard_forest.html#sphx-glr-auto-examples-plot-quantile-vs-standard-forest-py
  2. https://zillow.github.io/quantile-forest/auto_examples/plot_quantile_toy_example.html
  3. https://zillow.github.io/quantile-forest/auto_examples/plot_quantile_regression_intervals.html

But honestly, we should replicate all of the examples in the quantile-forest package, since that package is not really maintained anymore. We can then describe how we achieved feature parity with quantile-forest and now support other types of quantile-tree methods, such as oblique trees and MORF trees.
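The mechanism behind predicting quantiles from stored leaf samples can be sketched without any forest library. Following Meinshausen's quantile regression forests (2006), each training sample is weighted by how often, and in how small a leaf, it shares a leaf with the query point; a quantile is then read off the weighted empirical distribution of the training targets. The sketch assumes leaf indices are already available (in scikit-learn, a fitted forest's apply(X) returns them with shape (n_samples, n_estimators)). qrf_weights and weighted_quantile are hypothetical helpers, and this thread does not say whether the sktree fork uses exactly this weighting or simply pools the stored samples.

```python
def qrf_weights(train_leaves, test_leaves):
    """Meinshausen-style weight of each training sample for one query point.

    train_leaves[i][t] is the leaf that training sample i reaches in tree t;
    test_leaves[t] is the leaf the query point reaches in tree t.
    """
    n_train, n_trees = len(train_leaves), len(test_leaves)
    weights = [0.0] * n_train
    for t in range(n_trees):
        # Training samples that share the query point's leaf in tree t.
        members = [i for i in range(n_train) if train_leaves[i][t] == test_leaves[t]]
        for i in members:
            # A non-empty leaf spreads a total weight of 1 / n_trees evenly over its members.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_quantile(values, weights, q):
    """Smallest value at which the weighted empirical CDF reaches q."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]


# Toy forest: 2 trees, 4 training samples.
train_leaves = [[0, 0], [0, 1], [1, 1], [1, 1]]
y_train = [1.0, 2.0, 3.0, 4.0]
test_leaves = [0, 1]  # leaves reached by the query point in each tree

w = qrf_weights(train_leaves, test_leaves)
print([round(x, 3) for x in w])  # [0.25, 0.417, 0.167, 0.167]
print(weighted_quantile(y_train, w, 0.5))  # 2.0
```

With a fitted scikit-learn style forest, train_leaves would be forest.apply(X_train) and test_leaves one row of forest.apply(X_test). The same weights also give the usual mean prediction as a weighted average of y_train, which is why one stored-sample forest can serve both purposes.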

@adam2392 added the help wanted (Extra attention is needed) and good first issue (Good for newcomers) labels Jul 7, 2023
@jdey4

jdey4 commented Jul 10, 2023

@adam2392 do you need help with this?

@adam2392
Collaborator Author

Yes, this would be useful to validate that the quantile functionality was added correctly.

You'll have to use the random forest from sktree._lib.sklearn rather than from scikit-learn, since we added the functionality in the fork.

You can also show it with the oblique random forest.
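To validate the quantile predictions independently of any plot, one option is to score them on held-out data: the pinball loss at level q is minimized in expectation by the true q-quantile, and a 5% to 95% interval should cover roughly 90% of test targets. The helpers below (pinball_loss, interval_coverage) are hypothetical names for a minimal sketch; scikit-learn also provides sklearn.metrics.mean_pinball_loss, where the quantile level is passed as alpha.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be in (0, 1)")
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        # Under-prediction (diff > 0) costs q per unit; over-prediction costs (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside the predicted [lower, upper] interval."""
    hits = sum(1 for yt, lo, hi in zip(y_true, lower, upper) if lo <= yt <= hi)
    return hits / len(y_true)


y_test = [1.0, 2.0, 3.0, 4.0]
print(interval_coverage(y_test, [0.5] * 4, [2.5] * 4))  # 0.5 (2 of 4 inside)
print(pinball_loss(y_test, [2.5] * 4, 0.5))  # 0.5
```

For a well-behaved quantile forest, the coverage of its 0.05/0.95 predictions on a large test set should land near 0.9.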

@adam2392
Collaborator Author

Lmk if you have any questions.

@jdey4 self-assigned this Jul 10, 2023
@adam2392
Collaborator Author

@jdey4, are you working on this? If not, I will assign @SUKI-O.

@adam2392
Collaborator Author

It would be good to show that we can do this for both a standard random forest and an oblique random forest.

@jdey4

jdey4 commented Sep 18, 2023

I am currently busy with a project! You can assign @SUKI-O.

@SUKI-O self-assigned this Sep 19, 2023
@SUKI-O
Member

SUKI-O commented Sep 19, 2023

> Yes this would be useful to validate that the quantile stuff was added correctly.
>
> You'll have to use the randomforest from the sktree._lib.sklearn rather than from scikitlearn since we added the functionality in the fork.
>
> You can also show it with the Obliquerandomforest

@adam2392
This line
from sktree._lib.sklearn.ensemble import RandomForestRegressor
gives me the following error:
ImportError: cannot import name 'RandomForestRegressor' from 'sktree._lib.sklearn.ensemble' (unknown location)
Can you tell me what I am doing wrong? :(

@adam2392
Collaborator Author

You're probably not building it correctly.

@adam2392
Collaborator Author

How are you building it?

@adam2392
Collaborator Author

A related issue is that our API stores either all of the training samples in the leaf nodes or none of them. In quantile-forest, max_leaf_samples defines how many samples to keep at random:

  • max_leaf_samples=None stores all samples
  • max_leaf_samples=0 stores none, which should be the default for us
  • max_leaf_samples=1 stores, e.g., one randomly chosen sample per leaf.

This can be implemented in upstream scikit-learn code.
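The three storage modes in the list above can be illustrated as a per-leaf subsampling rule. This is only a sketch of the semantics described in the comment: max_leaf_samples, subsample_leaf, and store_leaf_samples are hypothetical names here, not the scikit-learn, scikit-tree, or quantile-forest API, and the default of 0 follows the comment's suggestion.

```python
import random


def subsample_leaf(sample_indices, max_leaf_samples, rng):
    """Return the training-sample indices kept for one leaf.

    max_leaf_samples=None keeps all, 0 keeps none, k > 0 keeps up to k at random.
    """
    if max_leaf_samples is None:
        return list(sample_indices)
    if max_leaf_samples < 0:
        raise ValueError("max_leaf_samples must be None or a non-negative int")
    k = min(max_leaf_samples, len(sample_indices))
    return rng.sample(list(sample_indices), k)


def store_leaf_samples(leaf_of_sample, max_leaf_samples=0, seed=None):
    """Group training samples by leaf, then apply the per-leaf storage rule."""
    rng = random.Random(seed)
    leaves = {}
    for i, leaf in enumerate(leaf_of_sample):
        leaves.setdefault(leaf, []).append(i)
    return {leaf: subsample_leaf(idx, max_leaf_samples, rng) for leaf, idx in leaves.items()}


leaf_of_sample = [0, 0, 0, 1, 1]  # leaf reached by each of 5 training samples in one tree
print(store_leaf_samples(leaf_of_sample, None))  # {0: [0, 1, 2], 1: [3, 4]}
print(store_leaf_samples(leaf_of_sample, 0))  # {0: [], 1: []}
print(store_leaf_samples(leaf_of_sample, 1, seed=0))  # one random index per leaf
```

Keeping a small random subset per leaf trades some quantile resolution for a much smaller memory footprint than storing every training sample.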

@SUKI-O
Member

SUKI-O commented Sep 20, 2023

> How are you building

./spin build --clean

Update: it works with from sktree._lib.sklearn.ensemble._forest import RandomForestRegressor, but I'm not sure if that's how it's supposed to be imported.

@adam2392
Collaborator Author

You probably didn't update the GitHub submodule for sklearn.

@SUKI-O
Member

SUKI-O commented Sep 20, 2023

> You prolly didn't update the GitHub submodule for sklearn perhaps.

Sorry, I'm not sure what to do exactly. Do you mean I need to pull sklearn from GitHub?

@adam2392
Collaborator Author

Can you copy and paste your input and the output error message?

@SUKI-O
Member

SUKI-O commented Sep 20, 2023

> Can you copy paste your input and output error message

Input: from sktree._lib.sklearn.ensemble import RandomForestRegressor
Error: ImportError: cannot import name 'RandomForestRegressor' from 'sktree._lib.sklearn.ensemble' (unknown location)

@adam2392
Collaborator Author

adam2392 commented Sep 20, 2023

What exact commands do you use, from start to end, to build and to run the tests/script/etc.?

@SUKI-O
Member

SUKI-O commented Sep 20, 2023

I'm trying to reproduce the plot from quantile-forest:

import matplotlib.pyplot as plt
import scipy as sp
from sklearn.model_selection import train_test_split
from sklearn.utils.validation import check_random_state

from quantile_forest import RandomForestQuantileRegressor
from sktree._lib.sklearn.ensemble import RandomForestRegressor
from sktree.ensemble import ObliqueRandomForestRegressor


rng = check_random_state(0)

# Create right-skewed dataset.
n_samples = 5000
a, loc, scale = 5, -1, 1
skewnorm_rv = sp.stats.skewnorm(a, loc, scale)
skewnorm_rv.random_state = rng
y = skewnorm_rv.rvs(n_samples)
X = rng.randn(n_samples, 2) * y.reshape(-1, 1)

regr_rf = RandomForestRegressor(n_estimators=10, random_state=0)
regr_qrf = RandomForestQuantileRegressor(n_estimators=10, random_state=0)
regr_orf = ObliqueRandomForestRegressor(n_estimators=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

regr_rf.fit(X_train, y_train)
regr_qrf.fit(X_train, y_train)
regr_orf.fit(X_train, y_train)

y_pred_rf = regr_rf.predict(X_test)
y_pred_qrf = regr_qrf.predict(X_test, quantiles=0.5)
y_pred_orf = regr_orf.predict(X_test)

colors = ["#c0c0c0", "#f2a619", "#a6e5ff", "#e7a4f5"]
names = ["Actual", "RF (Mean)", "QRF (Median)", "ORF (Mean)"]
plt.hist([y_test, y_pred_rf, y_pred_qrf, y_pred_orf], bins=50, color=colors, label=names)
plt.xlabel("Actual and Predicted Target Values")
plt.ylabel("Counts")
plt.legend()
plt.show()

The line from sktree._lib.sklearn.ensemble import RandomForestRegressor gives me the error.

@adam2392
Collaborator Author

Ah, I see. You have to import it from the correct namespace, i.e. sktree.RandomForestRegressor:

https://docs.neurodata.io/scikit-tree/dev/generated/sktree.RandomForestRegressor.html

Also, you won't want to import quantile-forest. The goal of this issue/PR is to develop a set of examples that conceptually convey what the existing examples in the quantile-forest package show. That is, we simply replicate the different examples using *ForestClassifier and *ForestRegressor with storage of leaf node samples.
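The first linked example (and the script earlier in this thread) compares a standard forest's mean prediction with a quantile forest's median on right-skewed targets. Why the two histograms differ can be seen without any forest at all: on right-skewed data the mean sits well above the median. A minimal stdlib sketch, using a lognormal sample as a stand-in (an assumption for brevity) for the skew-normal data in that script; skewed_sample is a hypothetical helper.

```python
import random
import statistics


def skewed_sample(n, seed=0):
    """Draw n values from a right-skewed lognormal distribution (mu=0, sigma=1)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 1.0) for _ in range(n)]


y = skewed_sample(5000)
mean_y = statistics.mean(y)  # the statistic a mean-averaging forest targets
median_y = statistics.median(y)  # the statistic a quantile forest at q=0.5 targets
print(f"mean={mean_y:.3f}, median={median_y:.3f}")
```

Within a forest the same contrast holds per query point: averaging leaf values targets the conditional mean, while the 0.5 quantile of the stored leaf samples targets the conditional median.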

@reidjohnson

Thank you for a great package that addresses a clear need in the Python ecosystem.

As the primary maintainer of quantile-forest, I find it encouraging to see the examples adapted for your work. You may also find the more recent examples on multiple-output problems and conformalized quantile regression relevant and useful.

I'd like to mention that quantile-forest is actively maintained and, in keeping with the spirit of open-source collaboration and the guidelines that accompany our package, a brief acknowledgment in any modified files would be greatly appreciated.

That said, your efforts in advancing this field are commendable. I'm looking forward to seeing more of your work.

@adam2392
Collaborator Author

Hey @reidjohnson! Thanks for your note. I wasn't aware quantile-forest was still being maintained. That was my mistake.

I also took a look at the old PR and see that we didn't provide a link back to quantile-forest. That was our mistake! We apologize and did not mean for that to occur. Thanks for bringing it to our attention; we will submit a PR to rectify it.

Glad you like the vision of this package! We tried our best to achieve some type of design close to scikit-learn. If there's anything else you think would be interesting to add or collaborate on, feel free to let us know!

@adam2392
Collaborator Author

Let me know if this looks good to you, @reidjohnson!

#220

@reidjohnson

Absolutely -- thank you @adam2392.

I recognize that this package is evolving rapidly and am excited about the prospect of collaborating should an appropriate opportunity present itself!


4 participants