Mark Scikit-Learn RF interface as experimental in doc. #4258

Merged: 2 commits, Mar 15, 2019
doc/rf.rst: 33 changes (25 additions & 8 deletions)
@@ -4,12 +4,17 @@ Random Forests in XGBoost

XGBoost is normally used to train gradient-boosted decision trees and other gradient
boosted models. Random forests use the same model representation and inference as
-gradient-boosted decision trees, but a different training algorithm. There are XGBoost
-parameters that enable training a forest in a random forest fashion.
+gradient-boosted decision trees, but a different training algorithm. One can use XGBoost
+to train a standalone random forest, or use a random forest as the base model for
+gradient boosting. Here we focus on training a standalone random forest.

We have provided native APIs for training random forests since the early days, and a new
Scikit-Learn wrapper was added after 0.82 (not included in 0.82). Please note that the
new Scikit-Learn wrapper is still **experimental**, which means we might change the
interface whenever needed.

-****************
-With XGBoost API
-****************
+*****************************************
+Standalone Random Forest With XGBoost API
+*****************************************

The following parameters must be set to enable random forest training.
@@ -22,13 +27,14 @@ The following parameters must be set to enable random forest training.
selection of columns. Normally, ``colsample_bynode`` would be set to a value less than 1
to randomly sample columns at each tree split.
* ``num_parallel_tree`` should be set to the size of the forest being trained.
-* ``num_boost_round`` should be set to 1. Note that this is a keyword argument to
-  ``train()``, and is not part of the parameter dictionary.
+* ``num_boost_round`` should be set to 1 to prevent XGBoost from boosting multiple random
+  forests. Note that this is a keyword argument to ``train()``, and is not part of the
+  parameter dictionary.
* ``eta`` (alias: ``learning_rate``) must be set to 1 when training random forest
  regression.
* ``random_state`` can be used to seed the random number generator.


Other parameters should be set in a similar way as they are for gradient boosting. For
instance, ``objective`` will typically be ``reg:linear`` for regression and
``binary:logistic`` for classification, ``lambda`` should be set according to a desired
@@ -59,7 +65,7 @@ A random forest model can then be trained as follows::
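
The example code itself is collapsed in this diff view. As a rough sketch only, with
illustrative data and assumed parameter values rather than the original example, such a
training call could look like::

    import numpy as np
    import xgboost as xgb

    # Illustrative data; substitute your own.
    X = np.random.randn(100, 10)
    y = np.random.randn(100)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        'objective': 'reg:linear',   # typical regression objective
        'eta': 1,                    # required for random forest regression
        'subsample': 0.8,            # random selection of rows
        'colsample_bynode': 0.8,     # random selection of columns per split
        'num_parallel_tree': 100,    # size of the forest
    }
    # A single round trains one forest instead of boosting several.
    bst = xgb.train(params, dtrain, num_boost_round=1)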


-**************************
-With Scikit-Learn-Like API
-**************************
+****************************************************
+Standalone Random Forest With Scikit-Learn-Like API
+****************************************************

``XGBRFClassifier`` and ``XGBRFRegressor`` are SKL-like classes that provide random forest
@@ -72,7 +78,18 @@ some of the parameters adjusted accordingly. In particular:
* ``learning_rate`` is set to 1 by default
* ``colsample_bynode`` and ``subsample`` are set to 0.8 by default
* ``booster`` is always ``gbtree``
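
As a quick check of these defaults (a sketch; the exact parameter names returned are an
assumption based on the list above), they are visible through the standard scikit-learn
``get_params()`` accessor::

    import xgboost as xgb

    params = xgb.XGBRFRegressor().get_params()
    # Expected per the list above: learning_rate=1, subsample=0.8,
    # colsample_bynode=0.8.
    print(params['learning_rate'], params['subsample'], params['colsample_bynode'])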


For a simple example, you can train a random forest regressor with::

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import KFold

    # Illustrative data; substitute your own.
    X = np.random.randn(100, 10)
    y = np.random.randn(100)

    kf = KFold(n_splits=2)
    for train_index, test_index in kf.split(X, y):
        xgb_model = xgb.XGBRFRegressor(random_state=42).fit(
            X[train_index], y[train_index])

Note that these classes have a smaller selection of parameters compared to using
``train()``. In particular, it is impossible to combine random forests with gradient
boosting using this API.
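
For contrast, the native API can combine the two. A minimal sketch, with illustrative
parameter values, of boosting a 10-tree forest over five rounds, which the SKL-like
classes cannot express::

    import numpy as np
    import xgboost as xgb

    X = np.random.randn(100, 10)
    y = np.random.randn(100)
    dtrain = xgb.DMatrix(X, label=y)

    # Each boosting round adds a 10-tree random forest.
    params = {
        'objective': 'reg:linear',
        'eta': 0.3,
        'subsample': 0.8,
        'colsample_bynode': 0.8,
        'num_parallel_tree': 10,
    }
    bst = xgb.train(params, dtrain, num_boost_round=5)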
python-package/xgboost/sklearn.py: 4 changes (2 additions & 2 deletions)
@@ -884,7 +884,7 @@ def evals_result(self):

class XGBRFClassifier(XGBClassifier):
    # pylint: disable=missing-docstring
-    __doc__ = "Implementation of the scikit-learn API "\
+    __doc__ = "Experimental implementation of the scikit-learn API "\
        + "for XGBoost random forest classification.\n\n"\
        + '\n'.join(XGBModel.__doc__.split('\n')[2:])

@@ -923,7 +923,7 @@ class XGBRegressor(XGBModel, XGBRegressorBase):

class XGBRFRegressor(XGBRegressor):
    # pylint: disable=missing-docstring
-    __doc__ = "Implementation of the scikit-learn API "\
+    __doc__ = "Experimental implementation of the scikit-learn API "\
        + "for XGBoost random forest regression.\n\n"\
        + '\n'.join(XGBModel.__doc__.split('\n')[2:])
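
Both hunks reuse the parent docstring by swapping its first two lines (the summary and
the blank line after it) for a new "Experimental" summary, keeping the shared parameter
documentation. A minimal sketch of the same pattern, with hypothetical classes::

    class Base:
        """Base summary line.

        Everything from here down is shared documentation.
        """

    class Derived(Base):
        # Swap the first two lines of the parent docstring for a new summary.
        __doc__ = "Experimental derived summary.\n\n" \
            + '\n'.join(Base.__doc__.split('\n')[2:])

    print(Derived.__doc__)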
