
docs and tunable_boost_tree adjustments re: lightgbm bagging #768

Merged
merged 2 commits into main from lightgbm-bagging
Aug 17, 2022

Conversation

simonpcouch

Some adjustments to effectively enable and document the `sample_size` argument for the lightgbm engine. Will be followed up by a PR in bonsai; this is 1 of 2 to close tidymodels/bonsai#30.

The gist:

* The `sample_size` argument is translated to the `bagging_fraction` parameter in the `params` argument of `lgb.train`. lightgbm interprets the argument as a _proportion_ rather than a count, so bonsai internally reparameterizes the `sample_size` argument with [dials::sample_prop()] during tuning.
* To effectively enable bagging, the user would also need to set the `bagging_freq` argument to lightgbm. `bagging_freq` defaults to 0, which means bagging is disabled; a `bagging_freq` value of `k` means that the booster will perform bagging at every `k`th boosting iteration. Thus, by default, the `sample_size` argument would be ignored unless `bagging_freq` is set manually. Other boosting libraries, like xgboost, do not have an analogous argument to `bagging_freq` and use `k = 1` when the analogue to `bagging_fraction` is in $(0, 1)$. _bonsai will thus automatically set_ `bagging_freq = 1` _in_ `set_engine("lightgbm", ...)` if `sample_size` (i.e. `bagging_fraction`) is not equal to 1 and no `bagging_freq` value is supplied. This default can be overridden by setting the `bagging_freq` argument to `set_engine()` manually.
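
For illustration, the defaulting rule described above can be sketched in base R. This is a minimal sketch, not bonsai's actual implementation: the helper name `default_bagging_freq()` and its signature are hypothetical and exist only to make the three cases explicit.

```r
# Hypothetical helper illustrating the defaulting rule described above.
# Not bonsai's actual code; the function name and arguments are made up.
default_bagging_freq <- function(bagging_fraction = 1, bagging_freq = NULL) {
  # A user-supplied bagging_freq always takes precedence.
  if (!is.null(bagging_freq)) {
    return(bagging_freq)
  }
  # Subsampling requested but no frequency given: bag at every iteration.
  if (bagging_fraction != 1) {
    return(1)
  }
  # No subsampling requested: keep lightgbm's default of 0 (bagging disabled).
  0
}

default_bagging_freq(bagging_fraction = 0.8)                    # 1
default_bagging_freq(bagging_fraction = 1)                      # 0
default_bagging_freq(bagging_fraction = 0.8, bagging_freq = 5)  # 5
```

The third call corresponds to a user passing `bagging_freq = 5` to `set_engine("lightgbm", ...)` alongside a `sample_size` below 1, in which case the supplied value is kept rather than replaced by 1.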

Will PR as draft and we can come back to this after conf. :)

@simonpcouch simonpcouch marked this pull request as ready for review August 8, 2022 13:11
@simonpcouch simonpcouch requested a review from topepo August 8, 2022 13:17
@simonpcouch

Would be great to have this one in before the next release, @topepo!

@topepo topepo merged commit 93ca436 into main Aug 17, 2022
@topepo topepo deleted the lightgbm-bagging branch August 17, 2022 12:08

github-actions bot commented Sep 1, 2022

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 1, 2022

Successfully merging this pull request may close these issues.

lightgbm model doesn't see sample_size parameter from boost_tree