[RFC] [R-package] Replace "info" interface in lgb.Dataset with keyword arguments #4543
Comments
Agree with all the proposed changes; not only will this make it easier to maintain, but it will also make it easier for users to work with. 👍
This work is now complete. See the list of linked pull requests above for details. Thanks very much @StrikerRUS for thorough reviews of so many PRs!
@jameslamb Thanks a lot for splitting the work into many small PRs! It was a pleasure to review them.
Summary
The following changes should be made to `lgb.Dataset()` in the R package.

"deprecated" = "supported, but raises a warning if used".
In release 3.3.0 (#4310)
- deprecate `info` in `lgb.Dataset()`
- add keyword arguments `group`, `weight`, `init_score`, and `label` to `lgb.Dataset()`
- deprecate `...` in `lgb.Dataset()` ([RFC] [R-package] Remove support for passing parameters through '...' #4226 (comment))
- deprecate `Dataset$getinfo()` (with a warning that its name will be changed to `get_field()`)
- deprecate `Dataset$setinfo()` (with a warning that its name will be changed to `set_field()`)
- add `Dataset$get_field(field_name)` to `Dataset`, matching the Python package (LightGBM/python-package/lightgbm/basic.py, line 1939 in 8a90ea3)
- add `Dataset$set_field(field_name, data)` to `Dataset`, matching the Python package (LightGBM/python-package/lightgbm/basic.py, line 1890 in 8a90ea3)
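The rename-with-warning step above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the R package's code: `ToyDataset` and its internal `_fields` dict are hypothetical stand-ins. The old `getinfo()`/`setinfo()` names keep working but emit a warning and forward to `get_field()`/`set_field()`, which is one way to implement "supported, but raises a warning if used".

```python
import warnings


class ToyDataset:
    """Hypothetical stand-in for a Dataset; NOT LightGBM's implementation."""

    # Optional fields listed in this RFC (assumed to be the only valid names).
    _VALID_FIELDS = ("label", "weight", "group", "init_score")

    def __init__(self):
        self._fields = {}

    def set_field(self, field_name, data):
        # New-style setter, named to match the Python package.
        if field_name not in self._VALID_FIELDS:
            raise ValueError(f"unknown field: {field_name!r}")
        self._fields[field_name] = list(data)
        return self

    def get_field(self, field_name):
        # New-style getter; returns None if the field was never set.
        if field_name not in self._VALID_FIELDS:
            raise ValueError(f"unknown field: {field_name!r}")
        return self._fields.get(field_name)

    def setinfo(self, name, info):
        # Deprecated alias: still works, but warns and forwards to set_field().
        warnings.warn(
            "setinfo() is deprecated; use set_field() instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.set_field(name, info)

    def getinfo(self, name):
        # Deprecated alias: still works, but warns and forwards to get_field().
        warnings.warn(
            "getinfo() is deprecated; use get_field() instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.get_field(name)
```

With this shape, the later removal amounts to deleting the two alias methods; callers already using `get_field()`/`set_field()` are unaffected.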
In release 4.0.0
- remove `...` from `lgb.Dataset()`
- `lgb.Dataset()` (see #4874)
- remove `Dataset$getinfo()`
- remove `info` from `lgb.Dataset()`
- remove `Dataset$setinfo()`
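Once `info` and `...` are gone, each field has exactly one way in. The contrast can be sketched in Python with two hypothetical constructors, `dataset_old_style()` and `dataset_new_style()`; these are illustrative names, not LightGBM functions. In the old style, `init_score` can arrive through both `info` and `...` (modeled here as `**kwargs`), so the code must silently pick a winner; with explicit keyword arguments that situation cannot arise.

```python
def dataset_old_style(data, info=None, **kwargs):
    """Hypothetical old-style constructor: fields may come from info OR kwargs."""
    fields = dict(info or {})
    # Whatever arrives through **kwargs silently overwrites info here;
    # another implementation could just as easily do the opposite.
    fields.update(kwargs)
    return {"data": data, **fields}


def dataset_new_style(data, *, label=None, weight=None, group=None, init_score=None):
    """Hypothetical new-style constructor: one explicit keyword per field."""
    fields = {"label": label, "weight": weight, "group": group, "init_score": init_score}
    # Keep only the fields the caller actually supplied.
    return {"data": data, **{k: v for k, v in fields.items() if v is not None}}
```

For example, `dataset_old_style(X, info={"init_score": [0.1]}, init_score=[0.9])` quietly keeps `[0.9]`, whereas passing `info=` to `dataset_new_style()` fails immediately with a `TypeError`.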
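Motivation
- With explicit keyword arguments, `weight`, `init_score`, etc. will match those keyword args and will not be captured by `...` (see LightGBM/R-package/R/lgb.Dataset.R, lines 51 to 64 in 8a90ea3).
- It removes ambiguous cases such as: "what happens if you pass `init_score` as an argument through `...` and a different `init_score` in the `info` list?"

Description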
LightGBM training involves some preprocessing, like bucketing continuous features into histograms and filtering out unsplittable features. That work is done once before training begins, in the construction of a `Dataset` object.

In addition to the raw data (i.e. features), LightGBM `Dataset` objects can also contain the following:
- `label` = an array of values for the target (e.g. 0s and 1s for binary classification)
- `weight` = an array of sample weights, used to tell LightGBM that some samples should be considered more important during training
- `group` = a vector of integers describing how samples should be grouped together into "query results" (only relevant in the learning-to-rank task)
- `init_score` = a matrix of per-sample initial scores to boost from. This can be used, for example, to start the boosting process from predictions created by another model.

References
- `Dataset` class on the Python side: LightGBM/python-package/lightgbm/basic.py, lines 1122 to 1128 in 8a90ea3
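To make the four optional fields from the Description more concrete, here is a minimal Python sketch of the kind of length checks a `Dataset` constructor might run against the number of rows. `check_fields()` is a hypothetical helper, and its rules are assumptions for illustration rather than LightGBM's exact validation: one `label` and one `weight` per row, `group` sizes that sum to the row count, and a flattened `init_score` with one value per row per class.

```python
def check_fields(num_rows, label=None, weight=None, group=None,
                 init_score=None, num_class=1):
    """Hypothetical length checks for optional Dataset fields.

    Returns a list of error messages; an empty list means all checks passed.
    """
    errors = []
    # Assumed rule: exactly one label per row.
    if label is not None and len(label) != num_rows:
        errors.append(f"label has {len(label)} values, expected {num_rows}")
    # Assumed rule: exactly one weight per row.
    if weight is not None and len(weight) != num_rows:
        errors.append(f"weight has {len(weight)} values, expected {num_rows}")
    # Each group entry is the size of one query; together they cover every row.
    if group is not None and sum(group) != num_rows:
        errors.append(f"group sizes sum to {sum(group)}, expected {num_rows}")
    # Assumed layout: one initial score per row per class, flattened.
    if init_score is not None:
        expected = num_rows * num_class
        if len(init_score) != expected:
            errors.append(
                f"init_score has {len(init_score)} values, expected {expected}"
            )
    return errors
```

For instance, `check_fields(5, label=[0, 1, 0, 1, 1], group=[2, 3])` returns an empty list, while `group=[2, 2]` reports that the sizes sum to 4 rather than 5.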
Other Notes
Sorry I didn't write this up sooner. I didn't really think of it until I started working on adding deprecation warnings for uses of `...` (e.g. in #4522).

@Laurae2 and I have already talked about this privately, but I'd still like to open this as a Request for Comment (RFC) to give everyone who's interested a chance to voice their opinions.