Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] [R-package] Remove support for passing parameters through '...' #4226

Closed
jameslamb opened this issue Apr 26, 2021 · 16 comments
Closed

Comments

@jameslamb
Copy link
Collaborator

jameslamb commented Apr 26, 2021

Question

I'm opening this request for comment (RFC) to get feedback on the following proposal, originally mentioned in #4202 (comment).

In its next major release, should {lightgbm} remove support for passing parameters through ... in function calls?

This would apply to:

  • lgb.cv()
  • lgb.Dataset()
  • lgb.Dataset.create.valid()
  • lgb.train()
  • lightgbm()

Description

LightGBM's core library supports many parameters, documented at https://lightgbm.readthedocs.io/en/latest/Parameters.html. The exported functions in {lightgbm} (the R package), support at least 3 different ways to pass values for these parameters.

  • in a named list called params
  • (for some parameters) via explicit keyword arguments
  • via implicit keyword arguments with ...

For example, the following calls to lightgbm::lightgbm() are all equally valid ways to say "perform 14 boosting rounds".

library(lightgbm)

data(agaricus.train, package = "lightgbm")
train <- agaricus.train

# option 1 - params
bst <- lightgbm(
    data = train$data
    , label = train$label
    , objective = "binary"
    , params = list(
        num_iterations = 14L
    )
)

# option 2 - explicit keyword argument ("nrounds")
bst <- lightgbm(
    data = train$data
    , label = train$label
    , objective = "binary"
    , nrounds = 14L
)

# option 3 - passing params through ...
bst <- lightgbm(
    data = train$data
    , label = train$label
    , objective = "binary"
    , num_iterations = 14L
)

Motivation

pros

cons

  • breaks existing code that relies on the ... behavior
    • NOTE: maintainers have agreed that the next LightGBM release will be a 4.0.0 release, so this breaking change by itself wouldn't force a new major version of LightGBM

References

@jameslamb
Copy link
Collaborator Author

Before we move forward with this change, I'd like to ask some of the people I know to be users of the R package to weigh in. And of course anyone else is welcome to give their feedback on the idea!

Thanks very much to all of you for your time and effort!

@mayer79
Copy link
Contributor

mayer79 commented Apr 26, 2021

Thanks for the R support! I think it is a very good idea to simplify the interface in this way. It doesn't only simplify the code but also improves the help in some cases. E.g. for lgb.Dataset I was never sure whether to pass "weight" or "weights".

@TalWac
Copy link

TalWac commented Apr 26, 2021

Dear developers,
I had only short interaction with the project, however, simplifying the interface will make it easier for new users like me.

Many thanks for the support !

@issactoast
Copy link
Contributor

I am using LGBM via treesnip these days so this affects my code. I would like to wait for opinions from the maintainers of treesnip. However, if the functions support every parameter through params list option, it seems like a good way go.

@jameslamb
Copy link
Collaborator Author

I am using LGBM via treesnip these days so this affects my code.

Thanks! I think that {treesnip} could be patched in a way that wouldn't break existing code using that package. Basically today {treesnip} collects any keyword arguments you pass through ... of treesnip::train_lightgbm() and passes them to lightgbm::lgb.train().

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L208

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L267

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L278

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L283

I think it could be changed to instead collect those and pass them to params (making special exceptions for the existing explicit keyword arguments like nrounds). If we move forward with the proposal in this RFC, I'm willing to go do the work to create such a patch for {treesnip}, to avoid breaking code that uses that project.

@david-cortes
Copy link
Contributor

I am using LGBM via treesnip these days so this affects my code.

Thanks! I think that {treesnip} could be patched in a way that wouldn't break existing code using that package. Basically today {treesnip} collects any keyword arguments you pass through ... of treesnip::train_lightgbm() and passes them to lightgbm::lgb.train().

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L208

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L267

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L278

https://github.com/curso-r/treesnip/blob/bf27cd871b7fb663a76818f66ea0a5f9bf09f444/R/lightgbm.R#L283

I think it could be changed to instead collect those and pass them to params (making special exceptions for the existing explicit keyword arguments like nrounds). If we move forward with the proposal in this RFC, I'm willing to go do the work to create such a patch for {treesnip}, to avoid breaking code that uses that project.

That wouldn't break the existing code, since arguments which are in ... would be assigned to the named slots in the function call. I think LightGBM wouldn't accept unnamed arguments so if there wasn't any issue before, there shouldn't be an issue now.

@jameslamb
Copy link
Collaborator Author

@david-cortes I don't think that's correct. Consider my example at the top of this issue with num_iteration. There is no named keyword argument num_iteration in the signature of lightgbm::lgb.train()...but passing it that way works today because lgb.train() assumes any keyword arguments not matched to one of its formal arguments must be a valid parameter from https://lightgbm.readthedocs.io/en/latest/Parameters.html.

additional_params <- list(...)
params <- append(params, additional_params)

To be clear, this proposal is not about adding one keyword argument to lgb.train() per parameter in https://lightgbm.readthedocs.io/en/latest/Parameters.html.

@jaredlander
Copy link

I think one of the biggest complaints about {xgboost} is the interface. The choice of a params list and the ... is clunky and leaves you with no autocomplete in an IDE.

A params list feels very outdated for R. I think it's still popular in Python, but not so much in R anymore. If you do go this route, rather than making it a list where the user can enter anything and has no guidance, make it a function whose arguments are the parameters. An example can be seen in tune::control_grid(), which isn't for tuning parameters, but same idea. Since it's a function the user can use autocomplete and get in-context help. And I now see this is what you were talking about in #4195.

Though generally, I like keyword arguments in the function itself. This means I can use autocomplete in the IDE and all the documentation is in one place and feels less clunky than a params list. That said, I know it is possible to get autocomplete to work even with ... and same for documentation. But I'm not sure what magic RStudio uses for that.

As an aside, you used nrounds as a keyword argument but num_iterations for the other examples. Is the parameter name context specific?

@jameslamb
Copy link
Collaborator Author

Thanks very much!

A params list feels very outdated for R.... #4195.

Yes, #4195 is the proposal for exactly what you're talking about.

To be clear, removing the interface of passing a params list is not up for debate in this RFC. This is about the narrow question of removing supporting for one of 3 different ways you can currently pass parameters (through ...).

If there is some other API change you'd like to see in this project's R package (besides #4195), we'd welcome issues describing a proposal and how it is an improvement over the current state.

That said, I know it is possible to get autocomplete to work even with ... and same for documentation. But I'm not sure what magic RStudio uses for that.

I think you're probably talking about how some packages use the {roxygen2} tag #' @inheritDotParams. (examples can be seen in {roxygen2}'s tests: https://github.com/r-lib/roxygen2/blob/6c1e42f061e9435a14bd3a0e01c7f08fc7aa9160/tests/testthat/test-rd-inherit.R#L473).

With that tag, if you set up your stuff like this:

#' @name some_func
#' @inheritDotParams data.table::data.table
some_func <- function(...) {
    someDT <- data.table::data.table(...)
    return(someDT)
}

{roxygen2} will fill out all of the documentation for individual parameters from data.table::data.table(). I don't think that's directly applicable here since {lightgbm} isn't using ... to say "other arguments all passed to exactly one other R function".

As an aside, you used nrounds as a keyword argument but num_iterations for the other examples. Is the parameter name context specific?

In the earliest days of this project's R package, some explicit keyword arguments were added for functions like lgb.train() which can also be specified in params. Unfortunately sometimes those names differed. In this case, the keyword argument nrounds means exactly the same thing as the parameter num_iterations recognized by LightGBM core (https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_iterations). I understand that that is a source of confusion and maybe resolving that difference could also be considered as part of the 4.0.0 release, but that is not related to this RFC so I'd prefer to have that discussion in a separate issue.

@mayer79
Copy link
Contributor

mayer79 commented Apr 29, 2021

We might add a depreciation note in the next R release, like so

if (length(list(...)) {
  warning("Starting from the next major release of LightGBM, passing arguments through ... will be deprecated.")
}

@nhejazi
Copy link

nhejazi commented Apr 30, 2021

Thanks for tagging in this RFC, just catching up here, but I think any work to simplify the interface would go a long way to improving maintainability of wrappers that try to expose LightGBM functionality (like the wrapper in sl3). As with other similar packages that provide interfaces to a variety of ML algorithm (“learner”) implementations, there are design choices required about how to handle passing of arguments, which ... (and even list args) significantly complicate.

In sl3, we’ve taken the approach of throwing away any arguments passed in that are not explicit formal arguments of the learner function (via a custom wrapper around do.call). Of course, there are cases where this needs to be bypassed, but doing so then requires explicitly naming arguments (to our Lrnr_lightgbm via its ...) that shouldn’t discarded. To do this, I had to try to follow down through various function calls in LightGBM to get a sense of exactly where different aspects of ... get passed, to get a sense of what we’d like to support in our interface. I recall this being complicated, and I think removal of ... would declutter the interface in a helpful way.

@jameslamb
Copy link
Collaborator Author

Thanks very much @nhejazi !

In the future, if you run into anything in LightGBM that you're unsure about you're welcome to come here and open an issue. We'd be happy to help, and you might help us to uncover bugs or other opportunities to improve the interface.

This is especially important right now, as we're preparing a 4.0.0 release and are open to the possibility of breaking changes if a strong case can be made for them.


We might add a depreciation note in the next R release, like so

@mayer79 great suggestion! Generally, I agree with you. However depending on when #3234 is merged, there might not be another minor release of LightGBM between now and 4.0. Maintainers decided in #3210 to continue releasing the entire project together, instead of having individual minor / patch releases of different components.

I think the best I can offer right now is that we could add warnings like that here in source control and we could add visual warnings in the relevant places in https://lightgbm.readthedocs.io/en/latest/R/reference/.

@mikemahoney218
Copy link
Contributor

I am sad to see this method for passing parameters be removed! The idea of having parameters passed as a list feels very not-R to me -- for comparison, the other modeling libraries I use daily take top-level named arguments (ranger https://rdrr.io/cran/ranger/man/ranger.html ) or a combination of top-level named arguments and ... (kernlab https://rdrr.io/cran/kernlab/man/ksvm.html ). Passing a named list to parameters feels non-idiomatic to me.

While in my ideal world LightGBM would take after other modeling packages and move parameter arguments to the top-level (though likely unwieldy for a package with this many parameters), having the ... option made it so that personal scripts which automated model-fitting with multiple modeling libraries would also work with LightGBM.

Assuming that this decision won't be revisited, would it be possible to accept nrounds as an argument to parameters? I've had a bunch of students in my courses confused why that method for setting a hyperparameter isn't accepted by the official way they're to set all other hyperparameters.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Oct 29, 2021

Thanks for taking the time to write this up @mikemahoney218 !

Assuming that this decision won't be revisited,

You're correct, this decision won't be revisited. Support for ... will be removed in the R package in version 4.0.0, for the reasons described in the "Motivation" section of this issue's description and based on the feedback received on this RFC over the last few months.

... LightGBM would take after other modeling packages

As noted in this PR's description, {catboost} and {xgboost} also do not expose one keyword argument per parameter. {catboost} forces the use of a list called params, and {xgboost} allows either ... or params.

That said, we would welcome a feature request (in a separate issue, please) describing an alternative interface and explaining, with specific code examples, why that interface would be valuable for you and other users.

There is precedent for this in LightGBM already....this project's Python package contains a function train() that implements the main training logic

def train(
params: Dict[str, Any],
train_set: Dataset,
num_boost_round: int = 100,
valid_sets: Optional[List[Dataset]] = None,
valid_names: Optional[List[str]] = None,
fobj: Optional[_LGBM_CustomObjectiveFunction] = None,
feval: Optional[Union[_LGBM_CustomMetricFunction, List[_LGBM_CustomMetricFunction]]] = None,
init_model: Optional[Union[str, Path, Booster]] = None,
feature_name: Union[List[str], str] = 'auto',
categorical_feature: Union[List[str], List[int], str] = 'auto',
early_stopping_rounds: Optional[int] = None,
evals_result: Optional[Dict[str, Any]] = None,
verbose_eval: Union[bool, int, str] = 'warn',
learning_rates: Optional[Union[List[float], Callable[[int], float]]] = None,
keep_training_booster: bool = False,
callbacks: Optional[List[Callable]] = None
) -> Booster:

and also a wrapper around that interface which provides a scikit-learn compliant interface, for those users using tools in the scikit-learn ecosystem

def fit(self, X, y,
sample_weight=None, init_score=None, group=None,
eval_set=None, eval_names=None, eval_sample_weight=None,
eval_class_weight=None, eval_init_score=None, eval_group=None,
eval_metric=None, early_stopping_rounds=None, verbose='warn',
feature_name='auto', categorical_feature='auto',
callbacks=None, init_model=None):

would it be possible to accept nrounds as an argument to parameters?

Thanks for noting this! I'll open a separate issue, to keep the conversation here focused on the specific topic of removing ....

@jameslamb
Copy link
Collaborator Author

This work is complete. See the linked pull requests for more details.

For those looking to discuss an alternative interface to LightGBM that does not require creating a list of parameters, please contribute to the conversation in #4295.

#4295 (comment) offers a hint at what this project might offer in the future in terms of alternative interfaces for R.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants