
[python] how do I sub-class lightgbm.sklearn estimators and add new parameters? #5010

Closed
psyntelis opened this issue Feb 16, 2022 · 7 comments


psyntelis commented Feb 16, 2022

Description

Hi. I was trying to subclass LGBMRegressor to create a custom training pipeline. To do so, I followed the discussion in #3758 on how to do it. That discussion pointed out that passing new parameters to the subclass should be done in set_params rather than __init__.

Doing so leads to a warning:

[LightGBM] [Warning] Unknown parameter: test_param

which cannot be suppressed by setting verbosity=-1.

Reproducible example

from lightgbm import LGBMRegressor
from copy import deepcopy
import numpy as np

class LGBMRegressorTest(LGBMRegressor):
    def set_params(self, test_param=2, **params):
        new_params = deepcopy(params)
        new_params["test_param"] = test_param
        super().set_params(**new_params)
        return self  # scikit-learn convention: set_params returns the estimator


def train_model():
    X = np.random.rand(10,5)
    model = LGBMRegressorTest(
        test_param=2,
        verbosity=-1
    )

    model.fit(X=X[:,0:4], y=X[:,4])
    
if __name__ == "__main__":
    train_model()

Environment info

LightGBM version or commit hash: 3.3.2

Command(s) you used to install LightGBM

pip install lightgbm

Additional Comments

@jameslamb (Collaborator)

Thanks for using LightGBM, and for the excellent write-up with a reproducible example.

There's at least one important difference between your example and what was described in https://github.com/microsoft/LightGBM/issues/3758#issuecomment-932684440 ... it looks like you want to add a NEW parameter, while #3758 was about sub-classing LightGBM in a way that changed how the values for EXISTING LightGBM parameters were set.

If you want to add a new parameter to a scikit-learn estimator, it needs to be defined as a keyword argument in that estimator's signature.

As mentioned in https://scikit-learn.org/stable/developers/develop.html#parameters-and-init

As model_selection.GridSearchCV uses set_params to apply parameter setting to estimators, it is essential that calling set_params has the same effect as setting parameters using the __init__ method.

That means that if you want to add new parameters when sub-classing a lightgbm.sklearn estimator, you need to add those parameters to __init__().

For an example of this, please see the discussion around how client was added to the estimators in lightgbm.dask:

  • [dask] remove 'client' kwarg from fit() and predict() (fixes #3808) #3883 (review)
  • https://github.com/microsoft/LightGBM/pull/3883/files
  • class DaskLGBMClassifier(LGBMClassifier, _DaskLGBMModel):
        """Distributed version of lightgbm.LGBMClassifier."""

        def __init__(
            self,
            boosting_type: str = 'gbdt',
            num_leaves: int = 31,
            max_depth: int = -1,
            learning_rate: float = 0.1,
            n_estimators: int = 100,
            subsample_for_bin: int = 200000,
            objective: Optional[Union[str, _LGBM_ScikitCustomObjectiveFunction]] = None,
            class_weight: Optional[Union[dict, str]] = None,
            min_split_gain: float = 0.,
            min_child_weight: float = 1e-3,
            min_child_samples: int = 20,
            subsample: float = 1.,
            subsample_freq: int = 0,
            colsample_bytree: float = 1.,
            reg_alpha: float = 0.,
            reg_lambda: float = 0.,
            random_state: Optional[Union[int, np.random.RandomState]] = None,
            n_jobs: int = -1,
            importance_type: str = 'split',
            client: Optional[Client] = None,
            **kwargs: Any
        ):

@jameslamb changed the title from "verbosity does not suppress all warnings" to "how do I sub-class lightgbm.sklearn estimators and add new parameters?" Feb 16, 2022
@psyntelis (Author)

Hi @jameslamb. Thank you for the fast and very detailed reply. Much appreciated!

What confused me about using __init__ was that the subclass wouldn't pick up the parent class's default parameters.

For example, this:

from lightgbm import LGBMRegressor
from copy import deepcopy
import numpy as np

class LGBMRegressorTest(LGBMRegressor):
    def __init__(self, test_param=2, **kwargs):
        self.test_param=test_param
        super().__init__(**kwargs)

def train_model():
    X = np.random.rand(10,5)
    model = LGBMRegressorTest(
        test_param=2
    )

    print(model.get_params())
    
if __name__ == "__main__":
    train_model()

will print {'test_param': 2} rather than the full list of default parameters plus test_param.

Based on the dask.py script, it seems that what I am doing wrong here is that I should be re-declaring all the default parameters in the subclass's __init__, and not expect to get them for free through inheritance. Is that correct?
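If it helps to see why, here is a rough pure-Python sketch (not scikit-learn's actual code) of what I believe get_params() does under the hood: it introspects the __init__ signature and skips **kwargs, so parameters that only exist in the parent's signature are invisible to the subclass:

```python
import inspect

class Base:
    def __init__(self, num_leaves=31, learning_rate=0.1):
        self.num_leaves = num_leaves
        self.learning_rate = learning_rate

class Sub(Base):
    def __init__(self, test_param=2, **kwargs):
        self.test_param = test_param
        super().__init__(**kwargs)

def param_names(cls):
    # roughly what sklearn's BaseEstimator._get_param_names does:
    # read the __init__ signature, drop 'self' and **kwargs
    sig = inspect.signature(cls.__init__)
    return sorted(n for n, p in sig.parameters.items()
                  if n != "self" and p.kind is not p.VAR_KEYWORD)

print(param_names(Base))  # ['learning_rate', 'num_leaves']
print(param_names(Sub))   # ['test_param'] -- inherited names are invisible
```

That would explain why only the explicitly re-declared keyword arguments show up in get_params().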

@StrikerRUS (Collaborator)

I just want to note that the root cause of

[LightGBM] [Warning] Unknown parameter: test_param
which cannot be suppressed using verbosity = -1

is the same as in #4518: the logging level is set after parameter parsing, based on the parsed verbosity value.
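A toy illustration of the ordering (not LightGBM's actual code): warnings about unknown keys are emitted while the raw parameters are still being parsed, and only after parsing does the requested verbosity take effect, so the warning escapes regardless of the requested level:

```python
warnings_seen = []

def parse_params(raw, known=("verbosity", "num_leaves")):
    # warnings fire during parsing, at the default (visible) log level
    for key in raw:
        if key not in known:
            warnings_seen.append(f"Unknown parameter: {key}")
    # only after the loop above is the parsed verbosity applied
    return raw.get("verbosity", 0)

level = parse_params({"verbosity": -1, "test_param": 2})
# the warning was already recorded even though verbosity=-1 was requested
```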

@psyntelis (Author)

Thank you both for the replies. For completeness, I get the warning even if I set the parameters at the __init__ stage. Expanding on the example, I added test_param to __init__, along with all the default values (to avoid the inheritance issue), similar to the dask example:

from lightgbm import LGBMRegressor
from copy import deepcopy
import numpy as np

class LGBMRegressorTest(LGBMRegressor):
    def __init__(
            self, 
            test_param=2, 
            boosting_type='gbdt',
            num_leaves=31,
            max_depth=-1,
            learning_rate=0.1,
            n_estimators=100,
            subsample_for_bin=200000,
            objective=None,
            class_weight=None,
            min_split_gain=0.,
            min_child_weight=1e-3,
            min_child_samples=20,
            subsample=1.,
            subsample_freq=0,
            colsample_bytree=1.,
            reg_alpha=0.,
            reg_lambda=0.,
            random_state=None,
            n_jobs=-1,
            importance_type='split',
            **kwargs
        ):
        self.test_param = test_param
        super().__init__(
            boosting_type=boosting_type,
            num_leaves=num_leaves,
            max_depth=max_depth,
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            subsample_for_bin=subsample_for_bin,
            objective=objective,
            class_weight=class_weight,
            min_split_gain=min_split_gain,
            min_child_weight=min_child_weight,
            min_child_samples=min_child_samples,
            subsample=subsample,
            subsample_freq=subsample_freq,
            colsample_bytree=colsample_bytree,
            reg_alpha=reg_alpha,
            reg_lambda=reg_lambda,
            random_state=random_state,
            n_jobs=n_jobs,
            importance_type=importance_type,
            **kwargs
         )

def train_model():
    X = np.random.rand(10,5)
    model = LGBMRegressorTest(
        test_param=2,
        verbosity=-1
    )

    print(model.get_params())

    model.fit(X=X[:,0:4], y=X[:,4])
    
if __name__ == "__main__":
    train_model()

Then, print(model.get_params()) returns:

{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'test_param': 2, 'verbosity': -1}

so, test_param has been assigned the value correctly.

However, I still get the same warning message

[LightGBM] [Warning] Unknown parameter: test_param

@jameslamb (Collaborator)

However, I still get the same warning message

That warning is telling you that there is no LightGBM parameter called test_param. The LightGBM sklearn estimators expect every keyword argument passed to the constructor to be one of the core parameters recognized by LightGBM, listed at https://lightgbm.readthedocs.io/en/latest/.

Since client, for example, is not such a parameter, we had to take special care everywhere in the Dask interface to avoid passing it through to LightGBM's C++ library.

def _lgb_dask_getstate(self) -> Dict[Any, Any]:
    """Remove un-picklable attributes before serialization."""
    client = self.__dict__.pop("client", None)
    self._other_params.pop("client", None)
    out = deepcopy(self.__dict__)
    out.update({"client": None})
    self.client = client
    return out

params = self.get_params(True)
params.pop("client", None)

model._other_params.pop("client", None)
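In other words (a minimal sketch, not LightGBM's actual internals): collect the params, pop any key the core library does not recognize, then pass the rest through. A subclass adding test_param would need to do the same before the parameters reach the C++ side:

```python
from copy import deepcopy

def core_train(params):
    """Stand-in for the core training routine: reports unknown keys."""
    known = {"num_leaves", "learning_rate", "verbosity"}
    return [k for k in params if k not in known]

params = {"num_leaves": 31, "verbosity": -1, "test_param": 2}
filtered = deepcopy(params)
filtered.pop("test_param", None)  # same pattern as popping 'client' in lightgbm.dask

print(core_train(params))    # ['test_param'] -- would trigger the warning
print(core_train(filtered))  # [] -- no unknown keys, no warning
```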

@psyntelis (Author)

Thank you both! Much appreciate the detailed explanations.

@jameslamb changed the title from "how do I sub-class lightgbm.sklearn estimators and add new parameters?" to "[python] how do I sub-class lightgbm.sklearn estimators and add new parameters?" Feb 17, 2022
@github-actions (bot)

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023