
[python] how do I sub-class lightgbm.sklearn estimators and add new parameters? #5010

Closed
psyntelis opened this issue Feb 16, 2022 · 7 comments


psyntelis commented Feb 16, 2022

Description

Hi. I was trying to subclass LGBMRegressor to create a custom training pipeline. To do so, I followed the discussion in #3758 on how to do it. That discussion pointed out that passing new parameters to the subclass should be done in set_params rather than __init__.

Doing so leads to a warning:

[LightGBM] [Warning] Unknown parameter: test_param

which cannot be suppressed by setting verbosity=-1.

Reproducible example

from lightgbm import LGBMRegressor
from copy import deepcopy
import numpy as np

class LGBMRegressorTest(LGBMRegressor):
    def set_params(self, test_param=2, **params):
        new_params = deepcopy(params)
        new_params["test_param"] = test_param
        super().set_params(**new_params)
        return self  # scikit-learn convention: set_params returns the estimator


def train_model():
    X = np.random.rand(10,5)
    model = LGBMRegressorTest(
        test_param=2,
        verbosity=-1
    )

    model.fit(X=X[:,0:4], y=X[:,4])
    
if __name__ == "__main__":
    train_model()

Environment info

LightGBM version or commit hash: 3.3.2

Command(s) you used to install LightGBM

pip install lightgbm

Additional Comments

@jameslamb (Collaborator)

Thanks for using LightGBM, and for the excellent write-up with a reproducible example.

There's at least one important difference between your example and what was described in https://github.com/microsoft/LightGBM/issues/3758#issuecomment-932684440 ... it looks like you want to add a NEW parameter, while #3758 was about sub-classing LightGBM in a way that changed how the values for EXISTING LightGBM parameters were set.

If you want to add a new parameter to a scikit-learn estimator, it needs to be defined as a keyword argument in that estimator's signature.

As mentioned in https://scikit-learn.org/stable/developers/develop.html#parameters-and-init

As model_selection.GridSearchCV uses set_params to apply parameter setting to estimators, it is essential that calling set_params has the same effect as setting parameters using the __init__ method.

That means that if you want to add new parameters when sub-classing a lightgbm.sklearn estimator, you need to add those parameters to __init__().

For an example of this, please see the discussion around how client was added to the estimators in lightgbm.dask:

  • [dask] remove 'client' kwarg from fit() and predict() (fixes #3808) #3883 (review)
  • https://github.com/microsoft/LightGBM/pull/3883/files
  • class DaskLGBMClassifier(LGBMClassifier, _DaskLGBMModel):
        """Distributed version of lightgbm.LGBMClassifier."""

        def __init__(
            self,
            boosting_type: str = 'gbdt',
            num_leaves: int = 31,
            max_depth: int = -1,
            learning_rate: float = 0.1,
            n_estimators: int = 100,
            subsample_for_bin: int = 200000,
            objective: Optional[Union[str, _LGBM_ScikitCustomObjectiveFunction]] = None,
            class_weight: Optional[Union[dict, str]] = None,
            min_split_gain: float = 0.,
            min_child_weight: float = 1e-3,
            min_child_samples: int = 20,
            subsample: float = 1.,
            subsample_freq: int = 0,
            colsample_bytree: float = 1.,
            reg_alpha: float = 0.,
            reg_lambda: float = 0.,
            random_state: Optional[Union[int, np.random.RandomState]] = None,
            n_jobs: int = -1,
            importance_type: str = 'split',
            client: Optional[Client] = None,
            **kwargs: Any
        ):

@jameslamb changed the title from "verbosity does not suppress all warnings" to "how do I sub-class lightgbm.sklearn estimators and add new parameters?" Feb 16, 2022
@psyntelis (Author)

Hi @jameslamb. Thank you for the fast and very detailed reply. Much appreciated!

What confused me about using __init__ was that the subclass wouldn't pick up the parent class's default parameters.

For example, this:

from lightgbm import LGBMRegressor
from copy import deepcopy
import numpy as np

class LGBMRegressorTest(LGBMRegressor):
    def __init__(self, test_param=2, **kwargs):
        self.test_param=test_param
        super().__init__(**kwargs)

def train_model():
    X = np.random.rand(10,5)
    model = LGBMRegressorTest(
        test_param=2
    )

    print(model.get_params())
    
if __name__ == "__main__":
    train_model()

will print {'test_param': 2} rather than the full list of default parameters plus test_param.

Based on the dask.py script, it seems that what I am doing wrong here is that I should be re-declaring all the default parameters in the subclass's __init__, and not expect to get them for free through inheritance. Is that correct?
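If it helps to see why, here is a rough pure-Python sketch (not scikit-learn's actual code) of what I believe get_params() does under the hood: it introspects the __init__ signature and skips **kwargs, so parameters that only exist in the parent's signature are invisible to the subclass:

```python
import inspect

class Base:
    def __init__(self, num_leaves=31, learning_rate=0.1):
        self.num_leaves = num_leaves
        self.learning_rate = learning_rate

class Sub(Base):
    def __init__(self, test_param=2, **kwargs):
        self.test_param = test_param
        super().__init__(**kwargs)

def param_names(cls):
    # roughly what sklearn's BaseEstimator._get_param_names does:
    # read the __init__ signature, drop 'self' and **kwargs
    sig = inspect.signature(cls.__init__)
    return sorted(n for n, p in sig.parameters.items()
                  if n != "self" and p.kind is not p.VAR_KEYWORD)

print(param_names(Base))  # ['learning_rate', 'num_leaves']
print(param_names(Sub))   # ['test_param'] -- inherited names are invisible
```

That would explain why only the explicitly re-declared keyword arguments show up in get_params().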

@StrikerRUS (Collaborator)

I just want to note that the root cause of

[LightGBM] [Warning] Unknown parameter: test_param
which cannot be suppressed using verbosity = -1

is the same as in #4518: the logging level is set after parameter parsing, based on the parsed verbosity value.
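A toy illustration of the ordering (not LightGBM's actual code): warnings about unknown keys are emitted while the raw parameters are still being parsed, and only after parsing does the requested verbosity take effect, so the warning escapes regardless of the requested level:

```python
warnings_seen = []

def parse_params(raw, known=("verbosity", "num_leaves")):
    # warnings fire during parsing, at the default (visible) log level
    for key in raw:
        if key not in known:
            warnings_seen.append(f"Unknown parameter: {key}")
    # only after the loop above is the parsed verbosity applied
    return raw.get("verbosity", 0)

level = parse_params({"verbosity": -1, "test_param": 2})
# the warning was already recorded even though verbosity=-1 was requested
```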

@psyntelis (Author)

Thank you both for the replies. For completeness, I get the warning even if I set the parameters at the __init__ stage. Expanding on the example, I added test_param to __init__, along with all the default values (to avoid the inheritance issue), similar to the dask example:

from lightgbm import LGBMRegressor
from copy import deepcopy
import numpy as np

class LGBMRegressorTest(LGBMRegressor):
    def __init__(
            self, 
            test_param=2, 
            boosting_type='gbdt',
            num_leaves=31,
            max_depth=-1,
            learning_rate=0.1,
            n_estimators=100,
            subsample_for_bin=200000,
            objective=None,
            class_weight=None,
            min_split_gain=0.,
            min_child_weight=1e-3,
            min_child_samples=20,
            subsample=1.,
            subsample_freq=0,
            colsample_bytree=1.,
            reg_alpha=0.,
            reg_lambda=0.,
            random_state=None,
            n_jobs=-1,
            importance_type='split',
            **kwargs
        ):
        self.test_param = test_param
        super().__init__(
            boosting_type=boosting_type,
            num_leaves=num_leaves,
            max_depth=max_depth,
            learning_rate=learning_rate,
            n_estimators=n_estimators,
            subsample_for_bin=subsample_for_bin,
            objective=objective,
            class_weight=class_weight,
            min_split_gain=min_split_gain,
            min_child_weight=min_child_weight,
            min_child_samples=min_child_samples,
            subsample=subsample,
            subsample_freq=subsample_freq,
            colsample_bytree=colsample_bytree,
            reg_alpha=reg_alpha,
            reg_lambda=reg_lambda,
            random_state=random_state,
            n_jobs=n_jobs,
            importance_type=importance_type,
            **kwargs
         )

def train_model():
    X = np.random.rand(10,5)
    model = LGBMRegressorTest(
        test_param=2,
        verbosity=-1
    )

    print(model.get_params())

    model.fit(X=X[:,0:4], y=X[:,4])
    
if __name__ == "__main__":
    train_model()

Then, print(model.get_params()) returns:

{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'test_param': 2, 'verbosity': -1}

so, test_param has been assigned the value correctly.

However, I still get the same warning message

[LightGBM] [Warning] Unknown parameter: test_param

@jameslamb (Collaborator)

However, I still get the same warning message

That warning is telling you that there is no LightGBM parameter called test_param. The LightGBM sklearn estimators expect every keyword argument passed to the constructor to be one of the core parameters recognized by LightGBM, listed at https://lightgbm.readthedocs.io/en/latest/.

Since client, for example, is not such a parameter, we had to take special care everywhere in the Dask interface to avoid passing it through to LightGBM's C++ library.

def _lgb_dask_getstate(self) -> Dict[Any, Any]:
    """Remove un-picklable attributes before serialization."""
    client = self.__dict__.pop("client", None)
    self._other_params.pop("client", None)
    out = deepcopy(self.__dict__)
    out.update({"client": None})
    self.client = client
    return out

params = self.get_params(True)
params.pop("client", None)

model._other_params.pop("client", None)
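In other words (a minimal sketch, not LightGBM's actual internals): collect the params, pop any key the core library does not recognize, then pass the rest through. A subclass adding test_param would need to do the same before the parameters reach the C++ side:

```python
from copy import deepcopy

def core_train(params):
    """Stand-in for the core training routine: reports unknown keys."""
    known = {"num_leaves", "learning_rate", "verbosity"}
    return [k for k in params if k not in known]

params = {"num_leaves": 31, "verbosity": -1, "test_param": 2}
filtered = deepcopy(params)
filtered.pop("test_param", None)  # same pattern as popping 'client' in lightgbm.dask

print(core_train(params))    # ['test_param'] -- would trigger the warning
print(core_train(filtered))  # [] -- no unknown keys, no warning
```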

@psyntelis (Author)

Thank you both! Much appreciate the detailed explanations.

@jameslamb changed the title from "how do I sub-class lightgbm.sklearn estimators and add new parameters?" to "[python] how do I sub-class lightgbm.sklearn estimators and add new parameters?" Feb 17, 2022
@github-actions (bot)

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023