[python] Lightgbm.sklearn LGBMRegressor Subclassing Support #3758

Closed
civilinformer opened this issue Jan 13, 2021 · 4 comments

@civilinformer

I am trying to plug the LGBMRegressor into my ML pipeline and so need to subclass it.

That does not seem to work: at the very least, the parameters are not set, and I have not gotten any further than that.

Here is some code to reproduce the problem:

from lightgbm.sklearn import LGBMRegressor

class LRegressor(LGBMRegressor):
    '''
    A light wrapper over LGBMRegressor to deal with the usual problems.
    '''
    version = 0.1
    __version__ = version

    lg_params = [ 'lgbm_use_gpu' ]

    def __init__(self, lgbm_use_gpu=True, **params):
        self.lgbm_use_gpu = lgbm_use_gpu

        lgbm_params = {}
        for key, value in params.items():
            if key not in LRegressor.lg_params:
                lgbm_params[key] = value

        if self.lgbm_use_gpu:
            lgbm_params['device'] = 'gpu'
            lgbm_params['gpu_device_id'] = 0
            lgbm_params['gpu_platform_id'] = 0
            lgbm_params['gpu_use_db'] = True
            lgbm_params['max_bin'] = 256

        super().__init__(**lgbm_params)

    def set_params(self, **params):
        lgbm_params = super().get_params()
        new_params = {}
        wrapper_params = {}
        for key, value in params.items():
            if key in LRegressor.lg_params:
                wrapper_params[key] = value
            elif key in lgbm_params:
                new_params[key] = value
            else:
                print(f"Unknown parameter {key} attempting to set LRegressor with value: {value}")

        for key, value in wrapper_params.items():
            setattr(self, key, value)

        super().set_params(**new_params)

    def get_params(self, deep=False):

        params = super().get_params()
        for key in LRegressor.lg_params:
            params[key] = getattr(self, key)

        return params

Now when trying to use this:

In [1]: from LRegressor import LRegressor as LG

In [2]: lg = LG(silent=False, gpu_device_id=3)                                                                                                                                      
Unknown parameter gpu_device_id attempting to set LRegressor with value: 0
Unknown parameter device attempting to set LRegressor with value: gpu
Unknown parameter gpu_platform_id attempting to set LRegressor with value: 0
Unknown parameter gpu_use_db attempting to set LRegressor with value: True
Unknown parameter max_bin attempting to set LRegressor with value: 256

In [3]: lg.get_params()                                                                                                                                                             
Out[3]: {'lgbm_use_gpu': True}

There is no effect. For some reason subclassing does not work?!

On the other hand getting params works for LGBMRegressor directly, as it should:

In [4]: from lightgbm.sklearn import LGBMRegressor as LG                                                                                                                           

In [5]: lg = LG()                                                                                                                                                                  

In [6]: lg.get_params()                                                                                                                                                            
Out[6]: 
{'boosting_type': 'gbdt',
 'class_weight': None,
 'colsample_bytree': 1.0,
 'importance_type': 'split',
 'learning_rate': 0.1,
 'max_depth': -1,
 'min_child_samples': 20,
 'min_child_weight': 0.001,
 'min_split_gain': 0.0,
 'n_estimators': 100,
 'n_jobs': -1,
 'num_leaves': 31,
 'objective': None,
 'random_state': None,
 'reg_alpha': 0.0,
 'reg_lambda': 0.0,
 'silent': True,
 'subsample': 1.0,
 'subsample_for_bin': 200000,
 'subsample_freq': 0}

Any insight as to why subclassing is not working? Shouldn't it work?

@jhn-nt

jhn-nt commented Sep 13, 2021

I am having the same issue with LGBMClassifier. Subclassing through the sklearn interface seems quite burdensome.

@jameslamb jameslamb changed the title Lightgbm.sklearn LGBMRegressor Subclassing Support [python] Lightgbm.sklearn LGBMRegressor Subclassing Support Oct 2, 2021
@jameslamb
Collaborator

This issue was originally opened in January 2021, prior to the lightgbm 3.2.0 release (March 2021).

There have been many changes in lightgbm 3.2.0, 3.2.1, and in the upcoming 3.3.0 release (#4633). I think #3192, which changed the method resolution order to comply with scikit-learn's recommendations, is especially relevant here.


For some reason subclassing does not work?!

I don't think it's accurate to say "subclassing does not work". For example, consider the sample code below.

On latest master (a77260f), I'm able to sub-class lightgbm.sklearn.LGBMRegressor without issue.

git clone --recursive https://github.com/microsoft/LightGBM.git
cd LightGBM/python-package
python setup.py install

from lightgbm import LGBMRegressor
from copy import deepcopy

class CustomRegressor(LGBMRegressor):
    """
    Like ``lightgbm.sklearn.LGBMRegressor``, but always
    sets ``learning_rate`` to 0.123 regardless of what you pass to the constructor,
    just to show it can be done.
    """
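    def set_params(self, **params):
        new_params = deepcopy(super().get_params())
        # apply whatever the caller asked for, then force learning_rate
        new_params.update(params)
        new_params['learning_rate'] = 0.123
        # scikit-learn expects set_params() to return the estimator itself
        return super().set_params(**new_params)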

# instantiate a model
reg = CustomRegressor(learning_rate = 0.3)

# notice: the learning_rate value passed to the constructor was ignored and replaced with 0.123
reg.get_params()["learning_rate"]
# 0.123

# confirm that you can train model with this sub-class
from sklearn.datasets import make_regression
X, y = make_regression()
reg.fit(X, y)
# CustomRegressor(learning_rate=0.123)

I think it would be more accurate to say that "it is not obvious how to create a sub-class of one of lightgbm's scikit-learn estimators which overrides parameters in the constructor".

This is definitely challenging! I was confused by this too until I got some help from @StrikerRUS in #3883 (for example, #3883 (review)).

An approach like the one in the original post will not work because it is incompatible with scikit-learn's expectations for how estimator classes are written, and violating those expectations can lead to unexpected and confusing behavior.
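
One concrete mechanism behind the output in the original post: scikit-learn's `BaseEstimator.get_params()` finds parameter names by inspecting the `__init__` signature of the estimator's own class and skips any `**kwargs` catch-all. The stdlib-only sketch below imitates that lookup in simplified form. `init_param_names`, `Base`, and `Wrapper` are hypothetical names used only for illustration; this is not scikit-learn's or lightgbm's actual code.

```python
import inspect


def init_param_names(cls):
    """Simplified imitation of scikit-learn's parameter discovery:
    list the explicit keyword parameters of ``cls.__init__``,
    skipping ``self`` and any ``**kwargs`` catch-all."""
    signature = inspect.signature(cls.__init__)
    return sorted(
        p.name
        for p in signature.parameters.values()
        if p.name != "self" and p.kind != p.VAR_KEYWORD
    )


class Base:
    # hypothetical stand-in for an estimator with explicit keyword parameters
    def __init__(self, learning_rate=0.1, num_leaves=31, **kwargs):
        self.learning_rate = learning_rate
        self.num_leaves = num_leaves


class Wrapper(Base):
    # same constructor shape as LRegressor in the original post
    def __init__(self, lgbm_use_gpu=True, **params):
        self.lgbm_use_gpu = lgbm_use_gpu
        super().__init__(**params)


print(init_param_names(Base))     # ['learning_rate', 'num_leaves']
print(init_param_names(Wrapper))  # ['lgbm_use_gpu']
```

Because the wrapper's signature exposes only `lgbm_use_gpu`, a `get_params()` built on this lookup reports only that one name, which matches `Out[3]` in the original post and explains why every other key was reported as unknown.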

From https://scikit-learn.org/stable/developers/develop.html#instantiation

[in an __init__] There should be no logic, not even input validation, and the parameters should not be changed.

And in https://scikit-learn.org/stable/developers/develop.html#parameters-and-init

As model_selection.GridSearchCV uses set_params to apply parameter setting to estimators, it is essential that calling set_params has the same effect as setting parameters using the __init__ method. The easiest and recommended way to accomplish this is to not do any parameter validation in __init__.
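
To make the two quoted rules concrete, here is a minimal sketch of a parameter holder whose `__init__` only stores its arguments and whose dependent settings are derived later, so that construction and `set_params()` end up in the same state. The class `GpuAwareParams` and its method `effective_params` are hypothetical and use no scikit-learn or lightgbm code; the GPU values are simply the ones from the original post.

```python
class GpuAwareParams:
    """Hypothetical holder: stores what it is given, derives the rest later."""

    def __init__(self, device="cpu", max_bin=255):
        # no logic and no validation here: keep the arguments unchanged
        self.device = device
        self.max_bin = max_bin

    def get_params(self):
        return {"device": self.device, "max_bin": self.max_bin}

    def set_params(self, **params):
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"unknown parameter: {key}")
            setattr(self, key, value)
        return self  # returning self allows chaining

    def effective_params(self):
        """Derive dependent settings on demand (for example, at fit time)."""
        params = self.get_params()
        if params["device"] == "gpu":
            params.update(gpu_platform_id=0, gpu_device_id=0, gpu_use_db=True)
        return params


from_init = GpuAwareParams(device="gpu")
from_set = GpuAwareParams().set_params(device="gpu")

print(from_init.get_params() == from_set.get_params())              # True
print(from_init.effective_params() == from_set.effective_params())  # True
```

Since nothing is rewritten in `__init__`, a tool that clones the object from `get_params()` or applies a grid through `set_params()` sees exactly the values the user supplied.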

This is why lightgbm's scikit-learn estimators do not call super().__init__(), and why they store anything passed through **kwargs that does not match an explicit keyword argument in a private attribute, self._other_params.

And it's why lightgbm's Dask estimators (which sub-class lightgbm.sklearn.LGBMRegressor, lightgbm.sklearn.LGBMClassifier, and lightgbm.sklearn.LGBMRanker) use explicit keyword arguments when calling super().__init__().

super().__init__(
    boosting_type=boosting_type,
    num_leaves=num_leaves,
    max_depth=max_depth,
    learning_rate=learning_rate,
    n_estimators=n_estimators,
    subsample_for_bin=subsample_for_bin,
    objective=objective,
    class_weight=class_weight,
    min_split_gain=min_split_gain,
    min_child_weight=min_child_weight,
    min_child_samples=min_child_samples,
    subsample=subsample,
    subsample_freq=subsample_freq,
    colsample_bytree=colsample_bytree,
    reg_alpha=reg_alpha,
    reg_lambda=reg_lambda,
    random_state=random_state,
    n_jobs=n_jobs,
    silent=silent,
    importance_type=importance_type,
    **kwargs
)
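
A rough, stdlib-only sketch of why the explicit keywords matter: when the sub-class repeats the parent's keyword arguments in its own signature and forwards them, a signature-based `get_params()` can see all of them, and extra keywords still travel through a private dictionary. `Base`, `Wrapper`, and `use_gpu` are hypothetical stand-ins, and the `get_params()` here only imitates the behavior described above rather than reproducing lightgbm's implementation.

```python
import inspect


class Base:
    """Hypothetical stand-in for an estimator that keeps extra keywords."""

    def __init__(self, learning_rate=0.1, num_leaves=31, **kwargs):
        self.learning_rate = learning_rate
        self.num_leaves = num_leaves
        self._other_params = dict(kwargs)  # unmatched keywords are kept here

    def get_params(self):
        # explicit names come from the constructor of the actual (sub)class
        signature = inspect.signature(type(self).__init__)
        names = [
            p.name
            for p in signature.parameters.values()
            if p.name != "self" and p.kind != p.VAR_KEYWORD
        ]
        params = {name: getattr(self, name) for name in names}
        params.update(self._other_params)
        return params


class Wrapper(Base):
    # repeats the parent's keywords explicitly and adds one of its own
    def __init__(self, learning_rate=0.1, num_leaves=31, use_gpu=False, **kwargs):
        self.use_gpu = use_gpu
        super().__init__(
            learning_rate=learning_rate,
            num_leaves=num_leaves,
            **kwargs
        )


w = Wrapper(learning_rate=0.05, use_gpu=True, max_bin=63)
print(w.get_params())
# {'learning_rate': 0.05, 'num_leaves': 31, 'use_gpu': True, 'max_bin': 63}
```

The cost of this pattern is that the sub-class signature has to be kept in sync with the parent's by hand.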

If you want to achieve this behavior of "set some parameters based on the value of others", you might have an easier time and run into fewer surprises by overriding set_params() in a sub-class.

I think that the sample code below, for example, accomplishes the same thing as the intent of the post at the top of this issue ("set other GPU parameters to specific values based on whether or not I'm using the GPU").

from copy import deepcopy

from lightgbm import LGBMRegressor

class LRegressor(LGBMRegressor):

    def set_params(self, **params):
        # copy so that the caller's dictionary is not modified
        new_params = deepcopy(params)
        if new_params.get("device", None) == "gpu":
            print("using GPU")
            new_params['gpu_device_id'] = 0
            new_params['gpu_platform_id'] = 0
            new_params['gpu_use_db'] = True
            new_params['max_bin'] = 256
        else:
            print("not using GPU")
        # scikit-learn expects set_params() to return the estimator itself
        return super().set_params(**new_params)

mod = LRegressor(device="gpu")

# notice that all those params like `gpu_device_id`, `gpu_use_db` are set
mod.get_params()

@no-response

no-response bot commented Nov 1, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023