[python-package] init_score and data structures in custom functions shape for multiclass classification #4046

Closed
jmoralez opened this issue Mar 5, 2021 · 8 comments · Fixed by #4150 or #4925

@jmoralez
Collaborator

jmoralez commented Mar 5, 2021

Description

When using init_score in multiclass classification, it would be intuitive to pass an (n_samples, n_classes) collection, as suggested in #2595 (comment). However, the Python package doesn't currently support this, so you have to reshape it into an (n_samples * n_classes,) collection.
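
A minimal sketch of that reshape workaround, using NumPy only. It assumes the flat layout is class-major (the score for row i and class j sits at index j * n_samples + i), which is how the LightGBM documentation describes multiclass init_score as far as I can tell. The helper name flatten_init_score is hypothetical and not part of lightgbm.

```python
import numpy as np


def flatten_init_score(init_score_2d):
    """Flatten (n_samples, n_classes) scores into a class-major 1-D array.

    Assumption: the flat layout is grouped by class first, then by row,
    so element [i, j] ends up at index j * n_samples + i.
    """
    init_score_2d = np.asarray(init_score_2d, dtype=np.float64)
    if init_score_2d.ndim != 2:
        raise ValueError("expected a 2-D (n_samples, n_classes) array")
    # order="F" walks down each column (class) before moving to the next one.
    return init_score_2d.ravel(order="F")


n_samples, n_classes = 100, 3
rng = np.random.default_rng(0)
init_score = rng.random((n_samples, n_classes))
flat = flatten_init_score(init_score)
```

Under that assumption, the flat array is what would be passed as init_score in place of the 2-D one.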

This probably isn't a big deal for local models. However, since Dask partitions collections by rows, it could cause problems with the partitioning.
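
To illustrate the partitioning concern, here is a small sketch that uses np.array_split as a stand-in for row-wise partitioning (an assumption for illustration; it is not how Dask is actually invoked). It compares which samples end up in the first chunk for a 2-D collection versus a flat class-major one.

```python
import numpy as np

n_samples, n_classes, n_parts = 6, 3, 2

# scores[i, j] = 10 * i + j, so each value encodes its row and class.
scores = 10.0 * np.arange(n_samples)[:, None] + np.arange(n_classes)[None, :]

# Row-wise partitions of the 2-D array: each chunk holds complete rows.
parts_2d = np.array_split(scores, n_parts, axis=0)

# Class-major flat array split into equal chunks: a chunk mixes rows
# and does not contain every class for the rows it touches.
flat = scores.ravel(order="F")
parts_flat = np.array_split(flat, n_parts)

# Sample indices present in the first chunk of each layout.
rows_in_first_2d = sorted({int(v // 10) for v in parts_2d[0].ravel()})
rows_in_first_flat = sorted({int(v // 10) for v in parts_flat[0]})
```

With the 2-D layout the first chunk covers only the first half of the samples, each with all of its class scores. With the flat layout the first chunk touches every sample but holds only part of each sample's scores, so it no longer lines up with a row-wise partition of X and y.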

Reproducible example

import lightgbm as lgb
import numpy as np

n_samples = 100
n_features = 2
n_classes = 3

X = np.random.rand(n_samples, n_features)
y = np.random.randint(0, n_classes, n_samples)
init_score = np.random.rand(n_samples, n_classes)
lgb.LGBMClassifier().fit(X, y, init_score=init_score)
Traceback (most recent call last):
  File "init_score.py", line 7, in <module>
    lgb.LGBMClassifier().fit(X, y, init_score=init_score)
  File "/hdd/github/LightGBM/python-package/lightgbm/sklearn.py", line 895, in fit
    callbacks=callbacks, init_model=init_model)
  File "/hdd/github/LightGBM/python-package/lightgbm/sklearn.py", line 688, in fit
    callbacks=callbacks, init_model=init_model)
  File "/hdd/github/LightGBM/python-package/lightgbm/engine.py", line 228, in train
    booster = Booster(params=params, train_set=train_set)
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 2229, in __init__
    train_set.construct()
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1472, in construct
    categorical_feature=self.categorical_feature, params=self.params)
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1294, in _lazy_init
    self.set_init_score(init_score)
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1843, in set_init_score
    init_score = list_to_1d_numpy(init_score, np.float64, name='init_score')
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 165, in list_to_1d_numpy
    "It should be list, numpy 1-D array or pandas Series".format(type(data).__name__, name))
TypeError: Wrong type(ndarray) for init_score.
It should be list, numpy 1-D array or pandas Series

Environment info

LightGBM version or commit hash: 37e9878

Command(s) you used to install LightGBM

git clone --recursive https://github.com/microsoft/LightGBM.git
cd LightGBM/python-package
python setup.py install
@StrikerRUS
Collaborator

I think this applies not only to init_score but to all other user-defined data in the multiclass case, e.g. y_pred, grad, hess in custom objectives, at least for consistency. WDYT, @jmoralez?

@jmoralez
Collaborator Author

Yes, that'd be nice. I remember a while ago I was very confused when using custom objectives with multiclass classification because grad and hess were 1d. I'll check this week if this would be easy to implement.
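
For context, this is roughly what a multiclass custom objective has to do while grad and hess are 1d. It is a sketch under assumptions: the function name softmax_objective is hypothetical, the flat y_pred is assumed to be class-major, and the Hessian is a plain diagonal approximation rather than whatever the built-in objective uses.

```python
import numpy as np


def softmax_objective(y_true, y_pred_flat):
    """Softmax cross-entropy gradients for a flat, class-major y_pred.

    Assumption: y_pred_flat has length n_samples * n_classes and is grouped
    by class first, so it is reshaped with order="F" before use.
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    n_samples = y_true.shape[0]
    # Recover the intuitive (n_samples, n_classes) view of the raw scores.
    raw = np.asarray(y_pred_flat, dtype=np.float64).reshape(n_samples, -1, order="F")
    # Numerically stable softmax over the class axis.
    raw = raw - raw.max(axis=1, keepdims=True)
    prob = np.exp(raw)
    prob /= prob.sum(axis=1, keepdims=True)
    # Gradient of cross-entropy w.r.t. the raw scores: p - onehot(y).
    grad = prob.copy()
    grad[np.arange(n_samples), y_true] -= 1.0
    # Diagonal Hessian approximation: p * (1 - p).
    hess = prob * (1.0 - prob)
    # Flatten back to the 1-D class-major layout.
    return grad.ravel(order="F"), hess.ravel(order="F")


rng = np.random.default_rng(0)
n_samples, n_classes = 5, 3
y = rng.integers(0, n_classes, n_samples)
scores_flat = rng.normal(size=n_samples * n_classes)
grad, hess = softmax_objective(y, scores_flat)
```

The two reshapes at the start and end are the part that 2-D support would make unnecessary.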

@StrikerRUS
Collaborator

We already re-shape prediction result in the opposite way:

if is_reshape and not is_sparse and preds.size != nrow:
    if preds.size % nrow == 0:
        preds = preds.reshape(nrow, -1)
    else:
        raise ValueError('Length of predict result (%d) cannot be divide nrow (%d)'
                         % (preds.size, nrow))
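
A standalone sketch of the same logic, for illustration only. The function name reshape_preds is hypothetical, and the error message is reworded. Note that the default reshape is row-major, so this assumes consecutive values in the flat prediction array belong to the same row.

```python
import numpy as np


def reshape_preds(preds, nrow):
    """Reshape a flat prediction array to (nrow, n_outputs)."""
    preds = np.asarray(preds)
    if preds.size == nrow:
        return preds  # one value per row, nothing to reshape
    if preds.size % nrow != 0:
        raise ValueError(
            "Length of predict result (%d) is not a multiple of nrow (%d)"
            % (preds.size, nrow)
        )
    # Default C order: consecutive values belong to the same row.
    return preds.reshape(nrow, -1)


flat_preds = np.array([0.1, 0.2, 0.7, 0.6, 0.3, 0.1])
preds_2d = reshape_preds(flat_preds, nrow=2)
```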

@StrikerRUS
Collaborator

Based on the conversation above and this comment, I'd like to broaden this issue to include the data structures used in custom objectives and metrics:

I'd also like to investigate the possibility of allowing grad and hess to be 2d collections as well for custom objectives.
#4150 (comment)

That way the discussion isn't split.

@StrikerRUS StrikerRUS reopened this Sep 17, 2021
@StrikerRUS StrikerRUS changed the title [python-package] init_score shape for multiclass classification [python-package] init_score and data structures in custom functions shape for multiclass classification Sep 17, 2021
@jameslamb
Collaborator

@StrikerRUS I agree, thanks for re-wording and re-opening this.

@shiyu1994
Collaborator

@StrikerRUS Thank you. I do think 2D data structures will be more intuitive for multi-class customized gradients and hessians.

@StrikerRUS
Collaborator

Closing this due to the lack of active work on this issue.

@jmoralez
Collaborator Author

jmoralez commented Jan 4, 2022

Reopening since I'm working on this.
