[python-package] `init_score` and data structures in custom functions shape for multiclass classification #4046

jmoralez · 2021-03-05T04:37:53Z

Description

When using init_score in multiclass classification, it would be very intuitive to pass an (n_samples, n_classes) collection, as suggested in #2595 (comment). However, this isn't currently supported in the Python package, so you have to reshape it into an (n_samples * n_classes, ) collection.
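A minimal NumPy sketch of the reshape this workaround requires. It assumes the class-major layout LightGBM documents for flat multiclass scores: all rows for class 0, then all rows for class 1, and so on, so the score for row i and class j sits at index j * n_samples + i. The helper names are hypothetical.

```python
import numpy as np

def flatten_init_score(init_score_2d):
    """Turn an (n_samples, n_classes) array into the (n_samples * n_classes,)
    class-major layout: index j * n_samples + i holds row i, class j."""
    scores = np.asarray(init_score_2d, dtype=np.float64)
    # Fortran order walks down each column (class) before moving to the next.
    return scores.ravel(order="F")

def unflatten_init_score(init_score_1d, n_samples):
    """Inverse of flatten_init_score."""
    flat = np.asarray(init_score_1d, dtype=np.float64)
    # Each consecutive block of n_samples values belongs to one class.
    return flat.reshape(-1, n_samples).T

n_samples, n_classes = 4, 3
init_score = np.arange(n_samples * n_classes, dtype=np.float64).reshape(n_samples, n_classes)
flat = flatten_init_score(init_score)
# Row 1, class 2 lands at index 2 * n_samples + 1.
assert flat[2 * n_samples + 1] == init_score[1, 2]
assert np.array_equal(unflatten_init_score(flat, n_samples), init_score)
```

In the reproducible example, passing `init_score=flatten_init_score(init_score)` should sidestep the TypeError, since the result is a 1-D array of length n_samples * n_classes.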

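Reshaping probably isn't a big deal with local models. However, since Dask partitions collections by rows, a flattened (n_samples * n_classes, ) collection could cause problems with the partitioning, because the scores belonging to one row are no longer contiguous.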
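To illustrate the partitioning concern with plain NumPy (no Dask involved; the two-way row split merely stands in for Dask's row-wise partitions, and the class-major layout is the same assumption as above): the rows of one partition map to several separate slices of the flat array, one per class, so a flat init_score cannot simply be chunked alongside X. `flat_positions` is a hypothetical helper.

```python
import numpy as np

def flat_positions(rows, n_samples, n_classes):
    """Indices in a class-major flat array (row i, class j at j * n_samples + i)
    that hold the scores of the given rows."""
    rows = np.asarray(rows)
    return np.concatenate([j * n_samples + rows for j in range(n_classes)])

n_samples, n_classes = 6, 3
init_score = np.arange(n_samples * n_classes).reshape(n_samples, n_classes)
flat = init_score.ravel(order="F")  # class-major flattening

# Stand-in for Dask's row-wise partitioning of X and y.
row_parts = np.array_split(np.arange(n_samples), 2)
for rows in row_parts:
    # One block of positions per class, each block n_samples apart.
    print(rows.tolist(), "->", flat_positions(rows, n_samples, n_classes).tolist())

# Cutting the flat array into contiguous chunks does NOT line up with the row partitions.
naive_first_chunk = np.array_split(flat, 2)[0]
correct_first_chunk = flat[flat_positions(row_parts[0], n_samples, n_classes)]
print(np.array_equal(naive_first_chunk, correct_first_chunk))  # False
```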

Reproducible example

import lightgbm as lgb
import numpy as np

n_samples = 100
n_features = 2
n_classes = 3

X = np.random.rand(n_samples, n_features)
y = np.random.randint(0, n_classes, n_samples)
init_score = np.random.rand(n_samples, n_classes)
lgb.LGBMClassifier().fit(X, y, init_score=init_score)

Traceback (most recent call last):
  File "init_score.py", line 7, in <module>
    lgb.LGBMClassifier().fit(X, y, init_score=init_score)
  File "/hdd/github/LightGBM/python-package/lightgbm/sklearn.py", line 895, in fit
    callbacks=callbacks, init_model=init_model)
  File "/hdd/github/LightGBM/python-package/lightgbm/sklearn.py", line 688, in fit
    callbacks=callbacks, init_model=init_model)
  File "/hdd/github/LightGBM/python-package/lightgbm/engine.py", line 228, in train
    booster = Booster(params=params, train_set=train_set)
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 2229, in __init__
    train_set.construct()
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1472, in construct
    categorical_feature=self.categorical_feature, params=self.params)
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1294, in _lazy_init
    self.set_init_score(init_score)
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1843, in set_init_score
    init_score = list_to_1d_numpy(init_score, np.float64, name='init_score')
  File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 165, in list_to_1d_numpy
    "It should be list, numpy 1-D array or pandas Series".format(type(data).__name__, name))
TypeError: Wrong type(ndarray) for init_score.
It should be list, numpy 1-D array or pandas Series

Environment info

LightGBM version or commit hash: 37e9878

Command(s) you used to install LightGBM

git clone --recursive https://github.com/microsoft/LightGBM.git
cd LightGBM/python-package
python setup.py install


StrikerRUS · 2021-03-27T22:33:02Z

I think this could be applied not only to init_score but to all other user-defined data in the multiclass case, e.g. y_pred, grad and hess in custom objectives, at least for consistency. WDYT, @jmoralez ?

jmoralez · 2021-03-30T01:26:58Z

Yes, that'd be nice. I remember being very confused a while ago when using custom objectives with multiclass classification, because grad and hess were 1d. I'll check this week whether this would be easy to implement.
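For reference, a sketch of what a multiclass custom objective has to do under the 1d layout: reshape the flat raw scores back to (n_samples, n_classes), compute softmax gradients, then flatten grad and hess again. It assumes the class-major layout (index j * n_samples + i). The function name and the use of p * (1 - p) as a diagonal Hessian are illustrative choices, not LightGBM's built-in implementation. To stay self-contained it takes the labels directly; inside a `fobj(preds, train_data)` callback for `lgb.train` they would come from `train_data.get_label()`.

```python
import numpy as np

def softmax_objective_flat(preds_flat, labels, n_classes):
    """Multiclass softmax gradients in the flat, class-major layout.

    preds_flat: raw scores of length n_samples * n_classes, where the score
    for row i and class j is at index j * n_samples + i.
    """
    labels = np.asarray(labels, dtype=np.int64)
    n_samples = labels.shape[0]
    # Undo the class-major flattening: (n_classes, n_samples) -> (n_samples, n_classes).
    scores = np.asarray(preds_flat, dtype=np.float64).reshape(n_classes, n_samples).T
    # Numerically stable softmax over the class axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[labels]
    grad = probs - onehot              # d(cross-entropy) / d(raw score)
    hess = probs * (1.0 - probs)       # diagonal of the softmax Hessian
    # Flatten back to the class-major 1d layout.
    return grad.ravel(order="F"), hess.ravel(order="F")

rng = np.random.default_rng(0)
n_samples, n_classes = 5, 3
labels = rng.integers(0, n_classes, n_samples)
preds_flat = rng.normal(size=n_samples * n_classes)
grad, hess = softmax_objective_flat(preds_flat, labels, n_classes)
```

Nearly half of this function is layout bookkeeping, which is the confusion a 2d interface would remove.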

StrikerRUS · 2021-03-30T11:51:28Z

We already reshape prediction results in the opposite way:

LightGBM/python-package/lightgbm/basic.py, lines 747 to 752 in e4cf2e4:

if is_reshape and not is_sparse and preds.size != nrow:
    if preds.size % nrow == 0:
        preds = preds.reshape(nrow, -1)
    else:
        raise ValueError('Length of predict result (%d) cannot be divide nrow (%d)'
                         % (preds.size, nrow))
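The quoted snippet relies on the flat prediction buffer being row-major (all class scores of row 0, then row 1, and so on), which is why `reshape(nrow, -1)` recovers the 2d matrix. A flat multiclass init_score is instead class-major (an assumption taken from LightGBM's documented score layout), so converting it needs a different reshape. A small NumPy sketch of the difference, with an illustrative toy matrix:

```python
import numpy as np

nrow, n_classes = 3, 2
scores = np.array([[0.1, 0.9],
                   [0.7, 0.3],
                   [0.4, 0.6]])  # (nrow, n_classes)

row_major = scores.ravel()             # layout of the flat prediction buffer
class_major = scores.ravel(order="F")  # layout of a flat multiclass init_score

# The reshape from the quoted snippet round-trips the row-major buffer...
assert np.array_equal(row_major.reshape(nrow, -1), scores)
# ...but scrambles a class-major one, which needs reshape(n_classes, nrow).T instead.
assert not np.array_equal(class_major.reshape(nrow, -1), scores)
assert np.array_equal(class_major.reshape(n_classes, nrow).T, scores)
```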

StrikerRUS · 2021-09-17T19:02:01Z

I'd like to broaden this issue to include the data structures used in custom objectives and metrics, based on the conversation above and this comment in #4150 (comment):

"I'd also like to investigate the possibility to allow grad and hess to be 2d collections as well for custom objectives."

That way the discussion isn't split.
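Until grad and hess can be 2d, one way to write a multiclass objective in 2d terms is a thin adapter that converts at the boundary. A sketch, assuming the class-major 1d layout and a `(preds, train_data) -> (grad, hess)` callable like the `fobj` argument of `lgb.train` in the versions discussed here; `as_flat_objective` and `toy_objective_2d` are hypothetical names, and `n_classes` is passed explicitly so the sketch does not depend on LightGBM.

```python
import numpy as np

def as_flat_objective(objective_2d, n_classes):
    """Wrap an objective that works on (n_samples, n_classes) arrays so that it
    accepts and returns the flat, class-major arrays of the 1d interface."""
    def flat_objective(preds, train_data):
        preds = np.asarray(preds, dtype=np.float64)
        n_samples = preds.size // n_classes
        # Class-major flat -> (n_samples, n_classes).
        preds_2d = preds.reshape(n_classes, n_samples).T
        grad_2d, hess_2d = objective_2d(preds_2d, train_data)
        # (n_samples, n_classes) -> class-major flat.
        return (np.asarray(grad_2d).ravel(order="F"),
                np.asarray(hess_2d).ravel(order="F"))
    return flat_objective

# Toy 2d objective (0.5 * score**2 per entry), only to exercise the adapter.
def toy_objective_2d(preds_2d, train_data):
    return preds_2d, np.ones_like(preds_2d)

flat_fobj = as_flat_objective(toy_objective_2d, n_classes=3)
preds = np.arange(12, dtype=np.float64)
grad, hess = flat_fobj(preds, None)
```

Because the toy gradient equals the input scores, a correct adapter hands back exactly the flat array it received.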

jameslamb · 2021-10-03T17:50:20Z

@StrikerRUS I agree, thanks for re-wording and re-opening this.

shiyu1994 · 2021-10-04T09:44:07Z

@StrikerRUS Thank you. I do think 2D data structures will be more intuitive for multi-class customized gradients and hessians.

StrikerRUS · 2021-12-16T22:46:44Z

Closing this due to the lack of active work on this issue.

jmoralez · 2022-01-04T23:38:53Z

Reopening since I'm working on this.

jmoralez mentioned this issue Mar 5, 2021: [dask] include multiclass-classification task in tests #4048 (merged)

StrikerRUS added the feature request label Mar 5, 2021

jmoralez mentioned this issue Apr 1, 2021: [python-package] Support 2d collections as input for init_score in multiclass classification task #4150 (merged)

jameslamb mentioned this issue Aug 24, 2021: Weights & Early Stopping with LGBMRegressor #4551 (closed)

jmoralez linked a pull request Sep 6, 2021 that will close this issue: [python-package] Support 2d collections as input for init_score in multiclass classification task #4150 (merged)

StrikerRUS closed this as completed in #4150 Sep 17, 2021

StrikerRUS reopened this Sep 17, 2021

StrikerRUS changed the title from "[python-package] init_score shape for multiclass classification" to "[python-package] init_score and data structures in custom functions shape for multiclass classification" Sep 17, 2021

StrikerRUS closed this as completed Dec 16, 2021

jmoralez reopened this Jan 4, 2022

jmoralez mentioned this issue Jan 4, 2022: [python-package] use 2d collections for predictions, grads and hess in multiclass custom objective #4925 (merged)

shiyu1994 closed this as completed in #4925 Feb 23, 2022
