Weights & Early Stopping with LGBMRegressor #4551

John64 · 2021-08-24T20:07:54Z

I've been using LightGBM for a while, but mostly for classification and never with weights. I have a basic pandas DataFrame with the weights in a column. I can't get an LGBMRegressor with early stopping and the 'mae' objective to run. I keep getting this error:
('Wrong type(float) for weight.\nIt should be list, numpy 1-D array or pandas Series',) Wrong type(float) for weight. It should be list, numpy 1-D array or pandas Series
Here is how the model is declared:

obj = 'mae'
eval_setit = obj
model = LGBMRegressor(boosting_type='gbdt', objective=obj, learning_rate=.3, n_jobs = 1, num_threads=1
                              ,early_stopping_round=10, num_iterations=100)

Calling model.fit raises the error:

weight_sample = X_train['weight'].values
weight_eval = X_test['weight'].to_list()
model.fit(X_train[filter_now].values, y_train.values, eval_set=[(X_test[filter_now].values, y_test.values)], 
    eval_metric=eval_setit, verbose=False, sample_weight=weight_sample, eval_sample_weight=weight_eval)

X_test and X_train are pandas DataFrames; y_test and y_train are pandas Series.

I've tried various combinations of .values, .to_list(), and .ravel(), since one post online said this can happen when the DataFrame format doesn't match the weights' format, but I haven't found a solution. The error is always the same as long as "eval_sample_weight" is given. Without it, everything appears to run, but then early stopping weights all samples in the eval_set equally, which distorts the metric used to decide when to stop.
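To make the last point concrete, here is a small pure-Python sketch of how a weighted and an unweighted MAE can differ on the same predictions. The helper name `weighted_mae` and the numbers are made up for illustration; this is not LightGBM's internal metric code.

```python
def weighted_mae(y_true, y_pred, weights=None):
    """Mean absolute error; with weights=None every sample counts equally."""
    if weights is None:
        weights = [1.0] * len(y_true)
    total = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


# Illustrative values only: the single bad prediction has the lowest weight.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
weights = [3.0, 3.0, 2.0]

unweighted = weighted_mae(y_true, y_pred)         # (0 + 0 + 2) / 3
weighted = weighted_mae(y_true, y_pred, weights)  # (0 + 0 + 4) / 8
print(unweighted, weighted)
```

If early stopping tracks the unweighted number while training optimizes the weighted one, the two can disagree about which iteration is best.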

Hoping someone else has run into this or knows what the error is referring to. There are no NaNs/nulls in the weights and they're the correct length. The dtype of the weight column in pandas is float64 and all values are between ~.03 and ~.078.
I'm running this in parallel and some of the .to_list() and .values calls appear to be slowing things down, so if someone has a cleaner way to train directly with a DataFrame, that'd be great. Any help (examples, links, general advice...) would be appreciated. I'm out of ideas. Thank You

jameslamb · 2021-08-24T20:16:09Z

Thanks for using LightGBM! We need some more information to help you.

What version of LightGBM are you using and how did you install it?
Are you able to provide a fully-reproducible example (some self-contained code that maintainers could run which produces the same error)? That would reduce the effort needed to answer your question.

John64 · 2021-08-24T22:10:46Z

From Mini-Conda list:

lightgbm                  3.2.1            py39h415ef7b_0    conda-forge
python                    3.9.6           h7840368_1_cpython    conda-forge
pandas                    1.3.0            py39h2e25243_0    conda-forge
numpy                     1.21.0           py39h6635163_0    conda-forge

Not sure how people usually do this, but this minimal code will reproduce the error, as long as pickle objects are OK. The files wouldn't upload here, so I put them in my repository:
https://github.com/John64/dataexamples

import numpy as np
import pandas as pd
import pickle
from lightgbm import LGBMRegressor

tpath = 'D:\\temp\\'
limitcols = ['tb_prec_min', 'mc_ratio3', 'mc_ratio1', 's1_bestscore', 'tb_prec_avg','b_rawRsys'] #'weight'

X_train = pickle.load(open(tpath+ 'X_train.dat','rb'))
X_test = pickle.load(open(tpath+ 'X_test.dat','rb'))
y_train = pickle.load(open(tpath+ 'y_train.dat','rb'))
y_test = pickle.load(open(tpath+ 'y_test.dat','rb'))

model = LGBMRegressor(boosting_type='gbdt', objective='mae', learning_rate=.3, n_jobs = 1, num_threads=1,
                      early_stopping_round=10, num_iterations=100)

weight_sample = X_train['weight'].values
weight_eval = X_test['weight'].to_list()
model.fit(X_train[limitcols], y_train.values, eval_set=[(X_test[limitcols].values, y_test.values)], 
          eval_metric='mae', verbose=True, sample_weight=weight_sample, eval_sample_weight=weight_eval) #, eval_sample_weight=weight_eval(without this it works)

#TypeError: Wrong type(float) for weight. It should be list, numpy 1-D array or pandas Series

Thanks Again

jameslamb · 2021-08-24T22:19:42Z

Thanks very much!

From that information, I can see that you're using Python 3.9, version 3.2.1 of lightgbm, and are on Windows (based on the path D:\\temp\\).

I personally don't open pickle files whose origin I don't know about, since it is possible to define arbitrary code to run when an object is unpickled.
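For anyone unfamiliar with that risk, here is a deliberately harmless sketch of the mechanism. It assumes only the standard library; the class name `Harmless` is hypothetical. An object's `__reduce__` method tells pickle which callable to invoke at load time, and that callable can be anything importable.

```python
import pickle


class Harmless:
    """Hypothetical class whose pickle payload runs a chosen callable on load."""

    def __reduce__(self):
        # On unpickling, pickle calls sorted([3, 1, 2]) instead of
        # rebuilding a Harmless instance. A malicious file could name
        # a far more dangerous callable here.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Harmless())
result = pickle.loads(payload)
print(type(result).__name__, result)  # list [1, 2, 3]
```

The loaded object is whatever the callable returned, which is why sharing CSV files or generated data is a safer way to post a reproducible example.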

You could create a fully-reproducible example for this case by, for example:

- creating your own dataset with numpy or pandas (example: [python-package] init_score and data structures in custom functions shape for multiclass classification #4046 (comment))
- using the utilities and datasets in sklearn.datasets (example: [dask] Parallel tree learner with Dask cannot overfit a small dataset #4471 (comment))
- modifying the LightGBM example code (e.g. https://github.com/microsoft/LightGBM/tree/master/examples/python-guide)

John64 · 2021-08-24T23:06:19Z

Will CSV work? I got the same error with CSV files.
It's been a while since I used numpy to generate random data, but I'll try in the morning if needed. Thx

X_train = pd.read_csv(tpath+ 'X_train.csv', index_col=0)
X_test = pd.read_csv(tpath+ 'X_test.csv', index_col=0)
y_train = pd.read_csv(tpath+ 'y_train.csv', index_col=0)
y_test = pd.read_csv(tpath+ 'y_test.csv', index_col=0)

[X_test.csv](https://github.com/microsoft/LightGBM/files/7042522/X_test.csv)
[X_train.csv](https://github.com/microsoft/LightGBM/files/7042523/X_train.csv)
[y_test.csv](https://github.com/microsoft/LightGBM/files/7042524/y_test.csv)
[y_train.csv](https://github.com/microsoft/LightGBM/files/7042525/y_train.csv)

StrikerRUS · 2021-08-25T13:07:35Z

Hey @John64! Thanks a lot for the repro!
Just a hint: you can simplify CSV reading by passing the file URL straight to pd.read_csv, like the following:

X_train = pd.read_csv(r'https://github.com/microsoft/LightGBM/files/7042523/X_train.csv', index_col=0)

This is the same error as in #4534, and the solution is in #4534 (comment).

You should pass lists for eval_* arguments, one item per validation pair of X and y.
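A rough sketch of why a flat list yields the "Wrong type(float)" message. It assumes the scikit-learn wrapper fetches the weights for the i-th validation set by indexing into eval_sample_weight. The function `weights_for_eval_set` is a hypothetical stand-in written for this illustration, not the library's actual code.

```python
def weights_for_eval_set(eval_sample_weight, i):
    """Hypothetical lookup: return the weights belonging to the i-th eval set."""
    if eval_sample_weight is None:
        return None
    if not isinstance(eval_sample_weight, list):
        raise TypeError("eval_sample_weight should be a list, one entry per eval set")
    return eval_sample_weight[i] if len(eval_sample_weight) > i else None


weight_eval = [0.03, 0.05, 0.078]  # illustrative per-row weights for one eval set

# Flat list: index 0 picks the first row's weight, a single float.
flat = weights_for_eval_set(weight_eval, 0)

# Wrapped list: index 0 picks the whole weight vector for eval set 0.
wrapped = weights_for_eval_set([weight_eval], 0)

print(type(flat).__name__, type(wrapped).__name__)  # float list
```

Under that assumption, the float reaching the dataset constructor is simply the first row's weight, which matches the error text.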

Just fix these lines in your code:

model.fit(X_train[limitcols], y_train.values, eval_set=[(X_test[limitcols].values, y_test.values)],
          eval_metric='mae', verbose=True, sample_weight=weight_sample, eval_sample_weight=[weight_eval])

Note that eval_sample_weight is a list of arrays:

eval_sample_weight=[weight_eval]

John64 · 2021-08-25T15:47:01Z

Thank You very much Striker. Working Great

github-actions · 2023-08-23T14:21:50Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added the question label Aug 24, 2021

jameslamb added the awaiting response label Aug 24, 2021

no-response bot removed the awaiting response label Aug 24, 2021

StrikerRUS closed this as completed Aug 25, 2021

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
