
Support multi-output regression/classification #524

Closed
miaotianyi opened this issue May 17, 2017 · 26 comments

Comments

@miaotianyi

Currently, LightGBM only supports 1-output problems. It would be interesting if LightGBM could support multi-output tasks (multi-output regression, multi-label classification, etc.) like those in multitask lasso.

I've seen a similar request on xgboost, but it hasn't been implemented yet.

@Laurae2
Contributor

Laurae2 commented May 17, 2017

I think it would require rewriting the whole algorithm from scratch for LightGBM, as it was optimized for the one-output case.

In the case of xgboost, it requires rewriting the whole algorithm from scratch, which is not possible in the current state unless someone is ready to work on it.

@wxchan
Contributor

wxchan commented May 18, 2017

Is there any introductory website or paper about multi-output tasks, other than dividing them into multiple binary/regression tasks?

@Laurae2
Contributor

Laurae2 commented May 18, 2017

@wxchan multi-output tasks require using an objective handling multi-output tasks.

It also includes/requires multi-split support for decision trees (multiple cutting points instead of one cutting point).

I think you can check this as a starting point; it's explained very simply: http://users.math.yale.edu/users/gw289/CpSc-445-545/Slides/CPSC445%20-%20Topic%2005%20-%20Classification%20&%20Decision%20Trees.pdf

@miaotianyi
Author

miaotianyi commented May 20, 2017

@wxchan I believe GBDT can adapt from multi-class to multi-label classification (where the labels aren't mutually exclusive) without too much additional computational cost.

In multi-label classification, the target y is an n_samples * n_labels matrix, and each column is a binary vector.

Traditionally, at the leaf node of a classification tree, the prediction is generated by an average of one-hot class probability vectors (which represent the classes of the samples belonging to the leaf). Ex. mean([0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], ...). When we use it for multi-label classification, the probability vectors will be different. Ex. mean([0, 1, 0, 1], [0, 1, 1, 0], [0, 0, 0, 1], [0, 1, 0, 1], ...).

(I know gradient boosting involves more complex maths, but that's the basic idea.)
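
A toy illustration of that leaf-averaging idea, in plain NumPy (the vectors are the ones from the multi-label example above):

```python
import numpy as np

# label vectors of the samples that fall into one leaf
# (multi-label: several entries per row may be 1)
leaf_labels = np.array([
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
])

# the leaf prediction is the per-label mean, i.e. a per-label probability estimate
leaf_prediction = leaf_labels.mean(axis=0)
print(leaf_prediction)  # [0.   0.75 0.25 0.75]
```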

For GBDT, some other modifications may be required:

  1. the impurity function: for multi-label classification, the impurity functions mentioned in the document linked above would need to change. A multi-class impurity function across n classes would be modified into the sum of n 2-class impurity functions, one per label.

e.g.
original entropy function: $-\sum_{c\in\mathcal{C}}{p(c)\log p(c)}$
new entropy function for multi-label classification: $-\sum_{l\in\mathcal{L}}\left[p(l)\log p(l) + (1-p(l))\log (1-p(l))\right]$ (see the code sketch after this list)

  2. the objective function for gradient boosting: not certain yet, since metrics like cross-entropy also apply to multi-label problems. This may be something interesting to explore.
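
A minimal sketch of the modified entropy from point 1, in plain NumPy (illustrative only; the function name is invented):

```python
import numpy as np

def multilabel_entropy(p):
    """Sum of per-label binary entropies; p is the vector of
    per-label positive frequencies at a node."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))

print(multilabel_entropy(np.array([0.75, 0.25, 0.75])))
```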

@Laurae2 I couldn't see the necessity of multi-split support. Theoretically, any split of 1 parent node into more than 2 child nodes can be equivalently represented by a sequence of binary splits. (Did I understand you correctly?)

The class_weight parameter will still be useful. A label with a larger weight would thus be considered more important during the evaluation of each split.

I suppose, for multi-label classification, an implementation within a single lightgbm model could train more efficiently and consume less memory. Relatively speaking, multi-output regression might not be as useful.

@Laurae2
Contributor

Laurae2 commented May 21, 2017

@TennielMiao multi-output classification is doable in xgboost/LightGBM; it is actually what is done in multiclass problems, but not in an efficient manner. Also, it returns everything, while you might be interested in only a specific number of outputs (especially for classification). This is why xgboost has softprob and softmax as separate objectives (one gives you the raw values for your needs, while the other processes them before giving them to you).

It requires modifying the objective/loss function as you described. For instance, if you were to optimize the F1 or F2 score, you would have to put an optimizer in the metric part that finds the best threshold for each class at each iteration. For the loss function, you would have to find a proxy that is continuous and a local statistic (unlike the F1/F2 score, which requires discrete inputs over a global statistic).
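
As a rough sketch of such a per-class threshold search (illustrative only; the function name and threshold grid are invented for the example):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_score, grid=np.linspace(0.05, 0.95, 19)):
    """Brute-force the decision threshold that maximizes F1 for one label."""
    scores = [f1_score(y_true, (y_score >= t).astype(int)) for t in grid]
    return grid[int(np.argmax(scores))]
```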

For proper multi-output classification, if you can use multi-way splits instead of binary splits, the trees need less depth, which in turn requires fewer splits. As the sum of losses from binary splits is only an approximation of the sum of losses from multi-way splits (mathematically, if you consider a graph with chained losses), the representation of a multi-split is not always identical to the representation of a sequence of binary splits (the more you split, the higher the odds you end up with something different).

As for the speed, there are two major cases:

  • O(n) for n classes: using n models for n classes/outputs is the easiest to implement. If you have 10,000 classes, then you have 10,000 models to train.
  • O(log(n)) for n classes: using 1 model for n classes/outputs is harder to implement and not trivial. It would also mean 10,000 classes would train 2,500x faster (theoretically) than a one-vs-all or one-vs-one classifier/regressor.
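
For reference, the first (O(n)) case is easy to emulate today with one binary model per label; a minimal sketch on random data (illustrative only):

```python
import numpy as np
from lightgbm import LGBMClassifier

X = np.random.rand(500, 20)
Y = np.random.randint(0, 2, size=(500, 4))  # 4 independent binary labels

# the O(n) route: one independent binary model per label
models = [LGBMClassifier(n_estimators=50).fit(X, Y[:, k]) for k in range(Y.shape[1])]
probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])  # shape (500, 4)
```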

For the class_weight you mentioned: @guolinke did you implement the class_weight-like parameter in a previous PR? #314

@chivee
Collaborator

chivee commented May 26, 2017

@TennielMiao The main bottleneck for the implementation of multi-output classification (regression) is memory. It would be highly inefficient, since we would need to maintain the residual values of all samples over all outputs.

@guolinke
Collaborator

Just found a related paper in ICML 2017: "Gradient Boosted Decision Trees for High Dimensional Sparse Output".

@huanzhang12 is one of the authors.

@Laurae2
Contributor

Laurae2 commented May 26, 2017

The excerpt:

In this paper, we study the gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is a L-dimensional 0/1 vector, where L is the number of labels that can grow to millions and beyond in many modern applications. We show that GBDT can easily run out of memory or encounter near-forever running time in this regime, and propose a new GBDT variant, called GBDT-SPARSE to resolve this problem by employing L_0 regularization. We then discuss in detail how to utilize this sparsity to conduct GBDT training, including splitting the nodes, computing the sparse residual, and predicting in sublinear time. Finally, we apply our algorithm to extreme multilabel classification problems, and show that the proposed GBDT-SPARSE achieves an order of magnitude improvements in model size and prediction time over existing methods, while yielding similar performance.

That method seems way faster than O(n), so it will be interesting to see what GBDT-SPARSE uses to reach that speed.
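
A heavily simplified sketch of the sparsity idea as I read the abstract (an assumed mechanism for illustration, not the paper's actual algorithm):

```python
import numpy as np

def sparse_leaf_value(residuals, k):
    """Keep only the k largest-magnitude output dimensions of a leaf value."""
    v = residuals.mean(axis=0)          # dense leaf value: mean residual per output
    keep = np.argsort(np.abs(v))[-k:]   # indices of the k strongest outputs
    out = np.zeros_like(v)
    out[keep] = v[keep]                 # zero out everything else (L0-style sparsity)
    return out
```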

@marugari

marugari commented Jul 3, 2017

@huanzhang12
Contributor

@marugari Thanks for posting the link here!
Let me know if you have any questions regarding our paper.

@marugari

@huanzhang12 Do the trees for the top-k labels share the same (feature_id, threshold)?
Why does the following algorithm fail?

Q <- index set of top-k p_s        # the k labels with the largest p_s
for q in Q do
  for j = 1...D do                 # loop over features
    for i = 1...N do               # loop over samples, scanning candidate splits
  best[q] <- (f_best, t_best)      # record the best (feature, threshold) for label q

If this discussion is not suitable for this issue, I will send an e-mail.

@huanzhang12
Contributor

@marugari Yes, please send me an email if you have specific questions on our paper.

guolinke added this to the v3.0 milestone Aug 3, 2017
@marugari

I have made a prototype of #524 (comment).

https://github.com/marugari/LightGBM/tree/fast_multi
https://github.com/marugari/Notebooks/blob/master/LightGBM-Fast_Multi.ipynb

@guolinke
Collaborator

@marugari
Thanks very much. It seems to be a solution that trains the top-k worst classes?

@huanzhang12
Contributor

@marugari Sorry for my late reply. I will look into this.

@marugari

@guolinke
Yes. Unlike the paper, this algorithm focuses on residuals rather than gains.

@huanzhang12
It seems that the classes whose gain improves easily often don't match the classes that have a large loss.
This causes poor prediction accuracy in my implementation.

@albertauyeung

Hi all, I have a question and I am not sure if it is related to this discussion. When I train a multi-class model, from the log I can see that the number of trained trees is usually equal to the number of classes times the number of iterations. Does that mean LGB is training something like a one-vs-all classifier? If so, can we not take the output for each class and somehow achieve multi-label classification? Please correct me if I am wrong. Thanks.

@marugari

marugari commented Sep 5, 2017

@albertauyeung
LGB uses softmax (or softprob) and minimizes cross-entropy.
However, the number of trees per iteration is equal to the number of classes, because the TreeLearner doesn't support multi-output.
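
This is easy to verify with the Python package; a quick sketch (illustrative only):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(100, 5)
y = np.random.randint(0, 3, 100)  # 3 classes

clf = lgb.LGBMClassifier(objective="multiclass", n_estimators=10)
clf.fit(X, y)

# one tree per class per boosting round: 3 classes * 10 iterations = 30 trees
print(clf.booster_.num_trees())  # 30
```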

@marugari

marugari commented Nov 3, 2017

Multiclass classification in TensorFlow Boosted Trees.
Does this work well? (I don't believe it does.)
https://arxiv.org/abs/1710.11547v1

@shoegazerstella

Any update on multi-label classification?

@guolinke
Collaborator

A temporary solution: use the scikit-learn multi-output wrappers https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html#sklearn.multioutput.MultiOutputRegressor and
https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html#sklearn.multioutput.MultiOutputClassifier
with LGBMRegressor or LGBMClassifier as their base estimators.
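
For example (illustrative data; the wrapper simply fits one independent estimator per target column):

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.multioutput import MultiOutputRegressor

X = np.random.rand(200, 10)
Y = np.random.rand(200, 3)  # 3 targets

model = MultiOutputRegressor(LGBMRegressor(n_estimators=100))
model.fit(X, Y)
pred = model.predict(X)  # shape (200, 3)
```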

@StrikerRUS
Collaborator

Closed in favor of #2302. We decided to keep all feature requests in one place.

Contributions to this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@StatMixedML
Contributor

StatMixedML commented Oct 14, 2022

There is a new feature in XGBoost that allows modelling of multiple-outputs:

https://xgboost.readthedocs.io/en/stable/tutorials/multioutput.html

Any plans to include this in LightGBM as well? It would be great, since then I could also implement a multivariate probabilistic framework, similar to Multi-Target XGBoostLSS Regression, which models multiple targets and their dependencies in a probabilistic regression setting.
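
For reference, a minimal sketch of the XGBoost feature (illustrative; to the best of my knowledge a 2D target is accepted from XGBoost 1.6 onward):

```python
import numpy as np
from xgboost import XGBRegressor

X = np.random.rand(200, 10)
Y = np.random.rand(200, 3)

# a 2D target is accepted directly; by default one tree is grown
# per target per boosting round
model = XGBRegressor(tree_method="hist", n_estimators=100)
model.fit(X, Y)
pred = model.predict(X)  # shape (200, 3)
```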

@jameslamb
Collaborator

Thanks for the mention, @StatMixedML.

Also linking this related conversation from XGBoost that you've been contributing to: dmlc/xgboost#2087

And these related LightGBM conversations:

To answer your question directly... I'm not aware of anyone currently working on adding this support to LightGBM. It has been almost a year since LightGBM's last substantive release, so the small team of mostly-volunteer maintainers here is currently focused on trying to get a year of other improvements and bugfixes out in a new major release (#5153). If you're interested in attempting to add multi-output support to LightGBM we can try to support you with reviews and advice, but at this point can't commit to more than that.

@StatMixedML
Contributor

@jameslamb Kindly asking if there is an update on this?

@jameslamb
Collaborator

Please don't post "is there an update" types of comments here.

We'd welcome your help if you'd like to try to contribute this. Otherwise, you can subscribe to feature requests here to be notified of activity on them.
