Implement easy access to single-tree prediction in fitted LGBM model #3058

Closed
pransito opened this issue May 8, 2020 · 7 comments

@pransito

pransito commented May 8, 2020

This has been mentioned in #845; however, the solution suggested there does not work. Here I would like to re-emphasize the need and elaborate on the desired feature.

Summary

In sklearn it is very easy to access the prediction of every single tree in the ensemble via "model.estimators_" — that is, each tree's own prediction, independent of all other trees (not a cumulative prediction). In LightGBM (I am mainly concerned with regression) this is difficult or so far even impossible to achieve. In #845 it was suggested to do this via booster.dump_model and leaf-index prediction, but I have not managed to make that work: the values associated with the leaves appear to be mean-corrected, or only reflect the incremental change relative to the previous trees. Even taking all of this into account, the result is still a cumulative prediction and therefore yields very narrow prediction distributions.
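
For reference, a minimal sketch of the sklearn access pattern described above, using a random forest regressor so that each element of estimators_ is a stand-alone tree; the toy data, model settings, and variable names here are illustrative and not taken from the issue:

```python
# Sketch: per-tree predictions in scikit-learn via estimators_.
# The toy data and hyperparameters are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# One column per tree: each tree's own prediction for every row of X,
# independent of all other trees.
per_tree = np.column_stack([tree.predict(X) for tree in model.estimators_])
print(per_tree.shape)        # (200, 50)
print(per_tree.std(axis=1))  # spread of per-tree predictions for each row
```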

Motivation

It would be very useful to have this feature because in certain use cases it is important to get an idea of the distribution of predictions across all the trees (is it wide or narrow; is it skewed?). In some sense it may be interpreted as a posterior distribution over the target variable that is to be predicted (in LGBM regression). This is relevant for both classical GBM regression and classical RF regression.

Description

Like in sklearn, there should be an .estimators_ object with a .predict(X) method that returns the prediction of every single tree for every row in X. It should be easily accessible and not hidden, and it should automatically handle whether boost_from_average was used. There should be a clear distinction between cumulative prediction (currently implemented via .predict(num_iteration=i)) and "iid" prediction (i.e. every single tree on its own), which I suggest implementing as a new feature. One could imagine giving the .predict() function a flag cumulative=True; when set to False, the trees would answer independently of one another.
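
To make the request concrete, a purely hypothetical sketch of how such a flag might look from the user's side; the cumulative parameter does not exist in LightGBM and is shown only to illustrate the proposal:

```python
# Hypothetical usage of the proposed interface. The `cumulative` flag does NOT
# exist in LightGBM; it is shown only to illustrate the shape of the request.
import numpy as np
import lightgbm as lgb

X = np.random.rand(200, 5)
y = np.random.rand(200)
bst = lgb.train({"objective": "regression", "verbose": -1}, lgb.Dataset(X, y))

# Current behavior: cumulative prediction of the first 3 trees.
cumulative_pred = bst.predict(X, num_iteration=3)

# Requested behavior (hypothetical): one independent prediction per tree,
# e.g. an array of shape (n_rows, n_trees).
# per_tree_pred = bst.predict(X, cumulative=False)
```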

References

https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/tree/_classes.py#L395

@franktoffel

Any update on this? We are facing similar issues.

@guolinke
Collaborator

@shiyu1994 can you help to check this?

@shiyu1994
Collaborator

> @shiyu1994 can you help to check this?

Maybe we can add a predict_with_tree(tree_id=i) method for Booster. I'll handle this.

@StrikerRUS
Collaborator

Would adding a start_iteration parameter to the existing predict method be enough? It would then be possible to select a single tree with the help of num_iteration and start_iteration, and it would be consistent with the API of the save_model method (and some others); see the usage sketch after the docstrings below.

```python
def predict(self, data, num_iteration=None,
            raw_score=False, pred_leaf=False, pred_contrib=False,
            data_has_header=False, is_reshape=True, **kwargs):
    """Make a prediction.

    Parameters
    ----------
    data : string, numpy array, pandas DataFrame, H2O DataTable's Frame or scipy.sparse
        Data source for prediction.
        If string, it represents the path to txt file.
    num_iteration : int or None, optional (default=None)
        Limit number of iterations in the prediction.
        If None, if the best iteration exists, it is used; otherwise, all iterations are used.
        If <= 0, all iterations are used (no limits).
    raw_score : bool, optional (default=False)
        Whether to predict raw scores.
    pred_leaf : bool, optional (default=False)
        Whether to predict leaf index.
    pred_contrib : bool, optional (default=False)
        Whether to predict feature contributions.

        .. note::

            If you want to get more explanations for your model's predictions using SHAP values,
            like SHAP interaction values,
            you can install the shap package (https://github.com/slundberg/shap).
            Note that unlike the shap package, with ``pred_contrib`` we return a matrix with an extra
            column, where the last column is the expected value.

    data_has_header : bool, optional (default=False)
        Whether the data has header.
        Used only if data is string.
    is_reshape : bool, optional (default=True)
        If True, result is reshaped to [nrow, ncol].
    **kwargs
        Other parameters for the prediction.

    Returns
    -------
    result : numpy array, scipy.sparse or list of scipy.sparse
        Prediction result.
        Can be sparse or a list of sparse objects (each element represents predictions for one class) for feature contributions (when ``pred_contrib=True``).
    """
```

```python
def save_model(self, filename, num_iteration=None, start_iteration=0, importance_type='split'):
    """Save Booster to file.

    Parameters
    ----------
    filename : string
        Filename to save Booster.
    num_iteration : int or None, optional (default=None)
        Index of the iteration that should be saved.
        If None, if the best iteration exists, it is saved; otherwise, all iterations are saved.
        If <= 0, all iterations are saved.
    start_iteration : int, optional (default=0)
        Start index of the iteration that should be saved.
    importance_type : string, optional (default="split")
        What type of feature importance should be saved.
        If "split", result contains numbers of times the feature is used in a model.
        If "gain", result contains total gains of splits which use the feature.

    Returns
    -------
    self : Booster
        Returns self.
    """
```

@shiyu1994
Collaborator

I've done the implementation as @StrikerRUS suggested. If boost_from_average is enabled, the average score is integrated into the first tree, so booster.predict(data, start_iteration=0, num_iteration=1) will provide the score of the first tree with the average value added. Does that meet your request? @pransito
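
To connect this back to the original request, a sketch (again assuming LightGBM >= 3.0, where predict accepts start_iteration) of collecting every tree's individual score; note that with boost_from_average the average is folded into tree 0, so that column includes the offset:

```python
# Sketch: per-tree scores from a trained Booster (LightGBM >= 3.0).
# With boost_from_average, the average score is part of tree 0's column.
import numpy as np
import lightgbm as lgb

X = np.random.rand(200, 5)
y = np.random.rand(200)
n_trees = 20
bst = lgb.train({"objective": "regression", "verbose": -1},
                lgb.Dataset(X, y), num_boost_round=n_trees)

per_tree = np.column_stack([
    bst.predict(X, start_iteration=i, num_iteration=1) for i in range(n_trees)
])
print(per_tree.shape)  # (200, 20); column i is the score added by tree i
# For regression, the per-tree scores sum to the full prediction.
print(np.allclose(per_tree.sum(axis=1), bst.predict(X)))
```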

@franktoffel

franktoffel commented Aug 6, 2020 via email

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023