
Staged predict function as in scikit-learn #5031

Closed
egemenzeytinci opened this issue Feb 24, 2022 · 8 comments


@egemenzeytinci

In scikit-learn, the staged_predict function lets you see the predicted regression targets at each stage, which makes it possible to monitor the model after each boosting step. Here is the link to the function: staged_predict

As far as I can see, LightGBM has no equivalent of this function for getting predictions at each step.

@jameslamb
Collaborator

Thanks for using LightGBM!

I see the following description at the link you provided:

This method allows monitoring (i.e. determine error on testing set) after each stage.

Could you explain a bit more why you think LightGBM would benefit from adding this method to LGBMRegressor in its Python package?

You can already achieve "determine error on testing set after each stage" by providing validation sets:

import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=8, n_informative=5)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)

reg = lgb.LGBMRegressor(
    n_estimators=5,
    metric=["l2", "mae", "mape"]
)
reg.fit(X_train, y_train, eval_set=[(X_test, y_test)])

# show metrics evaluated at each iteration
reg.evals_result_
{'valid_0': OrderedDict([('l2',
               [9229.674591859464,
                7825.89932531363,
                6625.686405072829,
                5616.5372420227495,
                4797.817180326764]),
              ('l1',
               [76.4812212591533,
                70.12525847773459,
                64.32944081379047,
                58.96086177684842,
                54.24494127560389]),
              ('mape',
               [0.9449547823155805,
                0.9018169596502487,
                0.8558294965519376,
                0.8024717734359049,
                0.7569051056778325])])}

And if you want to get the predictions at each iteration, LightGBM allows you to provide an iteration number to its various predict() methods.

# example: get the model's predictions of the target on the training data, using only the first 2 trees
reg.predict(X_train, num_iteration=2)

@egemenzeytinci
Author

egemenzeytinci commented Feb 25, 2022

Thanks for your answer.

1. staged_predict could also be added to the classifier (along with staged_predict_proba, which doesn't apply to the regressor).

2. staged_predict is interesting to us for the following reasons:

  • The evolution of a prediction could be used as features in a downstream model (akin to an embedding). This is the most interesting feature we're after.
  • In some of our products, we'd like to present the whole prediction sequence as opposed to just the end result.
  • We have custom metrics and analyses we'd like to compute on each prediction round.

I get that the predictions can be obtained using the num_iteration kwarg inside a for loop, but AFAIU, to get the whole prediction sequence this quickly becomes inefficient as the number of trees grows, and unusable for 100+ trees.

@egemenzeytinci
Author

egemenzeytinci commented Mar 1, 2022

Gentle ping @jameslamb, is there any update on this issue?

@wuzhe1234

@egemenzeytinci, saving the prediction at each iteration with a callback could be a workaround.

import lightgbm as lgb
import numpy as np
import pandas as pd

# x_train / dtrain / dvalid / params / num_rounds come from your own setup
x_train = df_train["feature"]
n_rows = x_train.shape[0]
predictions = pd.DataFrame(np.zeros((n_rows, num_rounds), dtype=float))

def save_predictions(env):
    # env.model is the Booster; _Booster__inner_predict is a private API
    # that returns predictions on dataset 0 (the training set).
    # Copy, because the buffer is reused in each iteration.
    train_preds = env.model._Booster__inner_predict(0).copy()
    predictions.iloc[:, env.iteration] = train_preds

evals_result = {}
model = lgb.train(
    params,
    dtrain,
    num_boost_round=num_rounds,
    valid_sets=[dtrain, dvalid],
    valid_names=["train", "valid"],
    callbacks=[
        lgb.log_evaluation(period=20),
        lgb.record_evaluation(evals_result),
        save_predictions,
    ],
)

@jameslamb
Collaborator

Sorry for the delay, this project is really struggling with a lack of maintainer availability at the moment.

If this is something that's standard in scikit-learn for regression and classification, we're open to adding it to LightGBM's scikit-learn API. But we can't make any commitment to that happening in the near future.

If you're very interested in seeing this in LightGBM, the best way to make that happen soon is probably to contribute it yourself. If you're interested in attempting a pull request, we'd be happy to help with reviews and answers to any questions you have.

@github-actions

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!


github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
@jameslamb
Collaborator

Sorry, this was locked accidentally. Just unlocked it. We'd still love help with this feature!

microsoft unlocked this conversation Aug 18, 2023