Inconsistent predictions #11120

Hayakawa94 · 2024-12-19T12:10:01Z

Hi

I need some help with an XGBoost model in R. I built a claim severity model using the reg:gamma objective. When assessing the predictions, I noticed that predict returns different values when iterationrange = c(1,1) is specified. The ratio of the two sets of predictions is summarized below:

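library(xgboost)
library(dplyr)  # provides %>%, select() and all_of()

# Feature matrix restricted to the columns the model was trained on
X <- as.matrix(gbm.data %>% select(all_of(xgb_model$feature_names)))

# Ratio of predictions with iterationrange = c(1, 1) to default predictions
summary(predict(xgb_model, newdata = X, iterationrange = c(1, 1)) /
          predict(xgb_model, newdata = X))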

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6827  0.9903  0.9990  0.9987  1.0072  1.4092

Which of the two is the correct prediction, and which one is used to compute SHAP values?

Thanks in advance

Hayakawa94 · 2024-12-19T12:49:04Z

It appears that calling predict without iterationrange = c(1,1) includes the trees from the early stopping rounds in the predictions. For example, with nrounds = 100 and early_stopping_rounds = 10, if 80 trees were built, predict would use 90 trees instead of the 80 that were actually built. Could someone clarify whether this affects the SHAP computations?
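One way to check this empirically is to compare SHAP contributions against margin predictions. The sketch below does not use the model above: it trains a small reg:gamma model on simulated data (X, y and model are hypothetical names). It relies on the property that, for the same set of trees, the rows of predict(..., predcontrib = TRUE) sum to predict(..., outputmargin = TRUE), and that reg:gamma uses a log link, so exp(margin) gives the prediction.

```r
library(xgboost)

set.seed(1)
n <- 500
X <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
# Simulated gamma-distributed severity with a log-linear mean (hypothetical data)
y <- rgamma(n, shape = 2, rate = 2 / exp(1 + 0.5 * X[, "x1"]))

model <- xgb.train(
  params = list(objective = "reg:gamma", eta = 0.1, max_depth = 2),
  data = xgb.DMatrix(data = X, label = y),
  nrounds = 20
)

# All three calls use default settings, so they use the same set of trees
shap   <- predict(model, newdata = X, predcontrib = TRUE)
margin <- predict(model, newdata = X, outputmargin = TRUE)
pred   <- predict(model, newdata = X)

# Contributions (features plus the bias/intercept column) sum to the margin
max_margin_gap <- max(abs(rowSums(shap) - margin))
# With the log link, exp(margin) recovers the prediction on the response scale
max_link_gap <- max(abs(exp(margin) - pred) / pred)

print(c(margin_gap = max_margin_gap, link_gap = max_link_gap))
```

Repeating the same comparison on the real model, once against the margin from each of the two predict calls, should indicate which set of trees the SHAP values correspond to.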

david-cortes · 2024-12-19T17:48:44Z

If you are using the latest development version of XGBoost, or if you installed it from GitHub, note that the interpretation of iterationrange = c(1,1) has changed, and the docs have been updated to reflect the new behavior:

xgboost/R-package/R/xgb.Booster.R, line 118 in 2d1c26b:

    #' @param iterationrange Sequence of rounds/iterations from the model to use for prediction, specified by passing
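Because the meaning of iterationrange = c(1,1) depends on the installed version, a quick empirical check can show which interpretation applies locally. The following is a minimal sketch on simulated data (X, y, model and c11_means_all are hypothetical names). It assumes a model trained without early stopping, so that the default predict call uses every round.

```r
library(xgboost)

print(packageVersion("xgboost"))

set.seed(42)
X <- matrix(runif(200 * 2), ncol = 2, dimnames = list(NULL, c("a", "b")))
y <- rgamma(200, shape = 2, rate = 2 / exp(X[, "a"]))

# No early stopping here, so the default predict() uses all 10 rounds
model <- xgb.train(
  params = list(objective = "reg:gamma", eta = 0.3, max_depth = 2),
  data = xgb.DMatrix(data = X, label = y),
  nrounds = 10
)

p_default <- predict(model, newdata = X)
p_c11     <- predict(model, newdata = X, iterationrange = c(1, 1))

# TRUE  -> c(1, 1) is treated as "use all trees"
# FALSE -> c(1, 1) restricts prediction to a subset of the rounds
c11_means_all <- isTRUE(all.equal(p_default, p_c11))
print(c11_means_all)
```

If this prints FALSE, the ratio in the original post would be comparing a truncated model against the full one, so the two predict calls should not be expected to agree.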
