Catboost #100 (Open)

Athospd wants to merge 12 commits into master
Conversation


@Athospd commented Apr 23, 2020

Hey! I started this PR to add support for catboost.Model. If it's something worth the effort, I'd appreciate help, especially with the package design.

I certainly think it's worth the effort (for lightgbm too), but my only hesitation is that neither package is available on CRAN, which makes them more difficult to support as dependencies (and less stable). It looks like there's a work in progress for a CRAN release of catboost, catboost/catboost#439; seeing where that stands would be a good place to start.

catboost.get_feature_importance() offers FeatureImportance, PredictionValuesChange, LossFunctionChange, ShapValues, Interaction, and PredictionDiff natively.

  • vi_model() - PredictionValuesChange, LossFunctionChange, FeatureImportance
  • vi_firm()
  • vi_shap() - ShapValues
  • vip()
  • vint() - Interaction (this one is still experimental and doesn't scale well to the large data sets often used with catboost, although there are tricks in some cases, like relying on catboost's SHAP-based interaction values, assuming that's what they implement. This function will probably get overhauled in the next release and renamed to something more appropriate)
  • get_feature_names
  • get_training_data - seems to be impossible; a catboost.Model object doesn't store the training data. I've asked the catboost devs whether it's accessible. get_training_data() is really a crutch, and probably not a best practice, but old habits die hard. Algorithms like catboost, which are built for scale, never store the training data, so get_training_data.whatever_the_catboost_class_is_called() should just throw an error reminding the user to supply the appropriate training data to vi() via the train argument.
  • pdp
  • ice
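
As a rough sketch of what a vi_model() method for catboost might look like (the S3 class name `catboost.Model`, the method name, and the shape of the returned importance matrix are all assumptions, and this is untested since catboost isn't on CRAN):

```r
# Hypothetical vi_model() method for catboost models. The class name
# ("catboost.Model") and the structure of the matrix returned by
# catboost.get_feature_importance() are assumptions.
vi_model.catboost.Model <- function(object,
                                    type = c("FeatureImportance",
                                             "PredictionValuesChange",
                                             "LossFunctionChange"),
                                    ...) {
  type <- match.arg(type)
  imp <- catboost::catboost.get_feature_importance(object, type = type, ...)
  tibble::tibble(
    Variable   = rownames(imp),
    Importance = as.numeric(imp)
  )
}
```

Note that some importance types (e.g., LossFunctionChange) require a data pool, which ties back to the get_training_data point above: the user would have to pass the data in explicitly.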

I didn't find a vi_shap() example to mimic the output of, so I'm calling for help here. Also, should it rely exclusively on the fastshap package, or is catboost's native implementation allowed?

It should definitely rely on catboost's internal implementation, which would actually be a separate PR in fastshap; there's an example with xgboost.
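
For comparison, the two routes might look something like this (an untested sketch, since catboost isn't on CRAN; `fit` and `train_df` stand in for a fitted catboost.Model and the user-supplied training data):

```r
# Route 1 (native, assumed): catboost's exact SHAP values. The
# "ShapValues" type needs a pool, so the user must supply the data.
shap_native <- catboost::catboost.get_feature_importance(
  fit,
  pool = catboost::catboost.load_pool(train_df),
  type = "ShapValues"
)

# Route 2: model-agnostic fastshap, which only needs a prediction
# wrapper that round-trips newdata through a catboost pool.
pfun <- function(object, newdata) {
  catboost::catboost.predict(object, catboost::catboost.load_pool(newdata))
}
shap_approx <- fastshap::explain(fit, X = train_df, nsim = 10,
                                 pred_wrapper = pfun)
```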

ice and pdp haven't been touched yet. Similar to vi_shap(), vi_firm() relies on the pdp package's methods, but pdp is flexible enough that catboost-specific methods shouldn't need to be added, just like with xgboost. In fact, if anything, the documentation/vignettes could use some sprucing up, which is where examples with xgboost, catboost, etc. could be given.
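
Since pdp::partial() accepts an arbitrary prediction wrapper via pred.fun, a catboost model should already work without new methods. A sketch (untested; `fit` and `train_df` are assumed to be the fitted model and the training data):

```r
# PDPs and ICE curves via pdp, using a prediction wrapper instead of a
# catboost-specific method; this mirrors how vi_firm() would call pdp.
pfun <- function(object, newdata) {
  catboost::catboost.predict(object, catboost::catboost.load_pool(newdata))
}
pd  <- pdp::partial(fit, pred.var = "x1", pred.fun = pfun, train = train_df)
ice <- pdp::partial(fit, pred.var = "x1", pred.fun = pfun, train = train_df,
                    ice = TRUE)
```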

PS: there are some commits of workflows methods I created for another PR, and now I can't get them out of this PR. I'm sorry =(

No worries!
