Catboost #100 (Open)

Athospd wants to merge 12 commits into master
Conversation


@Athospd commented Apr 23, 2020

Hey! I started this PR to add support for catboost.Model. If it's something worth the effort, I'd appreciate help, especially with the package design.

I certainly think it's worth the effort (for lightgbm too), but my only hesitation is that neither package is available on CRAN, which makes them more difficult to support as dependencies (and less stable). It looks like there's a work in progress for a CRAN release of catboost, catboost/catboost#439; seeing where that stands would be a good place to start.

catboost.get_feature_importance() offers FeatureImportance, PredictionValuesChange, LossFunctionChange, ShapValues, Interaction, and PredictionDiff natively.

  • vi_model() - PredictionValuesChange, LossFunctionChange, FeatureImportance
  • vi_firm()
  • vi_shap() - ShapValues
  • vip()
  • vint() - Interaction (this one is still experimental and doesn't scale well to the large data sets often used with catboost, although there are tricks in some cases, like relying on catboost's SHAP-based interaction values, assuming that's what they implement. This function will probably get overhauled in the next release and renamed to something more appropriate)
  • get_feature_names
  • get_training_data - seems to be impossible; a catboost.Model object doesn't store the training data. I've asked the catboost devs whether it's accessible. get_training_data() is really a crutch, and probably not a best practice, but old habits die hard. Algorithms like catboost, which are built for scale, never store the training data, so get_training_data.whatever_the_catboost_class_is_called() should just throw an error reminding the user to supply the appropriate training data to vi() via the train argument.
  • pdp
  • ice
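
As a rough sketch of what a vi_model() method for catboost might look like (the S3 class name `catboost.Model`, the method name, and the shape of the returned importance matrix are all assumptions, and this is untested since catboost isn't on CRAN):

```r
# Hypothetical vi_model() method for catboost models. The class name
# ("catboost.Model") and the structure of the matrix returned by
# catboost.get_feature_importance() are assumptions.
vi_model.catboost.Model <- function(object,
                                    type = c("FeatureImportance",
                                             "PredictionValuesChange",
                                             "LossFunctionChange"),
                                    ...) {
  type <- match.arg(type)
  imp <- catboost::catboost.get_feature_importance(object, type = type, ...)
  tibble::tibble(
    Variable   = rownames(imp),
    Importance = as.numeric(imp)
  )
}
```

Note that some importance types (e.g., LossFunctionChange) require a data pool, which ties back to the get_training_data point above: the user would have to pass the data in explicitly.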

I didn't find a vi_shap() example to mimic the output of, so I'm calling for help here. Also, should it rely exclusively on the fastshap package, or is catboost's native implementation allowed?

It should definitely rely on catboost's internal implementation, which would actually be a separate PR in fastshap; there's an example with xgboost.
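
For comparison, the two routes might look something like this (an untested sketch, since catboost isn't on CRAN; `fit` and `train_df` stand in for a fitted catboost.Model and the user-supplied training data):

```r
# Route 1 (native, assumed): catboost's exact SHAP values. The
# "ShapValues" type needs a pool, so the user must supply the data.
shap_native <- catboost::catboost.get_feature_importance(
  fit,
  pool = catboost::catboost.load_pool(train_df),
  type = "ShapValues"
)

# Route 2: model-agnostic fastshap, which only needs a prediction
# wrapper that round-trips newdata through a catboost pool.
pfun <- function(object, newdata) {
  catboost::catboost.predict(object, catboost::catboost.load_pool(newdata))
}
shap_approx <- fastshap::explain(fit, X = train_df, nsim = 10,
                                 pred_wrapper = pfun)
```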

ice and pdp haven't been touched yet. Similar to vi_shap(), vi_firm() relies on the pdp package's methods, but pdp is flexible enough that catboost-specific methods shouldn't need to be added, just like with xgboost. In fact, if anything, the documentation/vignettes could use some sprucing up, which is where examples with xgboost, catboost, etc. could be given.
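
Since pdp::partial() accepts an arbitrary prediction wrapper via pred.fun, a catboost model should already work without new methods. A sketch (untested; `fit` and `train_df` are assumed to be the fitted model and the training data):

```r
# PDPs and ICE curves via pdp, using a prediction wrapper instead of a
# catboost-specific method; this mirrors how vi_firm() would call pdp.
pfun <- function(object, newdata) {
  catboost::catboost.predict(object, catboost::catboost.load_pool(newdata))
}
pd  <- pdp::partial(fit, pred.var = "x1", pred.fun = pfun, train = train_df)
ice <- pdp::partial(fit, pred.var = "x1", pred.fun = pfun, train = train_df,
                    ice = TRUE)
```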

PS: there are some commits of workflows methods I created for another PR, and now I can't get them out of this PR. I'm sorry =(

No worries!
