-
We have often found that Machine Learning (ML) algorithms capable of capturing structural non-linearities in training data - models that are sometimes referred to as 'black box' (e.g. Random Forests, Deep Neural Networks, etc.) - perform far better at prediction than their linear counterparts (e.g. Generalized Linear Models).
-
They are, however, much harder to interpret - in fact, quite often it is not possible to gain any insight into why a particular prediction has been produced, when given an instance of input data (i.e. the model features).
-
Consequently, it has not been possible to use 'black box' ML algorithms in situations where clients have sought cause-and-effect explanations for model predictions, with end-results being that sub-optimal predictive models have been used in their place, as their explanatory power has been more valuable, in relative terms.
-
The problem with model explainability is that it’s very hard to define a model’s decision boundary in human understandable manner.
-
Here, I’m using LIME and SHAP to interpret RandomForest model for both classification and regression problems.
- LIME is a python library which tries to solve for model interpretability by producing locally faithful explanations.
- LIME stands for Local Interpretable Model-Agnostic Explanations is a technique to explain the predictions of any machine learning classifier, and evaluate its usefulness in various tasks related to trust.
- SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model.
- SHAP connects game theory with local explanations, uniting several previous methods and representing the only possible consistent and locally accurate additive feature attribution method based on expectations.
-
data folder contains the dataset used in the classification and regression models.
-
Boston.csv is used for regression models. This dataset was taken from UCI Machine Learning Repository from this link.
-
mushrooms.csv is used for classification models. This dataset was taken from UCI Machine Learning Repository from this link.
-
-
-
classification: This folder contains the notebook with classification dataset.
-
regression: This folder contains the notebook with regression dataset.
-
-
-
classification: This folder contains the notebook with classification dataset.
-
regression: This folder contains the notebook with regression dataset.
-