SHAP-Counterfactual (SHAP-C) is a hybrid algorithm based on SEDC and SHAP for computing Evidence Counterfactuals: it explains the prediction of any classifier by a minimal set of features such that removing these features changes the predicted class. (Read our blog post on the use of Evidence Counterfactuals here.)
SHAP-C is a hybrid algorithm that combines the kernel SHAP explainer (proposed by Lundberg & Lee (2017) for explaining model predictions) with the linear implementation of the SEDC algorithm (proposed in this paper as a best-first search algorithm for explaining document classifications). The algorithm selects candidate features for the Evidence Counterfactual based on their overall importance for the predicted score, where the importance weights are computed by an additive feature attribution method such as SHAP. The idea is that the more accurate the importance ranking, the more likely it is that a counterfactual explanation can be found by removing the top-ranked feature first, then the next one, and so on, until the predicted class changes. SHAP-Counterfactual has been shown to have stable effectiveness and efficiency, making it a suitable alternative to SEDC, especially for nonlinear models and for instances that are "hard to explain" (i.e., where many features need to be removed before the class changes). SHAP-Counterfactual also resolves a well-known issue with additive feature attribution techniques (such as SHAP) on high-dimensional data: how many features should the explanation show, and how should this parameter be set? For the Evidence Counterfactual the answer is: the number of features such that removing them changes the predicted class.
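To make the search concrete, here is a minimal sketch of the idea, not the repository's implementation. It assumes a fitted scikit-learn-style binary classifier `clf` (with `predict` and `predict_proba`, class labels 0/1), a dense 1-D instance `x`, and a background dataset for kernel SHAP; the function name `shap_counterfactual` and the list-per-class return value of `shap_values()` (older shap releases) are assumptions:

```python
import numpy as np
import shap  # https://github.com/slundberg/shap

def shap_counterfactual(clf, x, background, max_features=30):
    """Illustrative SHAP-C sketch: rank the active features of x by
    kernel SHAP importance, then zero them out one by one until the
    predicted class flips."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    original_class = clf.predict(x)[0]

    # Importance weights from an additive feature attribution method.
    # Older shap releases return one array per class; we take the
    # attributions for the predicted class (labels assumed to be 0/1).
    explainer = shap.KernelExplainer(clf.predict_proba, background)
    shap_values = explainer.shap_values(x)[int(original_class)][0]

    # Only active (nonzero) features can be removed; rank them by importance.
    active = np.flatnonzero(x[0])
    ranked = active[np.argsort(-shap_values[active])]

    perturbed = x.copy()
    removed = []
    for feature in ranked[:max_features]:
        perturbed[0, feature] = 0.0  # "remove" the evidence
        removed.append(feature)
        if clf.predict(perturbed)[0] != original_class:
            return removed  # features whose removal flips the class
    return None  # no counterfactual found within the budget
```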
At the moment, SHAP-C supports binary classifiers built on high-dimensional, sparse data where a "zero" feature value corresponds to the "absence" of the feature; we therefore set the reference value of every feature to zero. For instance, with behavioral data such as web browsing data, visiting a URL sets the corresponding feature value to 1, and 0 otherwise. A nonzero value indicates that the behavior is present or that the feature is "active", and setting the feature value to zero removes this evidence from the browsing history of a user. Another example is textual data, where each token is represented by an individual feature: setting the feature value (term frequency, tf-idf, etc.) to zero means that the corresponding token is removed from the document. Because the reference value when removing a feature from the instance is zero (zero means "missing"), and only active features can be part of the Evidence Counterfactual explanation, the SHAP-C implementation uses kernel SHAP with zero as the reference value of each feature.
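As an illustration of the zero reference value, here is a small, self-contained sketch on toy textual data; the documents, labels, and model choice are invented for the example and are not part of the package:

```python
import numpy as np
import shap
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy textual example: each token is a feature, and a zero value
# means the token is absent from the document.
docs = ["free money offer", "meeting agenda attached",
        "free offer now", "project meeting notes"]
labels = [1, 0, 1, 0]  # invented spam (1) / ham (0) labels

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs).toarray()
clf = LogisticRegression().fit(X, labels)

# An all-zero background row makes zero the reference value of every
# feature, so a feature that kernel SHAP treats as "missing" coincides
# with removing the token from the document (or the URL visit from a
# browsing history).
zero_reference = np.zeros((1, X.shape[1]))
explainer = shap.KernelExplainer(clf.predict_proba, zero_reference)
shap_values = explainer.shap_values(X[:1])  # attributions per class
```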