Python code for SMACE: A New Method for the Interpretability of Composite Decision Systems.
The code is stored in two main directories: `smace` and `evaluation`. The first one contains the code behind the method (see below for Usage). The `evaluation` folder contains a `notebooks` subfolder, where some simple examples are given as Jupyter Notebooks. Aggregated experiments are in the `experiments` folder, and the results are saved in the `results` subfolder.
The experiments in Section 5.1 (Simple cases) of the paper are available as notebooks (in `evaluation/notebooks`):

- `rule_only.ipynb` refers to *Rules only*;
- `hybrid_paper.ipynb` refers to *Simple hybrid system*.

The folder also contains additional experiments.
The experiment in Section 5.2 (Realistic use case) of the paper is generated by `telco.py` in `evaluation/experiments`. The folder contains additional experiments with different decision-making systems on synthetic data. These experiments should be performed individually; when finished, the results will be available in the `evaluation/experiments/results` directory.
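For instance, a minimal way to reproduce the realistic use case (assuming the required dependencies are installed) is to run the script from its folder:

```
cd evaluation/experiments
python telco.py
```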
First, one must define the decision-making system, i.e., a `DM` object. To define it, you need a set of rules in JSON format, a list of models, and a pandas DataFrame. The rules must be defined in a JSON object, which is parsed into Python lists/dictionaries. Each `rule` is a dictionary with two fields: `conditions` and `decision`. The latter is the output of the decision process if the rule is satisfied. A condition is defined by the triple `(name, operator, value)`:
- `name` is the variable it refers to;
- `operator` can be `geq` ($\geq$), `gt` ($>$), `leq` ($\leq$), `lt` ($<$);
- `value` is the cutoff.
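As an illustration of these semantics (a hypothetical helper, not SMACE's internal implementation), a condition such as `(x2, geq, 0.6)` holds whenever `x2 >= 0.6`:

```python
import operator

# Illustrative mapping from the rule operators to Python comparisons.
OPS = {"geq": operator.ge, "gt": operator.gt, "leq": operator.le, "lt": operator.lt}

def is_satisfied(condition, values):
    """Check a single condition against a dict of variable values."""
    return OPS[condition["operator"]](values[condition["name"]], condition["value"])

# The condition (x2, geq, 0.6) is satisfied for x2 = 0.7:
is_satisfied({"name": "x2", "operator": "geq", "value": 0.6}, {"x2": 0.7})  # True
```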
As an example, let us say our set of variables includes four input features `x1`, `x2`, `x3`, `x4` and two model outputs `model_1` and `model_2`. The JSON with two rules `rule1` and `rule2` can be as follows:
{"rule1": {"conditions": [{"name": "x2",
"operator": "geq",
"value": 0.6},
{"name": "x3",
"operator": "geq",
"value": 0.25},
{"name": "model_1",
"operator": "geq",
"value": 1},
{"name": "model_2",
"operator": "leq",
"value": 50}],
"decision": "decision1"},
"rule2": {"conditions": [{"name": "x4",
"operator": "geq",
"value": 0.1},
{"name": "model_1",
"operator": "geq",
"value": 0.2},
{"name": "x1",
"operator": "geq",
"value": 0.1},
{"name": "x4",
"operator": "leq",
"value": 0.9}],
"decision": "decision2"}
}
Once defined, a JSON file can be read with the `json` module (docs here):

```python
import json

with open('rules.json', 'r') as fp:
    rules_json = json.load(fp)
```
A model can be any function that works on a subset of the original data and has a numerical output. `DM` needs a `Model` object initialized as `Model(predictive_function, model_name, data)`, where:

- `predictive_function` is the function that produces the output. In the case of a `sklearn` model `m` for regression (resp., for classification), for instance, it corresponds to `m.predict` (resp., `m.predict_proba`);
- `model_name` is the name used in the rules to refer to the output of the model;
- `data` is the `pandas.DataFrame` to which the model is applied.
For example, assuming we have an input DataFrame `df` and two targets `y1` and `y2`, we can proceed as follows:

```python
import xgboost
from sklearn import linear_model

from smace.models import Model

lm = linear_model.LinearRegression()
lm.fit(df, y1)

xgb = xgboost.XGBClassifier()
xgb.fit(df, y2)

model_1 = Model(lm.predict, 'model_1', df)
model_2 = Model(xgb.predict_proba, 'model_2', df)
models_list = [model_1, model_2]
```
Having the rules `rules_json`, the list of models `models_list`, and the input dataset `df`, you can construct the `DM` object as:

```python
from smace.decisions import DM

dm = DM(rules_json, models_list, df)
```
To get the decision explicitly for an example, we use the `make_decision()` function:

```python
import numpy as np

example = np.random.rand(4)
decision = dm.make_decision(example, verbose=True)
```

Output:

```
Rule(s) ['rule1'] triggered.
Decision(s) ['decision1'] made.
```
Once the configuration is complete, you can use SMACE to explain the decisions of the defined system. Let us say we want to explain why, for the example above, `rule2` was not triggered:

```python
from smace.explainer import Smace

explainer = Smace(dm)
explanation = explainer.explain(example, 'rule2')
```
`explanation` contains all the information computed by SMACE. The following methods can be applied:

- `explanation.table()` and `explanation.bar()` to obtain the overall contributions of the input features as a table or a bar chart, respectively;
- `explanation.rule_table()` and `explanation.rule_bar()` to get the contributions of all variables in the rule as a table or a bar chart, respectively;
- `explanation.model_table('mod')` and `explanation.model_bar('mod')` to get the importance of the input features to the model named `'mod'`.

It is possible to specify the maximum number of variables to display through the `num_features` parameter, as in the sketch below.
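For instance, a minimal usage sketch (the method names are those listed above; that every display method accepts `num_features` is an assumption here):

```python
# Overall contributions of the input features (at most 4 variables shown).
explanation.table(num_features=4)
explanation.bar(num_features=4)

# Contributions of all variables appearing in the conditions of 'rule2'.
explanation.rule_table()
explanation.rule_bar()

# Importance of the input features for the model named 'model_1'.
explanation.model_table('model_1')
explanation.model_bar('model_1')
```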
If you use this code, please cite:

```bibtex
@inproceedings{lopardo2023smace,
  title={SMACE: A New Method for the Interpretability of Composite Decision Systems},
  author={Lopardo, Gianluigi and Garreau, Damien and Precioso, Fr{\'e}d{\'e}ric and Ottosson, Greger},
  booktitle={Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19--23, 2022, Proceedings, Part I},
  pages={325--339},
  year={2023},
  organization={Springer}
}
```