DOC Improve docs of permutation importance on the user guide (#27154)
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
3 people authored and jeremiedbb committed Jan 17, 2024
1 parent 45a5f6a commit b06f444
Showing 4 changed files with 80 additions and 29 deletions.
Binary file added doc/images/permuted_non_predictive_feature.png
Binary file added doc/images/permuted_predictive_feature.png
105 changes: 78 additions & 27 deletions doc/modules/permutation_importance.rst
@@ -6,15 +6,45 @@ Permutation feature importance

.. currentmodule:: sklearn.inspection

Permutation feature importance is a model inspection technique that can be used
for any :term:`fitted` :term:`estimator` when the data is tabular. This is
especially useful for non-linear or opaque :term:`estimators`. The permutation
feature importance is defined to be the decrease in a model score when a single
feature value is randomly shuffled [1]_. This procedure breaks the relationship
between the feature and the target, thus the drop in the model score is
indicative of how much the model depends on the feature. This technique
benefits from being model agnostic and can be calculated many times with
different permutations of the feature.
Permutation feature importance is a model inspection technique that measures the
contribution of each feature to a :term:`fitted` model's statistical performance
on a given tabular dataset. This technique is particularly useful for non-linear
or opaque :term:`estimators`, and involves randomly shuffling the values of a
single feature and observing the resulting degradation of the model's score
[1]_. By breaking the relationship between the feature and the target, we
determine how much the model relies on that particular feature.

In the following figures, we observe the effect of permuting a feature on its
correlation with the target and, consequently, on the model's statistical
performance.

.. image:: ../images/permuted_predictive_feature.png
:align: center

.. image:: ../images/permuted_non_predictive_feature.png
:align: center

In the top figure, we observe that permuting a predictive feature breaks the
correlation between the feature and the target, and consequently the model's
statistical performance decreases. In the bottom figure, we observe that
permuting a non-predictive feature does not significantly degrade the model's
statistical performance.

One key advantage of permutation feature importance is that it is
model-agnostic, i.e. it can be applied to any fitted estimator. Moreover, it can
be calculated multiple times with different permutations of the feature, further
providing a measure of the variance in the estimated feature importances for the
specific trained model.
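
A minimal sketch of computing permutation importances with
:func:`permutation_importance` is shown below; the dataset, the model and the
parameter values are illustrative choices for this snippet only::

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # n_repeats controls how many different permutations are drawn per feature,
    # yielding a mean and a standard deviation for each importance estimate.
    result = permutation_importance(
        model, X_val, y_val, n_repeats=10, random_state=0
    )
    for name, mean, std in zip(
        X.columns, result.importances_mean, result.importances_std
    ):
        print(f"{name:<10} {mean:.3f} +/- {std:.3f}")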

The figure below shows the permutation feature importance of a
:class:`~sklearn.ensemble.RandomForestClassifier` trained on an augmented
version of the Titanic dataset that contains two extra features, `random_cat`
and `random_num`, i.e. a categorical and a numerical feature that are not
correlated in any way with the target variable:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_002.png
:target: ../auto_examples/inspection/plot_permutation_importance.html
:align: center
:scale: 70

.. warning::

@@ -74,15 +104,18 @@ highlight which features contribute the most to the generalization power of the
inspected model. Features that are important on the training set but not on the
held-out set might cause the model to overfit.

The permutation feature importance is the decrease in a model score when a single
feature value is randomly shuffled. The score function to be used for the
computation of importances can be specified with the `scoring` argument,
which also accepts multiple scorers. Using multiple scorers is more computationally
efficient than sequentially calling :func:`permutation_importance` several times
with a different scorer, as it reuses model predictions.
The permutation feature importance depends on the score function that is
specified with the `scoring` argument. This argument accepts multiple scorers,
which is more computationally efficient than sequentially calling
:func:`permutation_importance` several times with a different scorer, as it
reuses model predictions.

An example of using multiple scorers is shown below, employing a list of metrics,
but more input formats are possible, as documented in :ref:`multimetric_scoring`.
|details-start|
**Example of permutation feature importance using multiple scorers**
|details-split|

In the example below we use a list of metrics, but more input formats are
possible, as documented in :ref:`multimetric_scoring`.

>>> scoring = ['r2', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error']
>>> r_multi = permutation_importance(
@@ -116,7 +149,9 @@ The ranking of the features is approximately the same for different metrics even
if the scales of the importance values are very different. However, this is not
guaranteed and different metrics might lead to significantly different feature
importances, in particular for models trained for imbalanced classification problems,
for which the choice of the classification metric can be critical.
for which **the choice of the classification metric can be critical**.

|details-end|

Outline of the permutation importance algorithm
-----------------------------------------------
@@ -156,9 +191,9 @@ over low cardinality features such as binary features or categorical variables
with a small number of possible categories.

Permutation-based feature importances do not exhibit such a bias. Additionally,
the permutation feature importance may be computed performance metric on the
model predictions and can be used to analyze any model class (not
just tree-based models).
the permutation feature importance may be computed with any performance metric
on the model predictions and can be used to analyze any model class (not just
tree-based models).
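
As a quick illustration of this difference, both kinds of importances can be
obtained from the same fitted forest. The dataset, model and metric below are
assumptions made for this sketch only::

    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    forest = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # Impurity-based importances are derived from training-set statistics only.
    impurity_importances = forest.feature_importances_

    # Permutation importances can use any metric via `scoring` and are
    # evaluated on held-out data; they also apply to non tree-based models.
    result = permutation_importance(
        forest, X_val, y_val, scoring="neg_mean_absolute_error",
        n_repeats=5, random_state=0,
    )
    permutation_importances = result.importances_mean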

The following example highlights the limitations of impurity-based feature
importance in contrast to permutation-based feature importance:
@@ -168,13 +203,29 @@ Misleading values on strongly correlated features
-------------------------------------------------

When two features are correlated and one of the features is permuted, the model
will still have access to the feature through its correlated feature. This will
result in a lower importance value for both features, where they might
*actually* be important.
still has access to the permuted feature's information through its correlated
counterpart. This results in a lower reported importance value for both features,
even though they might *actually* be important.

The figure below shows the permutation feature importance of a
:class:`~sklearn.ensemble.RandomForestClassifier` trained using the
:ref:`breast_cancer_dataset`, which contains strongly correlated features. A
naive interpretation would suggest that all features are unimportant:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_multicollinear_002.png
:target: ../auto_examples/inspection/plot_permutation_importance_multicollinear.html
:align: center
:scale: 70

One way to handle the issue is to cluster features that are correlated and only
keep one feature from each cluster.

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_multicollinear_004.png
:target: ../auto_examples/inspection/plot_permutation_importance_multicollinear.html
:align: center
:scale: 70

One way to handle this is to cluster features that are correlated and only
keep one feature from each cluster. This strategy is explored in the following
example:
For more details on this strategy, see the example
:ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance_multicollinear.py`.
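
The snippet below sketches one possible implementation of such a clustering
strategy, based on hierarchical clustering of Spearman rank correlations; the
dataset and the distance threshold used to cut the dendrogram are arbitrary
choices for this illustration::

    from collections import defaultdict

    import numpy as np
    from scipy.cluster import hierarchy
    from scipy.spatial.distance import squareform
    from scipy.stats import spearmanr

    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # Build a distance matrix from Spearman rank correlations between features.
    corr = spearmanr(X).correlation
    corr = (corr + corr.T) / 2  # enforce symmetry against numerical noise
    np.fill_diagonal(corr, 1)
    distance_matrix = 1 - np.abs(corr)
    dist_linkage = hierarchy.ward(squareform(distance_matrix))

    # Cut the dendrogram at a chosen threshold and keep one feature per cluster.
    cluster_ids = hierarchy.fcluster(dist_linkage, t=1, criterion="distance")
    cluster_to_features = defaultdict(list)
    for idx, cluster_id in enumerate(cluster_ids):
        cluster_to_features[cluster_id].append(idx)
    selected_features = [feats[0] for feats in cluster_to_features.values()]
    X_reduced = X.iloc[:, selected_features]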

.. topic:: Examples:
4 changes: 2 additions & 2 deletions examples/inspection/plot_permutation_importance.py
@@ -24,8 +24,6 @@
2001. <10.1023/A:1010933404324>`
"""
# %%
import numpy as np

# %%
# Data Loading and Feature Engineering
@@ -40,6 +38,8 @@
# values as records).
# - ``random_cat`` is a low cardinality categorical variable (3 possible
# values).
import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

