DOC Improve docs of permutation importance on the user guide (#27154)
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
3 people authored and jeremiedbb committed Jan 17, 2024
1 parent 45a5f6a commit b06f444
Showing 4 changed files with 80 additions and 29 deletions.
Binary file added doc/images/permuted_non_predictive_feature.png
Binary file added doc/images/permuted_predictive_feature.png
105 changes: 78 additions & 27 deletions doc/modules/permutation_importance.rst
@@ -6,15 +6,45 @@ Permutation feature importance

.. currentmodule:: sklearn.inspection

Permutation feature importance is a model inspection technique that can be used
for any :term:`fitted` :term:`estimator` when the data is tabular. This is
especially useful for non-linear or opaque :term:`estimators`. The permutation
feature importance is defined to be the decrease in a model score when a single
feature value is randomly shuffled [1]_. This procedure breaks the relationship
between the feature and the target, thus the drop in the model score is
indicative of how much the model depends on the feature. This technique
benefits from being model agnostic and can be calculated many times with
different permutations of the feature.
Permutation feature importance is a model inspection technique that measures the
contribution of each feature to a :term:`fitted` model's statistical performance
on a given tabular dataset. This technique is particularly useful for non-linear
or opaque :term:`estimators`, and involves randomly shuffling the values of a
single feature and observing the resulting degradation of the model's score
[1]_. By breaking the relationship between the feature and the target, we
determine how much the model relies on that particular feature.

In the following figures, we observe the effect of permuting a feature on its
correlation with the target and, consequently, on the model's statistical
performance.

.. image:: ../images/permuted_predictive_feature.png
:align: center

.. image:: ../images/permuted_non_predictive_feature.png
:align: center

In the top figure, we observe that permuting a predictive feature breaks the
correlation between the feature and the target, and consequently the model's
statistical performance decreases. In the bottom figure, we observe that
permuting a non-predictive feature does not significantly degrade the model's
statistical performance.

One key advantage of permutation feature importance is that it is
model-agnostic, i.e. it can be applied to any fitted estimator. Moreover, it can
be calculated multiple times with different permutations of the feature, further
providing a measure of the variance in the estimated feature importances for the
specific trained model.
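
A minimal sketch of computing permutation importances with
:func:`permutation_importance` is shown below; the dataset, the model and the
parameter values are illustrative choices for this snippet only::

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # n_repeats controls how many different permutations are drawn per feature,
    # yielding a mean and a standard deviation for each importance estimate.
    result = permutation_importance(
        model, X_val, y_val, n_repeats=10, random_state=0
    )
    for name, mean, std in zip(
        X.columns, result.importances_mean, result.importances_std
    ):
        print(f"{name:<10} {mean:.3f} +/- {std:.3f}")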

The figure below shows the permutation feature importance of a
:class:`~sklearn.ensemble.RandomForestClassifier` trained on an augmented
version of the Titanic dataset that contains two extra features, `random_cat`
and `random_num`, i.e. a categorical and a numerical feature that are not
correlated in any way with the target variable:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_002.png
:target: ../auto_examples/inspection/plot_permutation_importance.html
:align: center
:scale: 70

.. warning::

@@ -74,15 +104,18 @@ highlight which features contribute the most to the generalization power of the
inspected model. Features that are important on the training set but not on the
held-out set might cause the model to overfit.

The permutation feature importance is the decrease in a model score when a single
feature value is randomly shuffled. The score function to be used for the
computation of importances can be specified with the `scoring` argument,
which also accepts multiple scorers. Using multiple scorers is more computationally
efficient than sequentially calling :func:`permutation_importance` several times
with a different scorer, as it reuses model predictions.
The permutation feature importance depends on the score function that is
specified with the `scoring` argument. This argument accepts multiple scorers,
which is more computationally efficient than sequentially calling
:func:`permutation_importance` several times with a different scorer, as it
reuses model predictions.

An example of using multiple scorers is shown below, employing a list of metrics,
but more input formats are possible, as documented in :ref:`multimetric_scoring`.
|details-start|
**Example of permutation feature importance using multiple scorers**
|details-split|

In the example below we use a list of metrics, but more input formats are
possible, as documented in :ref:`multimetric_scoring`.

>>> scoring = ['r2', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error']
>>> r_multi = permutation_importance(
@@ -116,7 +149,9 @@ The ranking of the features is approximately the same for different metrics even
if the scales of the importance values are very different. However, this is not
guaranteed and different metrics might lead to significantly different feature
importances, in particular for models trained for imbalanced classification problems,
for which the choice of the classification metric can be critical.
for which **the choice of the classification metric can be critical**.

|details-end|

Outline of the permutation importance algorithm
-----------------------------------------------
@@ -156,9 +191,9 @@ over low cardinality features such as binary features or categorical variables
with a small number of possible categories.

Permutation-based feature importances do not exhibit such a bias. Additionally,
the permutation feature importance may be computed performance metric on the
model predictions and can be used to analyze any model class (not
just tree-based models).
the permutation feature importance may be computed with any performance metric
on the model predictions and can be used to analyze any model class (not just
tree-based models).
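
As a quick illustration of this difference, both kinds of importances can be
obtained from the same fitted forest. The dataset, model and metric below are
assumptions made for this sketch only::

    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    forest = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # Impurity-based importances are derived from training-set statistics only.
    impurity_importances = forest.feature_importances_

    # Permutation importances can use any metric via `scoring` and are
    # evaluated on held-out data; they also apply to non tree-based models.
    result = permutation_importance(
        forest, X_val, y_val, scoring="neg_mean_absolute_error",
        n_repeats=5, random_state=0,
    )
    permutation_importances = result.importances_mean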

The following example highlights the limitations of impurity-based feature
importance in contrast to permutation-based feature importance:
@@ -168,13 +203,29 @@ Misleading values on strongly correlated features
-------------------------------------------------

When two features are correlated and one of the features is permuted, the model
will still have access to the feature through its correlated feature. This will
result in a lower importance value for both features, where they might
*actually* be important.
still has access to the permuted feature's information through its correlated
counterpart. This results in a lower reported importance value for both features,
even though they might *actually* be important.

The figure below shows the permutation feature importance of a
:class:`~sklearn.ensemble.RandomForestClassifier` trained using the
:ref:`breast_cancer_dataset`, which contains strongly correlated features. A
naive interpretation would suggest that all features are unimportant:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_multicollinear_002.png
:target: ../auto_examples/inspection/plot_permutation_importance_multicollinear.html
:align: center
:scale: 70

One way to handle the issue is to cluster features that are correlated and only
keep one feature from each cluster.

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_permutation_importance_multicollinear_004.png
:target: ../auto_examples/inspection/plot_permutation_importance_multicollinear.html
:align: center
:scale: 70

One way to handle this is to cluster features that are correlated and only
keep one feature from each cluster. This strategy is explored in the following
example:
For more details on this strategy, see the example
:ref:`sphx_glr_auto_examples_inspection_plot_permutation_importance_multicollinear.py`.
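
The snippet below sketches one possible implementation of such a clustering
strategy, based on hierarchical clustering of Spearman rank correlations; the
dataset and the distance threshold used to cut the dendrogram are arbitrary
choices for this illustration::

    from collections import defaultdict

    import numpy as np
    from scipy.cluster import hierarchy
    from scipy.spatial.distance import squareform
    from scipy.stats import spearmanr

    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # Build a distance matrix from Spearman rank correlations between features.
    corr = spearmanr(X).correlation
    corr = (corr + corr.T) / 2  # enforce symmetry against numerical noise
    np.fill_diagonal(corr, 1)
    distance_matrix = 1 - np.abs(corr)
    dist_linkage = hierarchy.ward(squareform(distance_matrix))

    # Cut the dendrogram at a chosen threshold and keep one feature per cluster.
    cluster_ids = hierarchy.fcluster(dist_linkage, t=1, criterion="distance")
    cluster_to_features = defaultdict(list)
    for idx, cluster_id in enumerate(cluster_ids):
        cluster_to_features[cluster_id].append(idx)
    selected_features = [feats[0] for feats in cluster_to_features.values()]
    X_reduced = X.iloc[:, selected_features]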

.. topic:: Examples:
4 changes: 2 additions & 2 deletions examples/inspection/plot_permutation_importance.py
@@ -24,8 +24,6 @@
2001. <10.1023/A:1010933404324>`
"""
# %%
import numpy as np

# %%
# Data Loading and Feature Engineering
@@ -40,6 +38,8 @@
# values as records).
# - ``random_cat`` is a low cardinality categorical variable (3 possible
# values).
import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

