[ML] adds multi-class feature importance support #53803

benwtrent · 2020-03-19T14:42:06Z

Adds multi-class feature importance calculation.

Feature importance objects are now mapped as follows.
(Logistic) regression:

{
   "feature_name": "feature_0",
   "importance": -1.3
}

Multi-class [class names are foo, bar, baz]:

{
   "feature_name": "feature_0",
   "importance": 2.0, // sum(abs()) of class importances
   "foo": 1.0,
   "bar": 0.5,
   "baz": -0.5
}

For users to get the full benefit of aggregating and searching on feature importance, they should update their index mapping as follows before turning this option on in their pipelines:

"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}

The mapping field name is as follows:
ml.<inference.target_field>.<inference.tag>.feature_importance
If inference.tag is not provided in the processor definition, it is not part of the field path.
inference.target_field defaults to ml.inference.
//cc @lcawl ^ Where should we document this?

If this makes it in for 7.7, there shouldn't be any BWC worries for feature_importance at inference, as 7.7 is the first version to have it.
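The field path rule in the description can be sketched as below. This is only an illustration, not code from the PR: the class `FeatureImportancePath` and the method `fieldPath` are hypothetical names, and the sketch assumes that the `target_field` value already carries the full prefix (so the default `ml.inference` yields `ml.inference.feature_importance`, matching the mapping example).

```java
// Hypothetical helper, not part of the PR: builds the feature_importance field path.
final class FeatureImportancePath {

    // Assumption: the default target_field is "ml.inference", as stated in the description.
    static final String DEFAULT_TARGET_FIELD = "ml.inference";

    static String fieldPath(String targetField, String tag) {
        String base = (targetField == null || targetField.isEmpty()) ? DEFAULT_TARGET_FIELD : targetField;
        // The tag is only part of the path when it is set on the processor.
        String withTag = (tag == null || tag.isEmpty()) ? base : base + "." + tag;
        return withTag + ".feature_importance";
    }

    public static void main(String[] args) {
        System.out.println(fieldPath(null, null));     // ml.inference.feature_importance
        System.out.println(fieldPath(null, "my_tag")); // ml.inference.my_tag.feature_importance
    }
}
```

If the tag is set, the mapping shown above would need to be declared on the tagged path instead.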


elasticmachine · 2020-03-19T14:42:08Z

Pinging @elastic/ml-core (:ml)

benwtrent · 2020-03-19T16:31:03Z

run elasticsearch-ci/packaging-sample-matrix-unix

benwtrent · 2020-03-19T18:42:34Z

.../core/src/main/java/org/elasticsearch/xpack/core/ml/inference/results/FeatureImportance.java

+    }
+
+    public static FeatureImportance forClassification(String featureName, Map<String, Double> classImportance) {
+        return new FeatureImportance(featureName, classImportance.values().stream().mapToDouble(Math::abs).sum(), classImportance);


I am not 100% convinced this should be abs.

We don't write the feature importance value on the native side by looking at the norm of the vector.

Do we want to make this the norm too? Or do we think abs is good enough?

@tveasey @valeriy42

Can you please provide more context. What are you calculating here?

@valeriy42 @tveasey this is calculating the "overall importance" of all the classes combined for a given feature. This is so we can measure "most important feature" independent of the classes.

norm would make it an L2 norm, abs makes it an L1 norm. Either way is suitable. I think abs is better, since the L2 norm overweights larger importances and ignores smaller ones.
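To make the L1 versus L2 trade-off concrete, here is a small sketch that applies both aggregations to the example class importances from the description. `ImportanceNorms`, `l1`, and `l2` are hypothetical names used for illustration; only the L1 form mirrors what `forClassification` does in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: compares the two candidate aggregations.
final class ImportanceNorms {

    // L1 norm: sum of the absolute per-class importances (same expression as forClassification).
    static double l1(Map<String, Double> classImportance) {
        return classImportance.values().stream().mapToDouble(Math::abs).sum();
    }

    // L2 norm: square root of the sum of squared per-class importances.
    static double l2(Map<String, Double> classImportance) {
        return Math.sqrt(classImportance.values().stream().mapToDouble(v -> v * v).sum());
    }

    public static void main(String[] args) {
        Map<String, Double> classImportance = new LinkedHashMap<>();
        classImportance.put("foo", 1.0);
        classImportance.put("bar", 0.5);
        classImportance.put("baz", -0.5);
        System.out.println("L1 = " + l1(classImportance)); // L1 = 2.0
        System.out.println("L2 = " + l2(classImportance)); // sqrt(1.5), roughly 1.2247
    }
}
```

With these values the largest class (`foo`) accounts for half of the L1 total but two thirds of the squared sum under L2, which is the overweighting mentioned above.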

Feature importance is already calculated for multi-class models. This commit adjusts the output sent to ES so that multi-class importance can be explored.

Feature importance objects are now mapped as follows.

(logistic) Regression:
```
{
   "feature_name": "feature_0",
   "importance": -1.3
}
```
Multi-class [class names are `foo`, `bar`, `baz`]:
```
{
   "feature_name": "feature_0",
   "importance": 2.0, // sum(abs()) of class importances
   "foo": 1.0,
   "bar": 0.5,
   "baz": -0.5
}
```
Java side change: elastic/elasticsearch#53803

davidkyle

Some questions but LGTM.

Quite a lot of code really

davidkyle · 2020-03-23T14:52:51Z

...main/java/org/elasticsearch/xpack/core/ml/inference/results/SingleValueInferenceResults.java

+            return unsortedFeatureImportances;
+        }
+        return unsortedFeatureImportances.stream()
+            .sorted((l, r)-> Double.compare(Math.abs(r.getImportance()), Math.abs(l.getImportance())))


Is the abs necessary when the score is a norm? If the score can be -ve why is it wrong to use the -ve value?

@davidkyle

The score is not always a norm. Additionally, we want the MOST influential values, regardless of direction. We could have feature importances like this:

{ A: -1.2, B: -0.2, C: 0.5 }

If we want the top two influential features, we want A and C.

The getImportance is only the norm when it comes to multi-class. This is not the case for (logistic) regression.
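A minimal sketch of that ordering, using the same comparator shape as the diff on a plain map of the example values. `TopImportance` and `topN` are hypothetical names for illustration and do not appear in the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: picks the most influential features by magnitude.
final class TopImportance {

    static List<String> topN(Map<String, Double> importances, int n) {
        return importances.entrySet().stream()
            // Descending by absolute value, so direction does not affect the ranking.
            .sorted((l, r) -> Double.compare(Math.abs(r.getValue()), Math.abs(l.getValue())))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> importances = new LinkedHashMap<>();
        importances.put("A", -1.2);
        importances.put("B", -0.2);
        importances.put("C", 0.5);
        System.out.println(topN(importances, 2)); // [A, C]
    }
}
```

Sorting on the signed value instead would rank C first and A last, dropping the most influential feature from a top-two list.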

...e/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/InferenceHelpers.java

davidkyle · 2020-03-23T15:38:23Z

.../core/src/main/java/org/elasticsearch/xpack/core/ml/inference/results/FeatureImportance.java

+        this.featureName = in.readString();
+        this.importance = in.readDouble();
+        if (in.readBoolean()) {
+            this.classImportance = in.readLinkedHashMap(StreamInput::readString, StreamInput::readDouble);


I'm not sure why this has to be a linked hash map? I'm assuming to preserve insertion order but why? If this was ever serialisable to xcontent the ordering could not be guaranteed

ToXContent does not factor here. We are concerned about the order when the values are written to the ingest document.

FWIW, this sort of thing is already done with Object maps. Just cannot do it with specific stream inputs.

Thinking about it more and looking more into the ingest doc code. I agree with you. This seems superfluous for now. If ordering becomes a concern for usability, we can add it in the future. The reading from the wire for both LinkedHashMap and HashMap would be exactly the same, so BWC is not a concern.
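The BWC point can be illustrated with a stand-in wire format. The sketch below does not use Elasticsearch's `StreamInput`/`StreamOutput`; it assumes a simple size-prefixed encoding over `java.io` data streams, and `MapWireFormat`, `write`, and `read` are hypothetical names. It shows that the bytes written do not depend on which map implementation the reader chooses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the stream classes: size prefix, then key/value pairs.
final class MapWireFormat {

    static byte[] write(Map<String, Double> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, Double> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeDouble(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The reader picks the map implementation; the bytes consumed are the same either way.
    static <M extends Map<String, Double>> M read(byte[] data, Supplier<M> factory) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            M map = factory.get();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                map.put(key, in.readDouble());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> original = new LinkedHashMap<>();
        original.put("foo", 1.0);
        original.put("bar", 0.5);
        byte[] wire = write(original);
        Map<String, Double> asHash = read(wire, HashMap::new);
        Map<String, Double> asLinked = read(wire, LinkedHashMap::new);
        System.out.println(asHash.equals(asLinked)); // true
    }
}
```

Only the iteration order of the resulting map differs between the two readers, which is why switching the reader type later would not change the wire format.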


benwtrent · 2020-03-23T19:51:21Z

@elasticmachine update branch


[ML] adjusts feature importance format and adds multi-class importanc…

15ae0c6

…e support

benwtrent added >enhancement :ml Machine learning v8.0.0 v7.7.0 labels Mar 19, 2020

benwtrent commented Mar 19, 2020

View reviewed changes

benwtrent mentioned this pull request Mar 19, 2020

[ML] calculate feature importance for multi-class results elastic/ml-cpp#1071

Merged

davidkyle approved these changes Mar 23, 2020

View reviewed changes

Update InferenceHelpers.java

5fd61c2

benwtrent added 3 commits March 23, 2020 15:32

Merge branch 'master' into feature/ml-inference-multi-class-feature-i…

2632205

…mportance

removing the read in linked hashmap

2adfbaf

Merge branch 'feature/ml-inference-multi-class-feature-importance' of…

47b4225

… github.com:benwtrent/elasticsearch into feature/ml-inference-multi-class-feature-importance

Merge branch 'master' into feature/ml-inference-multi-class-feature-i…

fcf853a

…mportance

benwtrent merged commit 756a297 into elastic:master Mar 23, 2020

benwtrent deleted the feature/ml-inference-multi-class-feature-importance branch March 23, 2020 20:53

benwtrent mentioned this pull request Mar 23, 2020

[7.x] [ML] adds multi-class feature importance support (#53803) #54024

Merged

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 2) elastic/elasticsearch-net#4533

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021


benwtrent commented Mar 19, 2020

elasticmachine commented Mar 19, 2020

benwtrent commented Mar 19, 2020

benwtrent Mar 19, 2020

valeriy42 Mar 20, 2020

benwtrent Mar 20, 2020

valeriy42 Mar 20, 2020

tveasey Mar 23, 2020

davidkyle left a comment

davidkyle Mar 23, 2020

benwtrent Mar 23, 2020 (edited)

davidkyle Mar 23, 2020

benwtrent Mar 23, 2020 (edited)

benwtrent Mar 23, 2020

benwtrent Mar 23, 2020

benwtrent commented Mar 23, 2020
