[ML] Regression Rescorer #52059
Conversation
Pinging @elastic/ml-core (:ml)
retest this please
Force-pushed from 331bd8f to fcc8b7e
I like the simple implementation. Seems like a natural place to put inference support :)
modelProvider.getTrainedModel(modelId, true, ActionListener.wrap(trainedModel -> {
    LocalModel model = new LocalModel(
        modelId,
        trainedModel.ensureParsedDefinition(ctx.getXContentRegistry()).getModelDefinition(),
Does a re-scorer run only on the coordinating node or down on the shards?
If it is down on the shards, it might be good not to inflate the definition until it is used; that way the query definition is smaller when serialized across the wire.
I am not sure how the computation cost of inflating it once and serializing it compares with inflating it X times, once on each shard.
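The lazy-inflation idea above could be sketched roughly as follows. This is an illustrative sketch only, not the actual Elasticsearch API: `LazyModelHolder`, the `Supplier`-based inflater, and the `Object` model type are all hypothetical stand-ins for the real compressed-definition handling.

```java
import java.util.function.Supplier;

// Hypothetical sketch (not the actual Elasticsearch API): keep the compressed
// model definition for the wire format and inflate it only on first use at
// the shard, so the serialized query stays small.
class LazyModelHolder {
    private final byte[] compressedDefinition; // what would travel across the wire
    private final Supplier<Object> inflater;   // stands in for gunzip + JSON parse
    private volatile Object inflated;          // parsed model, built on demand

    LazyModelHolder(byte[] compressedDefinition, Supplier<Object> inflater) {
        this.compressedDefinition = compressedDefinition;
        this.inflater = inflater;
    }

    Object get() {
        // Double-checked locking: each holder inflates the definition at most once.
        Object local = inflated;
        if (local == null) {
            synchronized (this) {
                local = inflated;
                if (local == null) {
                    inflated = local = inflater.get();
                }
            }
        }
        return local;
    }
}
```

With this shape, a shard that never rescores never pays the gunzip-and-parse cost, while a shard that rescores many hits pays it once.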
👍 The rescorer only runs on the shards. The coordinating node receives the scores from each shard and is responsible for choosing which documents to return, but doesn't actually do any rescoring.
Caching is another consideration. I am not sure if we can cache the inflated object on the shard so that we know not to inflate it and just use the cached object.
The cost of writing the inflated model on the wire might not be THAT bad if you take into consideration all nodes having to gunzip + parse the JSON every time.
As in, on each rescore (so once per shard per query), we load the model again?
Well, there is always room for improvement. The caching framework used by the inference ingest processor can be utilised and should be fairly easy to incorporate.
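The per-node cache idea could look roughly like the sketch below. The names (`InflatedModelCache`, `getOrLoad`) are hypothetical; the real caching framework used by the inference ingest processor differs, and this only shows the shape of the idea: key the parsed model by its model id so repeated queries reuse it instead of re-inflating.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a per-node cache of inflated models keyed by model id,
// so repeated queries reuse the parsed definition rather than re-inflating it.
class InflatedModelCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    Object getOrLoad(String modelId, Function<String, Object> loader) {
        // computeIfAbsent invokes the loader at most once per key,
        // even under concurrent access from multiple search threads.
        return cache.computeIfAbsent(modelId, loader);
    }
}
```

A real implementation would also need eviction and invalidation when a model is updated or deleted, which is exactly the kind of detail the existing caching framework already handles.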
It is loaded once per query. Not once per shard, since @davidkyle is loading it up in the coordinating node.
Right, and then it's serialized and sent to each shard? I'll wait to hear back from the Elasticsearch team about caching. As a PoC, this seems totally fine to me, and beyond that we can iterate on caching strategies.
run elasticsearch-ci/2
run elasticsearch-ci/2
run elasticsearch-ci/2
Adds a rescorer utilising the regression models currently used in the Inference Processor; in fact, the configuration shares many similarities. The rescorer loads the specified model (`model_id`) during rewrite, then for each document extracts the fields required by the model and rescores the hit according to the regression result. The final score is computed from the search score and model score using the same `score_mode` operations as query rescore. As with all prototypes, most of the work is left to do:
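The `score_mode` combination described above can be sketched as below. The value names match the ones query rescore exposes (`total`, `multiply`, `avg`, `max`, `min`), but the helper itself is an illustrative sketch, not the PR's implementation.

```java
// Hypothetical sketch of combining the original search score with the model's
// regression score, using the same score_mode operations as query rescore.
enum ScoreMode { TOTAL, MULTIPLY, AVG, MAX, MIN }

final class ScoreCombiner {
    static float combine(float queryScore, float modelScore, ScoreMode mode) {
        switch (mode) {
            case TOTAL:    return queryScore + modelScore;        // sum of both scores
            case MULTIPLY: return queryScore * modelScore;        // product of both scores
            case AVG:      return (queryScore + modelScore) / 2f; // arithmetic mean
            case MAX:      return Math.max(queryScore, modelScore);
            case MIN:      return Math.min(queryScore, modelScore);
            default:       throw new AssertionError("unknown score_mode");
        }
    }
}
```

For example, with `score_mode: total`, a hit with search score 1.2 and model prediction 0.8 would end up with a final score of 2.0.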