Expose Lucene's FeatureField. #30618

jpountz · 2018-05-15T13:52:02Z

Lucene has a new `FeatureField` that makes it possible to record numeric
features as term frequencies. Its main benefit is that it allows boosting
queries with the values of these features while efficiently skipping
non-competitive documents, using block-max WAND and indexed impacts.


elasticmachine · 2018-05-15T13:52:04Z

Pinging @elastic/es-search-aggs

mayya-sharipova · 2018-05-16T12:25:45Z

@jpountz Thanks Adrien, this is a very interesting and necessary feature. Excited to have it in Elasticsearch!

I am wondering if there is an intention to support multiple values for a feature. With your current PR, if I try to index multiple values:

{ "index" : { "_index" : "findex", "_type" : "_doc", "_id" : "2" } }
{  "text" : "newspaper", "pagerank": [100.0, 200] }

I am getting the following in the explanation of the query score (it looks like the multiple values were converted to the maximum float value):

"_explanation": {
	"value": 88.72284,
	"description": "Log function on the _feature field for the pagerank feature, computed as w * log(a + S) from:",
	"details": [
		{
			"value": 1.0,
			"description": "w, weight of this function",
			"details": []
		},
		{
			"value": 4.0,
			"description": "a, scaling factor",
			"details": []
		},
		{
			"value": 3.4028235E38,
			"description": "S, feature value",
			"details": []
		}
	]
}
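To see where 88.72284 comes from, here is a minimal Python sketch of the log function exactly as the explanation describes it, w * log(a + S). It assumes a natural logarithm, which is consistent with the reported value. 3.4028235E38 is Java's `Float.MAX_VALUE`, the largest finite 32-bit float, so the score is simply ln(Float.MAX_VALUE) with w = 1.

```python
import math
import struct


def log_contribution(w: float, a: float, s: float) -> float:
    """Score contribution of the log function: w * log(a + S)."""
    return w * math.log(a + s)


# Largest finite 32-bit float (bit pattern 0x7f7fffff), printed by Java as 3.4028235E38.
FLOAT_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]

# Values taken from the explanation above: w = 1.0, a = 4.0, S = Float.MAX_VALUE.
score = log_contribution(1.0, 4.0, FLOAT_MAX)
print(f"{score:.5f}")  # prints 88.72284
```

Adding a = 4 is irrelevant at this magnitude, which is why the multi-valued document ends up with the maximum possible log contribution.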

jpountz · 2018-05-16T12:42:56Z

Thanks @mayya-sharipova, this is a very good observation, we should reject multi-valued fields explicitly!
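The actual change ("Reject multi-valued fields.") is not shown in this conversation. The Python sketch below only illustrates the idea of failing fast on arrays instead of silently indexing a bogus value; the function name `extract_feature` and the error message are hypothetical, not Elasticsearch's mapper code.

```python
from typing import Any, Dict


def extract_feature(doc: Dict[str, Any], field: str) -> float:
    """Hypothetical check: accept exactly one numeric value per feature field."""
    value = doc[field]
    if isinstance(value, list):
        # Rejecting arrays up front avoids indexing a meaningless combined value.
        raise ValueError(
            f"[{field}] does not support indexing multiple values "
            f"for the same field in the same document"
        )
    return float(value)


# The document from the report above would now be rejected at index time.
try:
    extract_feature({"text": "newspaper", "pagerank": [100.0, 200]}, "pagerank")
except ValueError as e:
    print(e)
```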

jimczi

LGTM

jimczi · 2018-05-16T14:34:51Z

docs/reference/query-dsl/feature-query.asciidoc

+ways to modify the score, this query has the benefit of being able to
+efficiently skip non-competitive hits when
+<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
+spectacular.
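Why `track_total_hits` matters can be illustrated with a simplified Python sketch of top-k collection over blocks of postings, where each block carries a precomputed upper bound on its scores (standing in for Lucene's indexed impacts). This is not Lucene's block-max WAND implementation, which coordinates several clauses and reads impacts from the index; it only shows the single-clause idea. The block layout and scores are made-up data.

```python
import heapq
from typing import List, Tuple

# Each block: (upper bound on scores in the block, [(doc id, score), ...]).
Block = Tuple[float, List[Tuple[int, float]]]


def top_k(blocks: List[Block], k: int, track_total_hits: bool) -> Tuple[List[Tuple[float, int]], int]:
    """Return the k best (score, doc) pairs and the number of documents scored."""
    heap: List[Tuple[float, int]] = []  # min-heap holding the k best hits so far
    visited = 0
    for block_max, postings in blocks:
        # If hits need not be counted and the heap is full, a block whose upper
        # bound cannot beat the current k-th best score is skipped entirely.
        if not track_total_hits and len(heap) == k and block_max <= heap[0][0]:
            continue
        for doc, score in postings:
            visited += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), visited


BLOCKS: List[Block] = [
    (0.9, [(0, 0.9), (1, 0.5)]),
    (0.3, [(2, 0.3), (3, 0.1)]),  # cannot beat 0.5 once the top 2 is filled
    (0.8, [(4, 0.8), (5, 0.2)]),
]

print(top_k(BLOCKS, 2, track_total_hits=True))   # scores all 6 documents
print(top_k(BLOCKS, 2, track_total_hits=False))  # same top hits, fewer documents scored
```

Counting total hits forces every matching document to be visited, so the skip in the loop can only fire when counting is disabled; the top hits are identical either way.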


mayya-sharipova · 2018-05-17T20:39:08Z

@jpountz Thanks. Just one thing I want to clarify for myself. What does this phrase mean in the explanation below? What is w/2 here?

"k, pivot feature value that would give a score contribution equal to w/2"

"_explanation": {
	"value": 0.13026857,
	"description": "Saturation function on the _feature field for the url_length feature, computed as w * S / (S + k) from:",
	"details": [
		{
			"value": 1.0,
			"description": "w, weight of this function",
			"details": []
		},
		{
			"value": 0.33333334,
			"description": "k, pivot feature value that would give a score contribution equal to w/2",
			"details": []
		},
		{
			"value": 0.049926758,
			"description": "S, feature value",
			"details": []
		}
	]
}
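A minimal Python sketch of the saturation function as the explanation writes it, w * S / (S + k), using the reported values. Lucene scores are 32-bit floats while Python computes in doubles, so the last digits may differ slightly from 0.13026857. The second call shows what the pivot description means: when S equals k, the contribution is exactly w / 2.

```python
def saturation_contribution(w: float, k: float, s: float) -> float:
    """Score contribution of the saturation function: w * S / (S + k)."""
    return w * s / (s + k)


# Values from the explanation above: w = 1.0, k = 0.33333334, S = 0.049926758.
score = saturation_contribution(1.0, 0.33333334, 0.049926758)
print(score)

# At the pivot (S == k) the contribution is half of w.
print(saturation_contribution(1.0, 0.33333334, 0.33333334))  # 0.5
```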

jpountz · 2018-05-17T20:42:34Z

Thanks for testing @mayya-sharipova. It should be S/2 indeed, w doesn't make sense.

jpountz · 2018-05-18T13:11:22Z

Actually, I read too quickly: the current explanation is correct. The score is computed as w * S / (S + k), so when S is equal to k, this becomes w * k / (k + k) = w / 2. I suspect the confusion comes from the fact that on the Lucene side we give users an explicit way to configure the boost (w), because from a Lucene perspective the alternative is wrapping the query in yet another query. However, I don't think we need to expose it to Elasticsearch users, since all queries already support configuring a boost. If you needed to boost the impact of the feature query by 2, you could just do:

{
  "query": {
    "feature": {
      "field": "pagerank",
      "boost": 2
    }
  }
}
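A small Python sketch of why a query-level `boost` is enough, assuming (as the comment above says) that the boost plays the role of w: the saturation function is linear in w, so multiplying the w = 1 contribution by the boost gives the same score as setting w directly, and the pivot still yields boost / 2. The helper `boosted_contribution` is hypothetical.

```python
def saturation_contribution(w: float, k: float, s: float) -> float:
    """Score contribution of the saturation function: w * S / (S + k)."""
    return w * s / (s + k)


def boosted_contribution(boost: float, k: float, s: float) -> float:
    """Hypothetical: query-level boost applied on top of the function with w fixed to 1."""
    return boost * saturation_contribution(1.0, k, s)


k, s = 0.33333334, 0.049926758
print(boosted_contribution(2.0, k, s), saturation_contribution(2.0, k, s))
print(boosted_contribution(2.0, k, k))  # 1.0, i.e. boost / 2 at the pivot
```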

mayya-sharipova · 2018-05-18T16:51:22Z

@jpountz Thanks for the detailed explanation


jpountz added >feature, release highlight, :Search Foundations/Mapping and v7.0.0 labels May 15, 2018

jpountz added 3 commits May 15, 2018 16:45

iter (4252f87)
iter (8badb42)
iter (a6ab3ff)

jpountz added 2 commits May 16, 2018 15:34

Reject multi-valued fields. (1e2e34f)
Merge branch 'master' into feature_field (be6e27b)

jimczi approved these changes May 16, 2018

mayya-sharipova approved these changes May 17, 2018

Merge branch 'master' into feature_field (e9eb9d8)

jpountz added 2 commits May 22, 2018 14:56

Merge branch 'master' into feature_field (29029bd)
Lucene upgrade. (1d0cd96)

jpountz merged commit 886db84 into elastic:master May 23, 2018

jpountz deleted the feature_field branch May 23, 2018 06:55

jpountz mentioned this pull request May 29, 2018 in Support DelimitedTermFrequencyTokenFilter #27552 (closed)

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
