[Backport 2.x] Fix negative scores returned from multi_match query with cross_fields
#13983
Manual backport of #13829
Under specific circumstances, when using `cross_fields` scoring on a `multi_match` query, we can end up with negative scores from the inverse document frequency (IDF) calculation in the BM25 formula. Specifically, the IDF is calculated as:

$$\mathrm{idf} = \ln\left(1 + \frac{N - n + 0.5}{n + 0.5}\right)$$

where `N` is the number of documents containing the field and `n` is the number of documents containing the given term in the field. Obviously, `n` should always be less than or equal to `N`: if `n` exceeds `N`, the numerator `N - n + 0.5` becomes negative, the argument of the logarithm drops below 1, and the IDF (and therefore the score) goes negative.

Unfortunately, `cross_fields` computes a single blended value for `n` and uses it across all fields, so a field with few documents can end up with `n > N`. This change finds the (nonzero) value of `N` for each field and uses that as an upper bound for the new value of `n`.

Signed-off-by: Michael Froh <froh@amazon.com>
(cherry picked from commit fffd101)
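To make the failure mode concrete, here is a minimal Java sketch. It assumes the BM25 IDF form used by Lucene's `BM25Similarity`, ln(1 + (N - n + 0.5) / (n + 0.5)). The class `CrossFieldsIdfSketch`, the helpers `idf` and `clampDocFreq`, and the field names and counts are all hypothetical illustrations, not the actual OpenSearch `BlendedTermQuery` code; the cap simply follows the per-field upper bound described above.

```java
/**
 * Hypothetical illustration, not OpenSearch's actual BlendedTermQuery code: shows how a
 * blended document frequency shared across fields can drive BM25 IDF negative, and how
 * capping it at each field's document count keeps IDF positive.
 */
public class CrossFieldsIdfSketch {

    /** BM25 IDF in the form used by Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Hypothetical helper: caps the blended docFreq (n) at the field's docCount (N).
     * A docCount of 0 means the field is absent, so no cap is applied.
     */
    static long clampDocFreq(long blendedDocFreq, long docCount) {
        return docCount > 0 ? Math.min(blendedDocFreq, docCount) : blendedDocFreq;
    }

    public static void main(String[] args) {
        String[] fields = {"title", "tags"}; // hypothetical field names
        long[] docCounts = {1000, 50};       // N: documents containing each field
        long blendedDocFreq = 200;           // n: one blended value reused for every field

        for (int i = 0; i < fields.length; i++) {
            long docCount = docCounts[i];
            long clamped = clampDocFreq(blendedDocFreq, docCount);
            System.out.printf("%s: raw idf = %.4f, clamped idf = %.4f%n",
                    fields[i], idf(blendedDocFreq, docCount), idf(clamped, docCount));
        }
    }
}
```

With these numbers the `tags` field has n = 200 > N = 50, so its raw IDF is negative; after clamping, n = N and the numerator N - n + 0.5 = 0.5 keeps the IDF just above zero. The `title` field is unaffected because 200 ≤ 1000.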
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
- New functionality includes testing.
- All tests pass
- New functionality has been documented.
- New functionality has javadoc added
- API changes companion pull request created.
- Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
- Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.