-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure cross_fields always uses valid term statistics #90278
Conversation
Pinging @elastic/es-search (Team:Search) |
Hi @jtibshirani, I've created a changelog YAML for you. |
I marked this as "non-issue" since it is unreleased on 8.5 and 7.17. However it's a regression in 8.4.2. When I backport I'll make sure to mark it "regression" and add a changes entry. |
@elasticmachine run elasticsearch-ci/part-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in #41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes #90275
In elastic#89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in elastic#41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes elastic#90275
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative scores. This fix accidentally dropped another important fix that was added in #41938. Specifically, we need to make sure to take the minimum between the document frequency (`actualDf`) and the minimum total term frequency (`minTTF`). Otherwise, we can produce invalid term statistics where the total term frequency is less than the document frequency. Fixes #90275
In #89016 we adjusted the
cross_fields
scoring formula to prevent negativescores. This fix accidentally dropped another important fix that was added in
#41938. Specifically, we need to make sure to take the minimum between the
document frequency (
actualDf
) and the minimum total term frequency(
minTTF
). Otherwise, we can produce invalid term statistics where the totalterm frequency is less than the document frequency.
This PR adds two safeguards to prevent this sort of regression in the future:
BlendedTermQueryTests#testRandomFields
that adds a fairlylarge number of documents with random fields and values. I tried running this
test 100 times, and checked it would have caught this issue.
Fixes #90275