Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure cross_fields always uses valid term statistics #90278

Merged
merged 7 commits into from
Sep 23, 2022

Conversation

jtibshirani
Copy link
Contributor

In #89016 we adjusted the cross_fields scoring formula to prevent negative
scores. This fix accidentally dropped another important fix that was added in
#41938. Specifically, we need to make sure to take the minimum between the
document frequency (actualDf) and the minimum total term frequency
(minTTF). Otherwise, we can produce invalid term statistics where the total
term frequency is less than the document frequency.

This PR adds two safeguards to prevent this sort of regression in the future:

  • Adds a new comment explaining the adjustment, and marks it "IMPORTANT".
  • Adds a new test BlendedTermQueryTests#testRandomFields that adds a fairly
    large number of documents with random fields and values. I tried running this
    test 100 times, and checked it would have caught this issue.

Fixes #90275

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added v8.6.0 Team:Search Meta label for search team labels Sep 22, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @jtibshirani, I've created a changelog YAML for you.

@jtibshirani
Copy link
Contributor Author

I marked this as "non-issue" since it is unreleased on 8.5 and 7.17. However it's a regression in 8.4.2. When I backport I'll make sure to mark it "regression" and add a changes entry.

@jtibshirani
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jtibshirani jtibshirani merged commit 1acc278 into elastic:main Sep 23, 2022
@jtibshirani jtibshirani deleted the cross-fields branch September 23, 2022 17:35
jtibshirani added a commit that referenced this pull request Sep 23, 2022
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative
scores. This fix accidentally dropped another important fix that was added in
#41938. Specifically, we need to make sure to take the minimum between the
document frequency (`actualDf`) and the minimum total term frequency
(`minTTF`). Otherwise, we can produce invalid term statistics where the total
term frequency is less than the document frequency.

Fixes #90275
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Sep 23, 2022
In elastic#89016 we adjusted the `cross_fields` scoring formula to prevent negative
scores. This fix accidentally dropped another important fix that was added in
elastic#41938. Specifically, we need to make sure to take the minimum between the
document frequency (`actualDf`) and the minimum total term frequency
(`minTTF`). Otherwise, we can produce invalid term statistics where the total
term frequency is less than the document frequency.

Fixes elastic#90275
jtibshirani added a commit that referenced this pull request Sep 23, 2022
In #89016 we adjusted the `cross_fields` scoring formula to prevent negative
scores. This fix accidentally dropped another important fix that was added in
#41938. Specifically, we need to make sure to take the minimum between the
document frequency (`actualDf`) and the minimum total term frequency
(`minTTF`). Otherwise, we can produce invalid term statistics where the total
term frequency is less than the document frequency.

Fixes #90275
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Meta label for search team v7.17.7 v8.4.3 v8.5.0 v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

“totalTermFreq must be at least docFreq” error after upgrading to 8.4.2
3 participants