Avoid negative scores returned from multi_match query with cross_fields #13829

Merged: 4 commits, May 31, 2024

Commits on May 31, 2024

  1. Avoid negative scores returned from multi_match query with cross_fields

    Under specific circumstances, when using `cross_fields` scoring on a
    `multi_match` query, we can end up with negative scores from the inverse
    document frequency calculation in the BM25 formula.
    
    Specifically, the IDF is calculated as:
    
    ```
    log(1 + (N - n + 0.5) / (n + 0.5))
    ```
    
    where `N` is the number of documents containing the field and `n` is the
    number of documents containing the given term in the field. Obviously,
    `n` should always be less than or equal to `N`.
    
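    Unfortunately, `cross_fields` makes up a new value for `n` and tries to
    use it across all fields. If that value exceeds `N` for some field, the
    argument of the logarithm, which simplifies to `(N + 1) / (n + 0.5)`,
    drops below 1 and the IDF for that field becomes negative.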
    
    This change finds the minimum (nonzero) value of `N` and uses that as an
    upper bound for the new value of `n`.
    
    Signed-off-by: Michael Froh <froh@amazon.com>
    msfroh committed May 31, 2024
    Commit 4156b65
  2. Move df safeguard to affected term(s) only

    Signed-off-by: Michael Froh <froh@amazon.com>
    msfroh committed May 31, 2024
    Commit ce3227c
  3. Add basic type conversion for YAML tests

    Our YAML REST tests parse expected values based on basic YAML parsing
    rules, which don't draw a distinction between float and double values.
    
    If there's a mismatch between the types of the expected and actual
    values in an assertion, we should try parsing the actual value as the
    expected type. For now, I just added support for the 32- and 64-bit
    integer and floating-point types.
    
    Signed-off-by: Michael Froh <froh@amazon.com>
    msfroh committed May 31, 2024
    Commit fee9fab
  4. Skip integ test on pre-3.0 versions.

    Until we backport the fix.
    
    Signed-off-by: Michael Froh <froh@amazon.com>
    msfroh committed May 31, 2024
    Commit 6388d59
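
The negative IDF described in the first commit message can be checked numerically. The following is a minimal Python sketch, not the code from this PR: the function names `bm25_idf` and `clamp_doc_freq` and the example counts are hypothetical, chosen only to illustrate the formula and the "cap `n` at the minimum nonzero `N`" idea from that message.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF as quoted in the commit message: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def clamp_doc_freq(doc_freq, field_doc_counts):
    """Hypothetical helper: cap n at the smallest nonzero N across the fields."""
    nonzero = [count for count in field_doc_counts if count > 0]
    if not nonzero:
        return doc_freq  # no field has any documents, so there is nothing to cap against
    return min(doc_freq, min(nonzero))

# Assumed example: one field appears in 10 documents, another in 1000,
# and a shared document frequency of 50 is applied to both fields.
field_doc_counts = [10, 1000]
shared_df = 50

# For the small field, n (50) exceeds N (10), so the IDF is negative.
unclamped_idf = bm25_idf(10, shared_df)

# After capping n at the minimum nonzero N, the IDF is positive again.
clamped_df = clamp_doc_freq(shared_df, field_doc_counts)
clamped_idf = bm25_idf(10, clamped_df)

print(f"unclamped: {unclamped_idf:.4f}, clamped: {clamped_idf:.4f}")
```

With these assumed counts, the unclamped value is negative because `n > N` for the smaller field, while the clamped value stays positive. Per the second commit, the merged change applies the safeguard only to the affected term(s), which this sketch does not model.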