Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent field_value_factor from calculating negative scores #41509

Closed
dschneiter opened this issue Apr 24, 2019 · 5 comments
Closed

Prevent field_value_factor from calculating negative scores #41509

dschneiter opened this issue Apr 24, 2019 · 5 comments
Labels
>docs General docs changes :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@dschneiter
Copy link
Contributor

dschneiter commented Apr 24, 2019

Not 100% sure this is a bug report (or whether it should be a change request).
This issue is related to #33309 and #35709

Elasticsearch version (bin/elasticsearch --version):
Version: 7.0.0, Build: default/tar/b7e28a7/2019-04-05T22:55:32.697037Z, JVM: 1.8.0_152

Plugins installed:

  • analysis-icu 7.0.0
  • analysis-kuromoji 7.0.0
  • analysis-phonetic 7.0.0

JVM version (java -version):
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)

OS version (uname -a if on a Unix-like system):
Darwin Daniels-MacBook-Pro-Elastic.local 17.7.0 Darwin Kernel Version 17.7.0: Wed Feb 27 00:43:23 PST 2019; root:xnu-4570.71.35~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:
Using the "wrong" modifier in a field_value_factor currently can result in a negative factor which would result in a negative score which again gets prevented by raising exception field value function must not produce negative scores. As a consequence, such a query won't return any results.
Wouldn't it be more reasonable to prevent field_value_factor from returning negative values by returning 0 (=max(0, modified score)) in cases like the modifier function computing a negative value (example ln(0.8))?
Or as an alternative, no longer support now potentially "dangerous" modifier functions such as ln?

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

Step 1
POST test/_doc/1
{
"title": "Test document",
"popularity": 0.8
}

Step 2
GET test/_search
{
"query": {
"function_score": {
"query": {
"match": {
"title": "test"
}
},
"functions": [
{
"field_value_factor": {
"field": "popularity",
"missing": 1,
"modifier": "ln"
}
}
],
"boost_mode": "sum"
}
}

Provide logs (if relevant):

@polyfractal polyfractal added the :Search Relevance/Ranking Scoring, rescoring, rank evaluation. label Apr 24, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@jasontedor
Copy link
Member

Wouldn't it be more reasonable to prevent field_value_factor from returning negative values by returning 0 (=max(0, modified score)) in cases like the modifier function computing a negative value (example ln(0.8))?

Yes it would be possible, but it's not consistent with our engineering philosophies. We prefer to be upfront with user when an expectation is violated, rather than silently swallowing these situations behind the user. For example, silently truncating to zero could actually be hiding a bug in the user's intended scoring.

For example, rather than use x -> ln x a common approach would be to use x -> ln(1 + x).

@gwbrown
Copy link
Contributor

gwbrown commented Apr 25, 2019

We discussed this in FixItThursday and realized that we already have alternative functions to add to the field value (ln1p and ln2p to add 1 and 2 respectively) before taking the logarithm, and similar alternatives for the base-10 logarithm. Using these instead of just ln or log should avoid the issue of returning negative scores.

Because disallowing ln/log would be a breaking change (and thus not help in the 7.x series), and we don't want to start silently clamping negative values to 0 (as this would cause behavior to change in potentially unexpected ways), we've decided to look into two options:

  1. Add a note to the docs for the ln and log functions that the may return negative values and that ln1p, etc. should generally be preferred. (@gwbrown)
  2. Improve the error message in this case to recommend the use of those functions. (@jdconrad)

@gwbrown gwbrown removed the discuss label Apr 25, 2019
jdconrad added a commit that referenced this issue May 2, 2019
This changes the error message for a negative result in a function score when 
using the ln modifier to suggest using ln1p or ln2p when a negative result 
occurs in a function score and for the log modifier to suggest using log1p or 
log2p.

This relates to #41509
jdconrad added a commit that referenced this issue May 2, 2019
This changes the error message for a negative result in a function score when 
using the ln modifier to suggest using ln1p or ln2p when a negative result 
occurs in a function score and for the log modifier to suggest using log1p or 
log2p.

This relates to #41509
@jtibshirani jtibshirani added the >docs General docs changes label May 6, 2019
@jtibshirani
Copy link
Contributor

Closing this out now that the two points in @gwbrown's comment have been addressed (#41609 and #41610).

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
This changes the error message for a negative result in a function score when 
using the ln modifier to suggest using ln1p or ln2p when a negative result 
occurs in a function score and for the log modifier to suggest using log1p or 
log2p.

This relates to elastic#41509
@eshaan7
Copy link

eshaan7 commented Dec 9, 2020

For anyone stumbling on this issue now; it might be helpful to know that:

In my case, log2p didn't work either because there were many docs in which some value < -2 got indexed. This caused the search of our app to fail in production so as a quick fix we changed all modifiers to use square (i.e. even if there are negative values, they get squared into positive)

I'd really prefer to have this pointed out in the documentation as a hint or tip or something similar.

Using these instead of just ln or log should avoid the issue of returning negative scores.

@gwbrown a new modifier like abslog might be helpful which first takes the absolute value of the field and then applies logarithm, mathematically: log(abs(x))

@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>docs General docs changes :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

8 participants