Add functions to the Script context for a new Script Score Query #30303

Closed
14 tasks
mayya-sharipova opened this issue May 1, 2018 · 8 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >feature

Comments

@mayya-sharipova
Contributor

mayya-sharipova commented May 1, 2018

We are designing a new Script Score Query (SSQ) to replace Function Score Query (FSQ). The goal of SSQ is to offer the same functionality as FSQ (and possibly more), exposed entirely through Painless scripts. To that end, we would like to add the functions below to Painless. They could be made available either in SearchScript or in a new ScoringScript context designed specifically for scoring.

Random score

Similar to random_score in FSQ:

  • generates scores in the range [0, 1)
  • by default, uses Lucene doc ids as the source of randomness (efficient, but not reproducible). To make scores reproducible, we need: a seed, a field (using the minimum value for this doc), and a salt (a function of the index name and shard).
"script" : {
  "source" : "random_score(params.seed, doc['field'])",
  "params": {"seed": 10}
}

Currently, Painless can generate random values as shown below, but this is verbose and does not exactly reproduce random_score in FSQ:

"script" : {
  "source" : "Random rnd = new Random(); rnd.setSeed(doc['field'].value); rnd.nextFloat()"
}
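A minimal sketch, written in Java rather than Painless, of how a reproducible score could be derived from the three inputs listed above (seed, field value, salt). The class name, the `mix` helper, and the hashing scheme are assumptions made for illustration; this is not the FSQ implementation and not a proposed Painless API.

```java
final class RandomScoreSketch {
    private RandomScoreSketch() {}

    // Hypothetical bit mixer (the 64-bit finalizer used by MurmurHash3).
    // Any well-distributed hash would serve the same purpose here.
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Deterministic score in [0, 1) computed from a seed, a per-document
    // field value, and a per-shard salt (assumed to be derived elsewhere
    // from the index name and shard).
    static float randomScore(long seed, long fieldValue, long salt) {
        long h = mix(seed ^ mix(fieldValue ^ mix(salt)));
        // Keep only the low 24 bits: every such integer is exactly
        // representable as a float, so the result stays strictly below 1.
        return (h & 0xFFFFFF) / (float) (1 << 24);
    }

    public static void main(String[] args) {
        // The same inputs always produce the same score.
        System.out.println(randomScore(10L, 42L, 7L));
        System.out.println(randomScore(10L, 42L, 7L));
    }
}
```

Because the score depends only on its three arguments, repeating a query with the same seed returns documents in the same order, which the doc-id-based default cannot guarantee.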

Math functions

We would like to introduce shorter versions of the following functions, which are useful for score calculations:

  • log: Math.log10(doc['f'].value) -> log(doc['f'].value)
  • log1p: Math.log10(doc['f'].value + 1) -> log1p(doc['f'].value)
  • log2p: Math.log10(doc['f'].value + 2) -> log2p(doc['f'].value)
  • ln: Math.log(doc['f'].value) -> ln(doc['f'].value)
  • ln1p: Math.log1p(doc['f'].value) -> ln1p(doc['f'].value)
  • ln2p: Math.log(doc['f'].value + 2) -> ln2p(doc['f'].value)
  • square: Math.pow(doc['f'].value, 2) -> square(doc['f'].value)
  • sqrt: Math.sqrt(doc['f'].value) -> sqrt(doc['f'].value)
  • reciprocal, 1 / value: 1.0 / doc['f'].value -> reciprocal(doc['f'].value)
  • rational, value / (k + value): doc['f'].value / (k + doc['f'].value) -> rational(doc['f'].value, k)
  • sigmoid, value^a / (k^a + value^a): Math.pow(doc['f'].value, a) / (Math.pow(k, a) + Math.pow(doc['f'].value, a)) -> sigmoid(doc['f'].value, k, a)
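The shortcuts in the list above could be sketched as plain static Java methods, which is roughly the shape a whitelisted helper class would take. The class name `ScoreMathSketch` is hypothetical, and the methods take a `double` rather than a doc-values object to keep the example self-contained.

```java
final class ScoreMathSketch {
    private ScoreMathSketch() {}

    // Base-10 logarithms of value, value + 1, and value + 2.
    static double log(double v)   { return Math.log10(v); }
    static double log1p(double v) { return Math.log10(v + 1); }
    static double log2p(double v) { return Math.log10(v + 2); }

    // Natural logarithms. Math.log1p(v) already computes ln(1 + v),
    // so no extra "+ 1" is applied to its argument.
    static double ln(double v)   { return Math.log(v); }
    static double ln1p(double v) { return Math.log1p(v); }
    static double ln2p(double v) { return Math.log(v + 2); }

    static double square(double v)     { return v * v; }
    static double sqrt(double v)       { return Math.sqrt(v); }
    static double reciprocal(double v) { return 1.0 / v; }

    // value / (k + value)
    static double rational(double v, double k) { return v / (k + v); }

    // value^a / (k^a + value^a)
    static double sigmoid(double v, double k, double a) {
        double va = Math.pow(v, a);
        return va / (Math.pow(k, a) + va);
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(3.0, 3.0, 2.0));
        System.out.println(rational(2.0, 2.0));
    }
}
```

For both `rational` and `sigmoid`, `k` is the value at which the score reaches one half, which makes it a convenient tuning knob.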

Decay functions

Similar to decay functions in FSQ:

  • decay_gauss
  • decay_exp
  • decay_linear

Proposed API:

"script" : {
  "source" : "decay_gauss(doc['date'], params.origin, params.scale, params.offset, params.decay)",
  "params": {
    "origin": "2013-09-17", 
    "scale": "10d", 
    "offset": "5d",
    "decay" : 0.5
  }
}
"script" : {
  "source" : "decay_linear(doc['geo'], params.origin, params.scale, params.offset, params.decay)",
  "params": {
    "origin": "11, 12", 
    "scale": "2km", 
    "offset": "0km",
    "decay" : 0.33
  }
}
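To make the role of the four parameters concrete, here is a minimal Java sketch of the three decay shapes on plain numeric values (for example, days or kilometres already converted to numbers). It assumes the formulas documented for the FSQ decay functions; the class and method names are hypothetical, and date and geo parsing are left out entirely.

```java
final class DecaySketch {
    private DecaySketch() {}

    // Distance from the origin, with everything inside the offset
    // treated as zero distance.
    static double distance(double value, double origin, double offset) {
        return Math.max(0.0, Math.abs(value - origin) - offset);
    }

    // Gaussian decay: the score equals `decay` at distance `scale`.
    static double decayGauss(double value, double origin, double scale,
                             double offset, double decay) {
        double sigmaSq = -(scale * scale) / (2.0 * Math.log(decay));
        double d = distance(value, origin, offset);
        return Math.exp(-(d * d) / (2.0 * sigmaSq));
    }

    // Exponential decay: the score equals `decay` at distance `scale`.
    static double decayExp(double value, double origin, double scale,
                           double offset, double decay) {
        double lambda = Math.log(decay) / scale;
        return Math.exp(lambda * distance(value, origin, offset));
    }

    // Linear decay: the score equals `decay` at distance `scale`
    // and drops to zero beyond scale / (1 - decay).
    static double decayLinear(double value, double origin, double scale,
                              double offset, double decay) {
        double s = scale / (1.0 - decay);
        return Math.max(0.0, (s - distance(value, origin, offset)) / s);
    }

    public static void main(String[] args) {
        // 15 units from the origin = offset (5) + scale (10).
        System.out.println(decayGauss(15.0, 0.0, 10.0, 5.0, 0.5));
        System.out.println(decayExp(15.0, 0.0, 10.0, 5.0, 0.5));
        System.out.println(decayLinear(15.0, 0.0, 10.0, 5.0, 0.5));
    }
}
```

All three functions return 1.0 for values within `offset` of the origin and differ only in how quickly the score falls off beyond it.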

Investigate how to parse date and geo parameters only once per query, rather than for every document (store them in the context?).
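One way to get parse-once behaviour is to do all string parsing in a constructor that runs once per query and keep only primitive values for the per-document call. The Java sketch below illustrates that idea for a Gaussian date decay. The class name, the `parseDays` helper (which understands only values such as "10d"), and the use of epoch milliseconds are assumptions for illustration, not how Elasticsearch stores script state.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class DateDecayGaussSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Parsed once, in the constructor, and reused for every document.
    private final long originMillis;
    private final long offsetMillis;
    private final double twoSigmaSq;

    DateDecayGaussSketch(String origin, String scale, String offset, double decay) {
        this.originMillis = LocalDate.parse(origin)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        this.offsetMillis = parseDays(offset);
        double scaleMillis = parseDays(scale);
        // 2 * sigma^2, where sigma^2 = -scale^2 / (2 * ln(decay)).
        this.twoSigmaSq = -(scaleMillis * scaleMillis) / Math.log(decay);
    }

    // Hypothetical parser: accepts only whole-day values such as "10d".
    static long parseDays(String s) {
        if (!s.endsWith("d")) {
            throw new IllegalArgumentException("expected a value like \"10d\": " + s);
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * MILLIS_PER_DAY;
    }

    // Per-document work: arithmetic only, no parsing.
    double score(long docMillis) {
        double d = Math.max(0L, Math.abs(docMillis - originMillis) - offsetMillis);
        return Math.exp(-(d * d) / twoSigmaSq);
    }

    public static void main(String[] args) {
        DateDecayGaussSketch decay =
                new DateDecayGaussSketch("2013-09-17", "10d", "5d", 0.5);
        long doc = LocalDate.parse("2013-10-02")
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        System.out.println(decay.score(doc));
    }
}
```

The same split would apply to geo parameters: parse the origin point and the distance units once, then compute only the distance per document.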

Normalization functions ???

  • _max_score in the rescore context?
  • Similar to ScaleFloatFunction in Lucene and Solr's scale function: scale(x, minTarget, maxTarget) scales the values of x so that they all fall between minTarget and maxTarget?
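For reference, the scaling idea in the last bullet is ordinary min-max rescaling. The Java sketch below assumes the observed minimum and maximum of x are already known and passes them in explicitly; where those two values would come from at query time is exactly the open question, so the five-argument signature and the class name are hypothetical.

```java
final class ScaleSketch {
    private ScaleSketch() {}

    // Linearly maps x from [min, max] onto [minTarget, maxTarget].
    static double scale(double x, double min, double max,
                        double minTarget, double maxTarget) {
        if (max == min) {
            // Degenerate range: every value is the same, so avoid dividing by zero.
            return minTarget;
        }
        return minTarget + (x - min) * (maxTarget - minTarget) / (max - min);
    }

    public static void main(String[] args) {
        System.out.println(scale(5.0, 0.0, 10.0, 1.0, 2.0));
    }
}
```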

Other functions ???

  • _index Lucene term stats (doc count, doc frequency, tf, total term frequency), e.g. _index['text']['word'].tf()
  • Index wide statistics similar to DFS_QUERY_THEN_FETCH
  • payloads
  • matches (see IntervalQuery)
@mayya-sharipova
Contributor Author

mayya-sharipova commented May 1, 2018

Would like to get feedback from @rjernst @jdconrad
cc @polyfractal

@mayya-sharipova mayya-sharipova added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label May 1, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@nik9000
Member

nik9000 commented May 1, 2018

Random score

Painless is one of the few places where we can't use Randomness right now. It might be worth looking at when we do this.

@jdconrad
Contributor

jdconrad commented May 1, 2018

I'm for adding a new context for this -- ScoringScript is good. For now, the best way to add these methods is to add them as static methods to the ScoringScript class and whitelist them. I will work towards adding a way to call methods without the static type qualifier, but that will take me a bit of time.

@rjernst
Member

rjernst commented May 1, 2018

_index Lucene term stats (doc count, doc frequency, tf, total term frequency), e.g. _index['text']['word'].tf()

We removed these from scripting in 6.0, as they are for advanced users. I don't think we should add them back. Anyone wanting to do this should write a custom script engine (and thus have access to all of the Lucene API).

@mayya-sharipova
Contributor Author

Painless is one of the few places where we can't use Randomness right now. It might be worth looking at when we do this.

@nik9000 Can you please elaborate more on why we can't use Randomness in painless?

@rjernst
Member

rjernst commented May 17, 2018

@mayya-sharipova Randomness would be the way to implement the random_score method which does not take a seed, but for the ones taking a seed, Random should still be used directly. But I don't think we should expose Randomness in painless directly.

@mayya-sharipova
Contributor Author

Closed with #34533

5 participants