Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add randomScore function in script_score query #40186

Merged

Conversation

mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented Mar 19, 2019

We would like to deprecate function_score query for script_score.
To make script_score query to have the same features
as function_score query, we need to add randomScore
function.

This function should be able to produce different
random scores on different index shards.
It also needs to be able to produce random scores
based on the internal Lucene Document Ids.
To achieve this, three variables have been added to
the score script context:

  • _doc for the internal Lucene doc id
  • _shard for the shard id
  • _index for the index name

Closes #31461

To make script_score query to have the same features
as function_score query, we need to add randomScore
function.

This function should be able to produce different
random scores on different index shards.
It also needs to be able to produce random scores
based on the internal Lucene Document Ids.
To achieve this three variables have been added to
the score script context:
- _doc for the internal Lucene doc id
- _shard for the shard id
- _indexName for the index name

Closes elastic#31461
@mayya-sharipova mayya-sharipova added the :Search Relevance/Ranking Scoring, rescoring, rank evaluation. label Mar 19, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@mayya-sharipova
Copy link
Contributor Author

I would like an opinion from @rjernst or @jdconrad if you are ok with adding _doc, _shard and _index variables to the ScoreScript?

@jdconrad
Copy link
Contributor

Hey @mayya-sharipova! @rjernst and I were just discussing how it's not great for the user to have to enter something like shard_id and doc_id into this method. I believe we can make this easier for you and our users with a modified version of ClassBinding, but need a bit more time to think about it. I realize this blocking some other things, so I will focus on an answer for you by tomorrow.

@mayya-sharipova
Copy link
Contributor Author

@jdconrad Thanks Jack, any other way to access share_id or doc_id through ClassBinding or any other way will also work for us. Looking forward to learning more from you.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova Thanks for using class bindings here! I'm hopeful this will be easier on the user for random scores. I'm happy to have things (doc id, shard id, etc.) be added to the base script in this case since the information is readily available as part of the script already -- it definitely makes sense to me to do this. I left a few comments and questions.

@rjernst Would you please also take a look at this PR?

}

public double randomNotReproducible() {
return rnd.nextDouble();
public double randomScore(String seedValue, int seed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my clarification the seedValue here can change between documents? Or rather is even expected to? Is the seed also expected to change between documents or would it make more sense to consider the seed as read only for a script and have it be part of the constructor?

Also, now that this class binding has direct access to the ScoreScript reference, would it make more sense to have the user just pass in the field name for the seedValue and resolve it from the doc within this class to prevent the user from passing in an arbitrary String?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdconrad Right, seedValue is expected to be changed between documents, so different scores are produced for different documents. And you are right about seed value, it would make more sense to have it as a part of the constructor. I will modify the code accordingly.

would . it make more sense to have the user just pass in the field name for the seedValue and resolve it from the doc within this class to prevent the user from passing in an arbitrary String?

Good point! I will experiment with this.

public RandomScoreDoc(ScoreScript scoreScript) {
this.scoreScript = scoreScript;
}
public double randomScore(int seed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as the seed above? Should this be considered read-only?

/**
* @return the internal document ID
*/
public int getDocId() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should rename these methods to something like _getDocId(). This way the user cannot access them directly through a script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdconrad Thanks Jack, I will rename these method

}
}

public void setShard(int shardId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we probably don't want the user setting the shard id in a script so maybe we can rename this to _setShard? @rjernst What do you think here?

/**
* @return shard id or throws an exception if shard is not set up for this script instance
*/
public int getShardId() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

/**
* @return index name or throws an exception if the index name is not set up for this script instance
*/
public String getIndex() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

this.shardId = shardId;
}

public void setIndexName(String indexName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run CLA

@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Mar 25, 2019

@jdconrad Thanks for the review, Jack. I have addressed you feedback in the subsequent commit. This is ready for another round of review.

One thing I have noticed that ScriptDocValues::get(index) produces wrong results in a document is missing field. We need to make ScriptDocValues::get(index) the same as ScriptDocValues::getValue(). I can make this change in another PR.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding this!

@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

@mayya-sharipova
Copy link
Contributor Author

@rjernst Ryan, would like to review this PR as well? Or I can merge it?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit. I actually not like having the underscore methods, but they are necessary for now until some refactoring can be done in painless to not automatically expose all getters on the script classes.

@@ -53,26 +51,49 @@ public static double sigmoid(double value, double k, double a){
return Math.pow(value,a) / (Math.pow(k,a) + Math.pow(value,a));
}

public static final class RandomScore {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment here like the one on RandomScoreDoc briefly describing how this differs? Maybe call it RandomScoreField?

@mayya-sharipova mayya-sharipova merged commit 459715d into elastic:master Mar 28, 2019
@mayya-sharipova mayya-sharipova deleted the script_score_random_score branch March 28, 2019 15:29
mayya-sharipova added a commit that referenced this pull request Mar 28, 2019
To make script_score query to have the same features
as function_score query, we need to add randomScore
function.

This function produces different
random scores on different index shards.
It is also able to produce random scores
based on the internal Lucene Document Ids.
@mayya-sharipova mayya-sharipova mentioned this pull request Mar 28, 2019
9 tasks
jasontedor added a commit to rjernst/elasticsearch that referenced this pull request Mar 28, 2019
* master: (25 commits)
  [DOCS] Correct keystore commands for Email and Jira actions in Watcher (elastic#40417)
  [DOCS] Document common settings for snapshot repository plugins (elastic#40475)
  Remove with(out)-system-key tests (elastic#40547)
  Geo Point parse error fix (elastic#40447)
  Handle null retention leases in WaitForNoFollowersStep (elastic#40477)
  [DOCS] Adds anchors for ruby client (elastic#39867)
  Mute DataFrameAuditorIT#testAuditorWritesAudits
  Disable integTest when Docker is not available (elastic#40585)
  Add randomScore function in script_score query (elastic#40186)
  Test fixtures krb5 (elastic#40297)
  Correct ILM metadata minimum compatibility version (elastic#40569)
  SQL: Centralize SQL test dependencies version handling (elastic#40551)
  Mute testTracerLog
  Mute testHttpInput
  Include functions' aliases in the list of functions (elastic#40584)
  Optimise rejection of out-of-range `long` values (elastic#40325)
  Add docs for cluster.remote.*.proxy setting (elastic#40281)
  Migrate systemd packaging tests from bats to java (elastic#39954)
  Move top-level pipeline aggs out of QuerySearchResult (elastic#40319)
  Use openjdk 12 in packer cache script (elastic#40498)
  ...
danielmitterdorfer added a commit to elastic/rally-tracks that referenced this pull request Apr 1, 2019
With this commit we align our benchmarks with
elastic/elasticsearch#40186 which has removed the function
`randomReproducible` and added a new function `randomScore`.

Relates elastic/elasticsearch#40186
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new Script Score Query
5 participants