Add randomScore function in script_score query
To give the script_score query the same features
as the function_score query, we need to add a randomScore
function.

This function should be able to produce different
random scores on different index shards.
It also needs to be able to produce random scores
based on the internal Lucene Document Ids.
To achieve this, three variables have been added to
the score script context:
- _doc for the internal Lucene doc id
- _shard for the shard id
- _index for the index name

Closes elastic#31461
mayya-sharipova committed Mar 19, 2019
1 parent 31dd9c3 commit 4c7b596
Showing 8 changed files with 177 additions and 123 deletions.
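
The commit message asks for a score that is reproducible for a given seed yet differs across shards and indices. The following is a minimal sketch of that idea, not the Elasticsearch implementation: `ScoreScriptUtils` itself is not shown on this page, and the class name `RandomScoreSketch` and the bit-mixing helper `mix` are hypothetical. It hashes the four inputs together and maps the result to a value from 0 up to but not including 1.

```java
// Illustrative sketch only; this is NOT the code in ScoreScriptUtils.
final class RandomScoreSketch {

    // Hypothetical helper (an assumption of this sketch): a 64-bit
    // bit-mixing step, so that small input changes spread over all bits.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Same parameter order as the randomScore signature in the docs:
    // document value, index name, shard id, seed.
    static double randomScore(String docValue, String indexName, int shardId, int seed) {
        long h = mix(seed);
        h = mix(h ^ indexName.hashCode()); // salt with the index name
        h = mix(h ^ shardId);              // salt with the shard id
        h = mix(h ^ docValue.hashCode());  // per-document source of randomness
        // Keep the top 53 bits and scale by 2^-53: result is in [0, 1).
        return (h >>> 11) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        // Same inputs give the same score; a different shard id changes it.
        System.out.println(randomScore("42", "test", 0, 100));
        System.out.println(randomScore("42", "test", 0, 100));
        System.out.println(randomScore("42", "test", 1, 100));
    }
}
```

Because every input feeds the hash, two documents with the same document value on different shards receive unrelated scores, which is the behaviour the commit message describes.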
67 changes: 25 additions & 42 deletions docs/reference/query-dsl/script-score-query.asciidoc
@@ -182,62 +182,45 @@ different from the query's vector, 0 is used for missing dimensions
in the calculations of vector functions.


-[[random-functions]]
-===== Random functions
-There are two predefined ways to produce random values:
-`randomNotReproducible` and `randomReproducible`.

-`randomNotReproducible()` uses `java.util.Random` class
-to generate a random value of the type `long`.
-The generated values are not reproducible between requests' invocations.

-[source,js]
---------------------------------------------------
-"script" : {
-    "source" : "randomNotReproducible()"
-}
---------------------------------------------------
-// NOTCONSOLE


-`randomReproducible(String seedValue, int seed)` produces
-reproducible random values of type `long`. This function requires
-more computational time and memory than the non-reproducible version.

-A good candidate for the `seedValue` is document field values that
-are unique across documents and already pre-calculated and preloaded
-in the memory. For example, values of the document's `_seq_no` field
-is a good candidate, as documents on the same shard have unique values
-for the `_seq_no` field.
+[[random-score-function]]
+===== Random score function
+`random_score` function generates scores that are uniformly distributed
+from 0 up to but not including 1.

+`randomScore` function has the following syntax:
+`randomScore(String docValue, String indexName, int shardId, int seed)`.
+It requires a document value, an index name, shard id, and a seed.
+For the document value, you can use `_doc` which represents
+the internal Lucene doc ids; for the index name, you can use `_index`
+which represents the index name of a corresponding document;
+and for the shard id, you can use `_shard` which represents
+the shard id of a corresponding document.
[source,js]
--------------------------------------------------
"script" : {
-    "source" : "randomReproducible(Long.toString(doc['_seq_no'].value), 100)"
+    "source" : "randomScore(String.valueOf(_doc), _index, _shard, 100)"
}
--------------------------------------------------
// NOTCONSOLE


-A drawback of using `_seq_no` is that generated values change if
-documents are updated. Another drawback is not absolute uniqueness, as
-documents from different shards with the same sequence numbers
-generate the same random values.

-If you need random values to be distinct across different shards,
-you can use a field with unique values across shards,
-such as `_id`, but watch out for the memory usage as all
-these unique values need to be loaded into memory.
+Using the internal Lucene doc ids as a source of randomness is very efficient,
+but unfortunately not reproducible since documents might be renumbered
+by merges. Note that documents that are within the same shard and have the
+same value for field will get the same score, so it is usually desirable
+to use a field that has unique values for all documents across a shard.
+A good default choice might be to use the `_seq_no`
+field, whose only drawback is that scores will change if the document is
+updated since update operations also update the value of the _seq_no field.

[source,js]
--------------------------------------------------
"script" : {
-    "source" : "randomReproducible(doc['_id'].value, 100)"
+    "source" : "randomScore(String.valueOf(doc['_seq_no'].value), _index, _shard, 100)"
}
--------------------------------------------------
// NOTCONSOLE


[[decay-functions]]
===== Decay functions for numeric fields
You can read more about decay functions
@@ -349,8 +332,8 @@ the following script:

===== `random_score`

-Use `randomReproducible` and `randomNotReproducible` functions
-as described in <<random-functions, random functions>>.
+Use `randomScore` function
+as described in <<random-score-function, random score function>>.


===== `field_value_factor`
@@ -22,8 +22,7 @@
 static_import {
     double saturation(double, double) from_class org.elasticsearch.script.ScoreScriptUtils
     double sigmoid(double, double, double) from_class org.elasticsearch.script.ScoreScriptUtils
-    double randomReproducible(String, int) from_class org.elasticsearch.script.ScoreScriptUtils
-    double randomNotReproducible() bound_to org.elasticsearch.script.ScoreScriptUtils$RandomNotReproducible
+    double randomScore(String, String, int, int) from_class org.elasticsearch.script.ScoreScriptUtils
     double decayGeoLinear(String, String, String, double, GeoPoint) bound_to org.elasticsearch.script.ScoreScriptUtils$DecayGeoLinear
     double decayGeoExp(String, String, String, double, GeoPoint) bound_to org.elasticsearch.script.ScoreScriptUtils$DecayGeoExp
     double decayGeoGauss(String, String, String, double, GeoPoint) bound_to org.elasticsearch.script.ScoreScriptUtils$DecayGeoGauss
@@ -72,61 +72,6 @@ setup:
     - match: { hits.hits.1._id: d2 }
     - match: { hits.hits.2._id: d1 }

----
-"Random functions":
-    - do:
-        indices.create:
-            index: test
-            body:
-                settings:
-                    number_of_shards: 2
-                mappings:
-                    properties:
-                        f1:
-                            type: keyword
-    - do:
-        index:
-            index: test
-            id: 1
-            body: {"f1": "v1"}
-    - do:
-        index:
-            index: test
-            id: 2
-            body: {"f1": "v2"}
-    - do:
-        index:
-            index: test
-            id: 3
-            body: {"f1": "v3"}
-
-    - do:
-        indices.refresh: {}
-
-    - do:
-        search:
-            rest_total_hits_as_int: true
-            index: test
-            body:
-                query:
-                    script_score:
-                        query: {match_all: {} }
-                        script:
-                            source: "randomReproducible(Long.toString(doc['_seq_no'].value), 100)"
-    - match: { hits.total: 3 }
-
-    - do:
-        search:
-            rest_total_hits_as_int: true
-            index: test
-            body:
-                query:
-                    script_score:
-                        query: {match_all: {} }
-                        script:
-                            source: "randomNotReproducible()"
-    - match: { hits.total: 3 }
-
 ---
 "Decay geo functions":
     - do:
@@ -0,0 +1,76 @@
+# Integration tests for ScriptScoreQuery using Painless
+
+setup:
+    - skip:
+        version: " - 7.99.99" # correct to 7.09.99 after backporting to 7.1
+        reason: "random score function of script score was added in 7.1"
+
+---
+"Random score function":
+    - do:
+        indices.create:
+            index: test
+            body:
+                settings:
+                    number_of_shards: 2
+                mappings:
+                    properties:
+                        f1:
+                            type: keyword
+
+    - do:
+        bulk:
+            refresh: true
+            body:
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v0"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v1"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v2"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v3"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v4"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v5"}'
+                - '{"index": {"_index": "test"}}'
+                - '{"f1": "v6"}'
+
+    - do:
+        search:
+            rest_total_hits_as_int: true
+            index: test
+            body:
+                query:
+                    script_score:
+                        query: {match_all: {} }
+                        script:
+                            source: "randomScore(String.valueOf(doc['_seq_no'].value), _index, _shard, 100)"
+    # stash ids to check for reproducibility of ranking
+    - set: { hits.hits.0._id: id0 }
+    - set: { hits.hits.1._id: id1 }
+    - set: { hits.hits.2._id: id2 }
+    - set: { hits.hits.3._id: id3 }
+    - set: { hits.hits.4._id: id4 }
+    - set: { hits.hits.5._id: id5 }
+    - set: { hits.hits.6._id: id6 }
+
+    # check that ranking is reproducible
+    - do:
+        search:
+            rest_total_hits_as_int: true
+            index: test
+            body:
+                query:
+                    script_score:
+                        query: {match_all: {} }
+                        script:
+                            source: "randomScore(String.valueOf(doc['_seq_no'].value), _index, _shard, 100)"
+    - match: { hits.hits.0._id: $id0 }
+    - match: { hits.hits.1._id: $id1 }
+    - match: { hits.hits.2._id: $id2 }
+    - match: { hits.hits.3._id: $id3 }
+    - match: { hits.hits.4._id: $id4 }
+    - match: { hits.hits.5._id: $id5 }
+    - match: { hits.hits.6._id: $id6 }
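
The test above stashes the hit order from one search and checks that a second search with the same seed returns the same order. A rough Java sketch of that stash-and-compare idea follows. It assumes a single shard and uses a hypothetical seeded `score` function as a stand-in; it does not call Elasticsearch, and `ReproducibleRankingSketch` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; mirrors the test's logic without a cluster.
final class ReproducibleRankingSketch {

    // Hypothetical stand-in for a seeded score in [0, 1); NOT the
    // Elasticsearch randomScore implementation.
    static double score(long seqNo, int seed) {
        long z = seqNo * 0x9E3779B97F4A7C15L + seed;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z ^= (z >>> 31);
        return (z >>> 11) * 0x1.0p-53;
    }

    // Ranks sequence numbers 0..docCount-1 by descending score,
    // the way search hits are ordered.
    static List<Long> rank(int docCount, int seed) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < docCount; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Long id) -> score(id, seed)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        List<Long> first = rank(7, 100);   // "stash" the first ranking
        List<Long> second = rank(7, 100);  // run the same query again
        System.out.println(first.equals(second));
    }
}
```

The comparison only holds while the per-document input stays fixed, which is why the test seeds the score with `_seq_no` rather than with `_doc`.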
@@ -50,18 +50,33 @@ public float score() {

     private final ScoreScript.LeafFactory script;

+    private final int shardId;
+    private final String indexName;


     public ScriptScoreFunction(Script sScript, ScoreScript.LeafFactory script) {
         super(CombineFunction.REPLACE);
         this.sScript = sScript;
         this.script = script;
+        this.indexName = null;
+        this.shardId = -1;
     }

+    public ScriptScoreFunction(Script sScript, ScoreScript.LeafFactory script, String indexName, int shardId) {
+        super(CombineFunction.REPLACE);
+        this.sScript = sScript;
+        this.script = script;
+        this.indexName = indexName;
+        this.shardId = shardId;
+    }

     @Override
     public LeafScoreFunction getLeafScoreFunction(LeafReaderContext ctx) throws IOException {
         final ScoreScript leafScript = script.newInstance(ctx);
         final CannedScorer scorer = new CannedScorer();
         leafScript.setScorer(scorer);
+        leafScript.setIndexName(indexName);
+        leafScript.setShard(shardId);
         return new LeafScoreFunction() {
             @Override
             public double score(int docId, float subQueryScore) throws IOException {
@@ -94,7 +94,7 @@ protected ScoreFunction doToFunction(QueryShardContext context) {
         try {
             ScoreScript.Factory factory = context.getScriptService().compile(script, ScoreScript.CONTEXT);
             ScoreScript.LeafFactory searchScript = factory.newFactory(script.getParams(), context.lookup());
-            return new ScriptScoreFunction(script, searchScript);
+            return new ScriptScoreFunction(script, searchScript, context.index().getName(), context.getShardId());
         } catch (Exception e) {
             throw new QueryShardException(context, "script_score: the script could not be loaded", e);
         }
53 changes: 53 additions & 0 deletions server/src/main/java/org/elasticsearch/script/ScoreScript.java
@@ -62,18 +62,25 @@ public abstract class ScoreScript {

     private DoubleSupplier scoreSupplier = () -> 0.0;

+    private final int docBase;
+    private int docId;
+    private int shardId = -1;
+    private String indexName = null;

     public ScoreScript(Map<String, Object> params, SearchLookup lookup, LeafReaderContext leafContext) {
         // null check needed b/c of expression engine subclass
         if (lookup == null) {
             assert params == null;
             assert leafContext == null;
             this.params = null;
             this.leafLookup = null;
+            this.docBase = 0;
         } else {
             this.leafLookup = lookup.getLeafSearchLookup(leafContext);
             params = new HashMap<>(params);
             params.putAll(leafLookup.asMap());
             this.params = new DeprecationMap(params, DEPRECATIONS, "score-script");
+            this.docBase = leafContext.docBase;
         }
     }

@@ -91,6 +98,7 @@ public final Map<String, ScriptDocValues<?>> getDoc() {

     /** Set the current document to run the script on next. */
     public void setDocument(int docid) {
+        this.docId = docid;
         leafLookup.setDocument(docid);
     }

@@ -104,10 +112,55 @@ public void setScorer(Scorable scorer) {
         };
     }

+    /**
+     * Accessed as _score in the painless script
+     * @return the score of the inner query
+     */
     public double get_score() {
         return scoreSupplier.getAsDouble();
     }

+    /**
+     * Accessed as _doc in the painless script
+     * @return the internal document ID
+     */
+    public int get_doc() {
+        return docBase + docId;
+    }

+    /**
+     * Accessed as _shard in the painless script
+     * @return shard id or throws an exception if shard is not set up for this script instance
+     */
+    public int get_shard() {
+        if (shardId > -1) {
+            return shardId;
+        } else {
+            throw new IllegalArgumentException("shard id can not be looked up!");
+        }
+    }

+    /**
+     * Accessed as _index in the painless script
+     * @return index name or throws an exception if the index name is not set up for this script instance
+     */
+    public String get_index() {
+        if (indexName != null) {
+            return indexName;
+        } else {
+            throw new IllegalArgumentException("index name can not be looked up!");
+        }
+    }

+    public void setShard(int shardId) {
+        this.shardId = shardId;
+    }

+    public void setIndexName(String indexName) {
+        this.indexName = indexName;
+    }


     /** A factory to construct {@link ScoreScript} instances. */
     public interface LeafFactory {

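The `get_doc()` getter above returns `docBase + docId`: the document id passed to `setDocument` is local to one Lucene segment (leaf), and adding the leaf's `docBase` turns it into an id that is unique within the shard. A small sketch of that arithmetic follows, assuming segments are numbered back to back. The `Leaf` class and `buildLeaves` helper are hypothetical stand-ins, not Lucene's `LeafReaderContext`.

```java
// Illustrative sketch only; Leaf is a hypothetical stand-in for a
// Lucene leaf (segment) context.
final class DocIdSketch {

    static final class Leaf {
        final int docBase; // number of documents in all preceding segments
        final int maxDoc;  // number of documents in this segment

        Leaf(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }

        // Same arithmetic as get_doc(): segment-local id plus docBase.
        int globalDocId(int localDocId) {
            if (localDocId < 0 || localDocId >= maxDoc) {
                throw new IllegalArgumentException("doc id out of range: " + localDocId);
            }
            return docBase + localDocId;
        }
    }

    // Assigns each segment a docBase equal to the total size of the
    // segments that come before it.
    static Leaf[] buildLeaves(int... segmentSizes) {
        Leaf[] leaves = new Leaf[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            leaves[i] = new Leaf(base, segmentSizes[i]);
            base += segmentSizes[i];
        }
        return leaves;
    }

    public static void main(String[] args) {
        Leaf[] leaves = buildLeaves(3, 2, 4);
        // Local doc 0 of the second segment follows the 3 docs of the first.
        System.out.println(leaves[1].globalDocId(0));
    }
}
```

Since merges rewrite segments and therefore change both the local ids and the bases, ids computed this way are not stable over time, which matches the documentation's warning about `_doc`.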