Add the ability to set the number of hits to track accurately
In Lucene 8, searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade-off that speeds up searches while still providing a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set to an integer, this option computes a lower bound of the total hits while preserving the ability to skip non-competitive hits once enough matches have been collected.
Relates elastic#33028
Showing 33 changed files with 498 additions and 117 deletions.
docs/reference/search/request/track-total-hits.asciidoc (new file: 158 additions, 0 deletions)

[[search-request-track-total-hits]]
=== Track total hits

The total hit count can't be computed accurately without visiting all matches,
which is costly for queries that match lots of documents. The `track_total_hits`
parameter allows you to control how the total number of hits should be tracked.
When set to `true`, the search response tracks the number of hits that match
the query accurately:

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits": true,
    "query" : {
        "match_all" : {}
    }
}
--------------------------------------------------
// CONSOLE

\... returns:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : {
        "total" : {
            "value": 2048, <1>
            "relation": "eq" <2>
        },
        "max_score" : 1.0,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]

<1> The total number of hits that match the query.
<2> The count is accurate (`"eq"` means equal).

If you don't need to track the total number of hits, you can improve query times
by setting this option to `false`. In that case the search can efficiently skip
non-competitive hits because it doesn't need to count all matches:

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits": false,
    "query": {
        "term": {
            "title": "fast"
        }
    }
}
--------------------------------------------------
// CONSOLE

\... returns:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : { <1>
        "max_score": 0.42,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]

<1> The total number of hits is unknown.

Given that it is often enough to have a lower bound of the number of hits,
such as "there are more than 1000 hits", it is also possible to set
`track_total_hits` to an integer that represents the number of hits to count
accurately. The search can efficiently skip non-competitive documents as soon
as it has collected at least `track_total_hits` documents. This is a good
trade-off to speed up searches if you don't need the accurate number of hits
beyond a certain threshold.
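
The relationship between the actual number of matches, the `track_total_hits` value and the reported total can be summarized in a small JavaScript sketch. It only models the outcomes described on this page and is not how Elasticsearch computes the count; the function `reportedTotal` and its `matchCount` argument (the true number of matching documents, which a search that skips hits never actually knows) are hypothetical.

```javascript
// Hypothetical model of the `hits.total` object reported in a search response,
// given the real number of matching documents and the `track_total_hits` value.
// It mirrors the documented outcomes only; it is not Elasticsearch's code.
function reportedTotal(matchCount, trackTotalHits) {
  if (trackTotalHits === false) {
    // Counting is disabled: the response carries no `total` at all.
    return undefined;
  }
  if (trackTotalHits === true) {
    // Every match is counted, so the value is exact.
    return { value: matchCount, relation: "eq" };
  }
  if (Number.isInteger(trackTotalHits)) {
    // Matches are counted exactly up to the threshold; beyond it,
    // the threshold itself is reported as a lower bound.
    return matchCount > trackTotalHits
      ? { value: trackTotalHits, relation: "gte" }
      : { value: matchCount, relation: "eq" };
  }
  throw new TypeError("track_total_hits must be a boolean or an integer");
}

console.log(reportedTotal(42, 100));     // { value: 42, relation: 'eq' }
console.log(reportedTotal(2048, 100));   // { value: 100, relation: 'gte' }
console.log(reportedTotal(2048, false)); // undefined
```

In this model a match count exactly equal to the threshold is still reported as `eq`, since only a count greater than `track_total_hits` turns the value into a lower bound.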

For instance, the following query tracks the number of hits that match
the query accurately up to 100 documents:

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits": 100,
    "query": {
        "term": {
            "title": "fast"
        }
    }
}
--------------------------------------------------
// CONSOLE

The `hits.total.relation` in the response indicates whether the
value returned in `hits.total.value` is accurate (`eq`) or a lower
bound of the total (`gte`).
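
Applications that display a hit count need to handle each of these cases, including the response without a `total` that is returned when `track_total_hits` is `false`. A minimal JavaScript sketch, assuming the response body has already been parsed into a plain object with the fields shown on this page; `describeTotal` is a hypothetical helper, not part of any Elasticsearch client:

```javascript
// Hypothetical helper: turn the `hits` section of a parsed search response
// into a human-readable count, honouring `hits.total.relation`.
function describeTotal(hits) {
  const total = hits.total;
  if (total === undefined) {
    // `track_total_hits: false` omits the total entirely.
    return "unknown";
  }
  if (total.relation === "gte") {
    // The value is only a lower bound of the real count.
    return `at least ${total.value}`;
  }
  if (total.relation === "eq") {
    // The value is the exact number of matches.
    return String(total.value);
  }
  throw new Error(`unexpected relation: ${total.relation}`);
}

const exact = { total: { value: 42, relation: "eq" }, max_score: 0.42, hits: [] };
const bounded = { total: { value: 100, relation: "gte" }, max_score: 0.42, hits: [] };
const untracked = { max_score: 0.42, hits: [] };

console.log(describeTotal(exact));     // 42
console.log(describeTotal(bounded));   // at least 100
console.log(describeTotal(untracked)); // unknown
```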

For instance, the following response:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : {
        "total" : {
            "value": 42, <1>
            "relation": "eq" <2>
        },
        "max_score": 0.42,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]

<1> 42 documents match the query
<2> and the count is accurate

\... indicates that the number of hits returned in `total`
is accurate.

If the total number of hits that match the query is greater than the
value set in `track_total_hits`, the total hits in the response
indicate that the returned value is a lower bound:

[source,js]
--------------------------------------------------
{
    "_shards": ...
    "hits" : {
        "total" : {
            "value": 100, <1>
            "relation": "gte" <2>
        },
        "max_score": 0.42,
        "hits" : []
    }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"max_score": 0\.42/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 100/"value": $body.hits.total.value/]

<1> There are at least 100 documents that match the query
<2> This is a lower bound (`gte`).