Track total hits up to 10,000 by default
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents. Requests that match more than this value will set `"total.relation"`
to `"gte"` (i.e. greater than or equal to) and `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted; they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I chose `10,000` as the default because that's also the number we use to limit pagination. This means that
users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.

Closes elastic#33028
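The pagination argument above can be checked with a few lines of Java. This is a minimal sketch, not Elasticsearch code: `PaginationCheck` and `canJumpTo` are hypothetical names, and it assumes the pagination limit is a `from + size` window of 10,000 documents. Because `"total.value"` never exceeds the real number of matches, whether it is exact (`"eq"`) or a lower bound (`"gte"`), an offset below it is guaranteed to land on a non-empty page.

```java
/**
 * Hypothetical helper: decides whether a page can be requested safely,
 * given only the "total.value" from a search response.
 */
final class PaginationCheck {

    // Assumption: pagination is limited to from + size <= 10,000 documents.
    static final long MAX_RESULT_WINDOW = 10_000;

    private PaginationCheck() {}

    /**
     * @param totalValue "total.value" from the response, exact ("eq")
     *                   or a lower bound ("gte")
     * @param from       offset of the first hit of the page
     * @param size       number of hits per page
     * @return true if the page is inside the window and certainly non-empty
     */
    static boolean canJumpTo(long totalValue, long from, long size) {
        boolean withinWindow = from + size <= MAX_RESULT_WINDOW;
        // totalValue never exceeds the real hit count, so an offset below it
        // always has at least one hit, regardless of the relation.
        boolean hasHits = from < totalValue;
        return withinWindow && hasHits;
    }

    public static void main(String[] args) {
        System.out.println(canJumpTo(10_000, 9_990, 10)); // true: last page of the window
        System.out.println(canJumpTo(10_000, 9_995, 10)); // false: past the window
        System.out.println(canJumpTo(42, 50, 10));        // false: past an exact count
    }
}
```

With the threshold equal to the window, a `"gte"` answer of 10,000 never leaves an allowed page undecided, which is the property the message relies on.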
jimczi committed Jan 15, 2019
1 parent d6a104f commit 1208d5a
Showing 19 changed files with 190 additions and 93 deletions.
6 changes: 5 additions & 1 deletion docs/reference/getting-started.asciidoc
@@ -793,7 +793,11 @@ As for the response, we see the following parts:
* `hits._score` and `max_score` - ignore these fields for now

The accuracy of `hits.total` is controlled by the request parameter `track_total_hits`, when set to true
the request will track the total hits accurately (`"relation": "eq"`).
the request will track the total hits accurately (`"relation": "eq"`). It defaults to `10,000`,
which means that the total hit count is accurately tracked up to `10,000` documents.
You can force an accurate count by setting `track_total_hits` to true explicitly.
See the <<search-request-track-total-hits, request body>> documentation
for more details.

Here is the same exact search above using the alternative request body method:

3 changes: 2 additions & 1 deletion docs/reference/index-modules/index-sorting.asciidoc
@@ -201,7 +201,8 @@ as soon as N documents have been collected per segment.

<1> The total number of hits matching the query is unknown because of early termination.

NOTE: Aggregations will collect all documents that match the query regardless of the value of `track_total_hits`
NOTE: Aggregations will collect all documents that match the query regardless
of the value of `track_total_hits`

[[index-modules-index-sorting-conjunctions]]
=== Use index sorting to speed up conjunctions
29 changes: 29 additions & 0 deletions docs/reference/migration/migrate_7_0/search.asciidoc
@@ -205,3 +205,32 @@ If `track_total_hits` is set to `false` in the search request the search respons
will set `hits.total` to null and the object will not be displayed in the rest
layer. You can add `rest_total_hits_as_int=true` in the search request parameters
to get the old format back (`"total": -1`).

[float]
==== `track_total_hits` defaults to 10,000

By default, search requests will count the total hits accurately up to `10,000`
documents. If the total number of hits that match the query is greater than this
value, the response will indicate that the returned value is a lower bound:

[source,js]
--------------------------------------------------
{
"_shards": ...
"timed_out": false,
"took": 100,
"hits": {
"max_score": 1.0,
"total" : {
"value": 10000, <1>
"relation": "gte" <2>
},
"hits": ...
}
}
--------------------------------------------------
<1> There are at least 10000 documents that match the query
<2> This is a lower bound (`"gte"`).

You can force the count to always be accurate by setting `track_total_hits`
to `true` explicitly in the search request.
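The lower-bound behaviour described above can be modelled as follows. This is a simplified sketch under the assumption that the exact number of matches is known up front (the real search stops counting early instead), and `TotalHitsModel`, `Total` and `report` are hypothetical names. It needs Java 16 or later for the `record`.

```java
/**
 * Simplified model of how the track_total_hits threshold shapes the
 * "total" object of a search response. Not Elasticsearch code.
 */
final class TotalHitsModel {

    /** The two documented relations: "eq" (exact) and "gte" (lower bound). */
    enum Relation { EQ, GTE }

    /** Hypothetical stand-in for hits.total. */
    record Total(long value, Relation relation) {}

    private TotalHitsModel() {}

    /**
     * @param actualMatches number of documents that really match the query
     * @param trackUpTo     the track_total_hits threshold (10,000 by default)
     */
    static Total report(long actualMatches, long trackUpTo) {
        if (actualMatches <= trackUpTo) {
            // Everything was counted: the value is exact.
            return new Total(actualMatches, Relation.EQ);
        }
        // Counting stopped at the threshold: the value is a lower bound.
        return new Total(trackUpTo, Relation.GTE);
    }

    public static void main(String[] args) {
        System.out.println(report(250, 10_000));    // Total[value=250, relation=EQ]
        System.out.println(report(50_000, 10_000)); // Total[value=10000, relation=GTE]
    }
}
```

Passing `Long.MAX_VALUE` as the threshold models `track_total_hits: true`, since every count then falls on the `"eq"` side.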
2 changes: 1 addition & 1 deletion docs/reference/query-dsl/feature-query.asciidoc
@@ -11,7 +11,7 @@ of the query.
Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
<<search-uri-request,`track_total_hits`>> is not set to `true`. Speedups may be
spectacular.

Here is an example that indexes various features:
119 changes: 62 additions & 57 deletions docs/reference/search/request/track-total-hits.asciidoc
@@ -4,9 +4,20 @@
Generally the total hit count can't be computed accurately without visiting all
matches, which is costly for queries that match lots of documents. The
`track_total_hits` parameter allows you to control how the total number of hits
should be tracked. When set to `true` the search response will always track the
number of hits that match the query accurately (e.g. `total.relation` will always
be equal to `"eq"` when `track_total_hits is set to true).
should be tracked.
Given that it is often enough to have a lower bound of the number of hits,
such as "there are at least 10000 hits", the default is set to `10,000`.
This means that requests will count the total hits accurately up to `10,000` hits.
It's a good trade-off to speed up searches if you don't need the accurate number
of hits after a certain threshold.

When set to `true` the search response will always track the number of hits that
match the query accurately (i.e. `total.relation` will always be equal to `"eq"`
when `track_total_hits` is set to `true`). Otherwise the `"total.relation"` returned
in the `"total"` object in the search response determines how the `"total.value"`
should be interpreted. A value of `"gte"` means that the `"total.value"` is a
lower bound of the total hits that match the query and a value of `"eq"` indicates
that `"total.value"` is the accurate count.

[source,js]
--------------------------------------------------
@@ -50,57 +61,9 @@ GET twitter/_search
<1> The total number of hits that match the query.
<2> The count is accurate (i.e. `"eq"` means equals).

If you don't need to track the total number of hits you can improve query times
by setting this option to `false`. In such case the search can efficiently skip
non-competitive hits because it doesn't need to count all matches:

[source,js]
--------------------------------------------------
GET twitter/_search
{
"track_total_hits": false,
"query": {
"match" : {
"message" : "Elasticsearch"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

\... returns:

[source,js]
--------------------------------------------------
{
"_shards": ...
"timed_out": false,
"took": 10,
"hits" : { <1>
"max_score": 1.0,
"hits": ...
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> The total number of hits is unknown.

Given that it is often enough to have a lower bound of the number of hits,
such as "there are at least 1000 hits", it is also possible to set
`track_total_hits` as an integer that represents the number of hits to count
accurately. The search can efficiently skip non-competitive document as soon
as collecting at least $`track_total_hits` documents. This is a good trade
off to speed up searches if you don't need the accurate number of hits after
a certain threshold.


For instance the following query will track the total hit count that match
the query accurately up to 100 documents:
It is also possible to set `track_total_hits` to an integer.
For instance the following query will accurately track the total hit count that matches
the query up to 100 documents:

[source,js]
--------------------------------------------------
@@ -118,8 +81,8 @@ GET twitter/_search
// TEST[continued]

The `hits.total.relation` in the response will indicate if the
value returned in `hits.total.value` is accurate (`eq`) or a lower
bound of the total (`gte`).
value returned in `hits.total.value` is accurate (`"eq"`) or a lower
bound of the total (`"gte"`).

For instance the following response:

@@ -173,4 +136,46 @@ will indicate that the returned value is a lower bound:
// TEST[skip:response is already tested in the previous snippet]

<1> There are at least 100 documents that match the query
<2> This is a lower bound (`gte`).
<2> This is a lower bound (`"gte"`).

If you don't need to track the total number of hits at all you can improve query
times by setting this option to `false`:

[source,js]
--------------------------------------------------
GET twitter/_search
{
"track_total_hits": false,
"query": {
"match" : {
"message" : "Elasticsearch"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

\... returns:

[source,js]
--------------------------------------------------
{
"_shards": ...
"timed_out": false,
"took": 10,
"hits" : { <1>
"max_score": 1.0,
"hits": ...
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> The total number of hits is unknown.

Finally you can force an accurate count by setting `"track_total_hits"`
to `true` in the request.
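On the client side, the three outcomes described in this file (exact count, lower bound, no `hits.total` at all) map naturally onto a small formatting helper. A hedged sketch: `TotalHitsLabel.describe` is a hypothetical name, and it assumes the caller has already extracted `hits.total.value` and `hits.total.relation` from the JSON response, passing `null` for the value when `hits.total` is absent.

```java
/** Hypothetical client-side helper turning hits.total into a display label. */
final class TotalHitsLabel {

    private TotalHitsLabel() {}

    /**
     * @param value    "hits.total.value", or null when hits.total is absent
     *                 (track_total_hits set to false)
     * @param relation "hits.total.relation": "eq" or "gte"
     */
    static String describe(Long value, String relation) {
        if (value == null) {
            return "unknown";                 // tracking was disabled
        }
        switch (relation) {
            case "eq":
                return Long.toString(value);  // exact count
            case "gte":
                return "at least " + value;   // lower bound
            default:
                throw new IllegalArgumentException("unexpected relation: " + relation);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(42L, "eq"));      // 42
        System.out.println(describe(10_000L, "gte")); // at least 10000
        System.out.println(describe(null, null));     // unknown
    }
}
```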
2 changes: 1 addition & 1 deletion docs/reference/search/uri-request.asciidoc
@@ -101,7 +101,7 @@ is important).
|`track_scores` |When sorting, set to `true` in order to still track
scores and return them as part of each hit.

|`track_total_hits` |Defaults to true. Set to `false` in order to disable the tracking
|`track_total_hits` |Defaults to `10,000`. Set to `false` in order to disable the tracking
of the total number of hits that match the query.
It also accepts an integer which in this case represents the number of
hits to count accurately.
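The table entry above says the parameter takes either a boolean or an integer. A sketch of that mapping, assuming the sentinel values `Integer.MAX_VALUE` for an accurate count and `-1` for disabled tracking (stand-ins for the `SearchContext` constants referenced in the diff). `TrackTotalHitsParam.parse` is hypothetical and folds the absent-parameter case directly into the `10,000` default, whereas the diff keeps it as `null` until later; rejecting negative integers is this sketch's own choice.

```java
/** Hypothetical parser for the track_total_hits query-string value. */
final class TrackTotalHitsParam {

    // Assumed sentinels, standing in for the SearchContext constants.
    static final int ACCURATE = Integer.MAX_VALUE;
    static final int DISABLED = -1;
    static final int DEFAULT_UP_TO = 10_000;

    private TrackTotalHitsParam() {}

    /** Maps the raw parameter (null when absent) to a counting threshold. */
    static int parse(String raw) {
        if (raw == null) {
            return DEFAULT_UP_TO;             // parameter not supplied
        }
        if (raw.equals("true")) {
            return ACCURATE;                  // count every hit
        }
        if (raw.equals("false")) {
            return DISABLED;                  // omit hits.total entirely
        }
        int upTo = Integer.parseInt(raw);     // NumberFormatException otherwise
        if (upTo < 0) {
            throw new IllegalArgumentException("track_total_hits must be >= 0, got " + upTo);
        }
        return upTo;
    }

    public static void main(String[] args) {
        System.out.println(parse(null));  // 10000
        System.out.println(parse("100")); // 100
    }
}
```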
@@ -115,9 +115,11 @@ public final void start() {
//no search shards to search on, bail with empty response
//(it happens with search across _all with no indices around and consistent with broadcast operations)

boolean withTotalHits = request.source() != null ?
// total hits is null in the response if the tracking of total hits is disabled
request.source().trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_DISABLED : true;
int trackTotalHitsUpTo = request.source() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
request.source().trackTotalHitsUpTo() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO :
request.source().trackTotalHitsUpTo();
// total hits is null in the response if the tracking of total hits is disabled
boolean withTotalHits = trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_DISABLED;
listener.onResponse(new SearchResponse(InternalSearchResponse.empty(withTotalHits), null, 0, 0, 0, buildTookInMillis(),
ShardSearchFailure.EMPTY_ARRAY, clusters));
return;
@@ -712,6 +712,16 @@ int getNumBuffered() {
int getNumReducePhases() { return numReducePhases; }
}

private int resolveTrackTotalHits(SearchRequest request) {
if (request.scroll() != null) {
// no matter what the value of track_total_hits is
return SearchContext.TRACK_TOTAL_HITS_ACCURATE;
}
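// Keep both branches Integer: mixing an int default with a nullable Integer in one
// conditional types the expression as int and would unbox a null trackTotalHitsUpTo().
Integer trackTotalHits = request.source() == null ? null : request.source().trackTotalHitsUpTo();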
return trackTotalHits == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : trackTotalHits;
}

/**
* Returns a new ArraySearchPhaseResults instance. This might return an instance that reduces search responses incrementally.
*/
@@ -720,7 +730,7 @@ InitialSearchPhase.ArraySearchPhaseResults<SearchPhaseResult> newSearchPhaseResu
boolean isScrollRequest = request.scroll() != null;
final boolean hasAggs = source != null && source.aggregations() != null;
final boolean hasTopDocs = source == null || source.size() != 0;
final int trackTotalHitsUpTo = source == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : source.trackTotalHitsUpTo();
final int trackTotalHitsUpTo = resolveTrackTotalHits(request);
final boolean finalReduce = request.getLocalClusterAlias() == null;

if (isScrollRequest == false && (hasAggs || hasTopDocs)) {
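The resolution code in this diff reads a nullable `Integer` (`trackTotalHitsUpTo()`) and falls back to an `int` default. One Java subtlety to keep in mind when writing such fallbacks: if a single conditional expression has an `int` operand and an `Integer` operand, its type is `int`, so a `null` `Integer` is unboxed and throws. A minimal illustration with hypothetical names:

```java
/** Hypothetical illustration of conditional-expression unboxing. */
final class TernaryUnboxing {

    // Stand-in for an int default such as DEFAULT_TRACK_TOTAL_HITS_UP_TO.
    static final int DEFAULT_UP_TO = 10_000;

    private TernaryUnboxing() {}

    /**
     * Pitfall: one operand is int and the other Integer, so the conditional
     * has type int. With noSource == false and upTo == null, the null is
     * unboxed and a NullPointerException is thrown, even though the result
     * is stored back into an Integer.
     */
    static Integer unsafe(boolean noSource, Integer upTo) {
        Integer resolved = noSource ? DEFAULT_UP_TO : upTo;
        return resolved;
    }

    /** Safe: check for null before any unboxing happens. */
    static int safe(boolean noSource, Integer upTo) {
        if (noSource || upTo == null) {
            return DEFAULT_UP_TO;
        }
        return upTo;
    }

    public static void main(String[] args) {
        System.out.println(safe(false, null)); // 10000
        try {
            unsafe(false, null);
        } catch (NullPointerException e) {
            System.out.println("unsafe(false, null) threw NullPointerException");
        }
    }
}
```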
@@ -32,6 +32,7 @@
import org.elasticsearch.common.xcontent.ToXContent;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.internal.SearchContext;
import org.elasticsearch.tasks.Task;
import org.elasticsearch.tasks.TaskId;

@@ -222,7 +223,10 @@ public void writeTo(StreamOutput out) throws IOException {
public ActionRequestValidationException validate() {
ActionRequestValidationException validationException = null;
final Scroll scroll = scroll();
if (source != null && source.trackTotalHits() == false && scroll != null) {
if (source != null
&& source.trackTotalHitsUpTo() != null
&& source.trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_ACCURATE
&& scroll != null) {
validationException =
addValidationError("disabling [track_total_hits] is not allowed in a scroll context", validationException);
}
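The validation rule in this hunk can be restated as a small pure function. A sketch with hypothetical names, assuming `Integer.MAX_VALUE` as the "accurate" sentinel and `null` for "not set explicitly":

```java
/** Hypothetical restatement of the scroll + track_total_hits validation. */
final class ScrollValidation {

    // Assumed sentinel for an accurate count.
    static final int ACCURATE = Integer.MAX_VALUE;

    private ScrollValidation() {}

    /**
     * @param trackTotalHitsUpTo null when track_total_hits was not set
     * @param isScroll           whether the request opens a scroll
     * @return an error message, or null if the combination is valid
     */
    static String validate(Integer trackTotalHitsUpTo, boolean isScroll) {
        // Comparing Integer with an int constant unboxes; the null check comes first.
        if (isScroll && trackTotalHitsUpTo != null && trackTotalHitsUpTo != ACCURATE) {
            return "disabling [track_total_hits] is not allowed in a scroll context";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(null, true));   // null
        System.out.println(validate(10_000, true)); // error message
    }
}
```

Because an unset value is `null` rather than `10,000`, scroll requests that never mention `track_total_hits` still pass validation under the new default.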
@@ -204,6 +204,13 @@ public int readInt() throws IOException {
| ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
}

public Integer readOptionalInt() throws IOException {
if (readBoolean()) {
return readInt();
}
return null;
}

/**
* Reads an int stored in variable-length format. Reads between one and
* five bytes. Smaller values take fewer bytes. Negative numbers
@@ -322,6 +322,15 @@ public void writeOptionalString(@Nullable String str) throws IOException {
}
}

public void writeOptionalInt(@Nullable Integer integer) throws IOException {
if (integer == null) {
writeBoolean(false);
} else {
writeBoolean(true);
writeInt(integer);
}
}

public void writeOptionalVInt(@Nullable Integer integer) throws IOException {
if (integer == null) {
writeBoolean(false);
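The `writeOptionalInt`/`readOptionalInt` pair in this diff writes a presence flag followed by a fixed-width int. The same layout can be sketched with plain `java.io` data streams, assumed here to be close enough stand-ins for `StreamOutput`/`StreamInput` to show the byte counts; `OptionalIntCodec` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Sketch of a nullable-int wire format: a one-byte presence flag followed,
 * when present, by a 4-byte big-endian int.
 */
final class OptionalIntCodec {

    private OptionalIntCodec() {}

    static byte[] encode(Integer value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (value == null) {
                out.writeBoolean(false);   // 1 byte: absent
            } else {
                out.writeBoolean(true);    // 1 byte: present
                out.writeInt(value);       // 4 bytes, big-endian
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked if they do.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Integer decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readBoolean()) {
                return in.readInt();
            }
            return null;
        } catch (IOException e) {
            // Truncated input (an EOFException) ends up here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(null).length);   // 1
        System.out.println(encode(10_000).length); // 5
        System.out.println(decode(encode(-1)));    // -1
    }
}
```

An absent value costs one byte and a present one five, including the `-1` and `Integer.MAX_VALUE` sentinels.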
@@ -173,6 +173,7 @@ public static void parseSearchRequest(SearchRequest searchRequest, RestRequest r
searchRequest.routing(request.param("routing"));
searchRequest.preference(request.param("preference"));
searchRequest.indicesOptions(IndicesOptions.fromRequest(request, searchRequest.indicesOptions()));

checkRestTotalHits(request, searchRequest);
}

@@ -237,6 +238,7 @@ private static void parseSearchSource(final SearchSourceBuilder searchSourceBuil
searchSourceBuilder.trackScores(request.paramAsBoolean("track_scores", false));
}


if (request.hasParam("track_total_hits")) {
if (Booleans.isBoolean(request.param("track_total_hits"))) {
searchSourceBuilder.trackTotalHits(
@@ -286,17 +288,26 @@ private static void parseSearchSource(final SearchSourceBuilder searchSourceBuil
}

/**
* Throws an {@link IllegalArgumentException} if {@link #TOTAL_HITS_AS_INT_PARAM}
* is used in conjunction with a lower bound value for the track_total_hits option.
* Modify the search request to accurately count the total hits that match the query
* if {@link #TOTAL_HITS_AS_INT_PARAM} is set.
*
* @throws IllegalArgumentException if {@link #TOTAL_HITS_AS_INT_PARAM}
* is used in conjunction with a lower bound value (other than {@link SearchContext#DEFAULT_TRACK_TOTAL_HITS_UP_TO})
* for the track_total_hits option.
*/
public static void checkRestTotalHits(RestRequest restRequest, SearchRequest searchRequest) {
int trackTotalHitsUpTo = searchRequest.source() == null ?
SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : searchRequest.source().trackTotalHitsUpTo();
if (trackTotalHitsUpTo == SearchContext.TRACK_TOTAL_HITS_ACCURATE ||
trackTotalHitsUpTo == SearchContext.TRACK_TOTAL_HITS_DISABLED) {
return ;
boolean totalHitsAsInt = restRequest.paramAsBoolean(TOTAL_HITS_AS_INT_PARAM, false);
if (totalHitsAsInt == false) {
return;
}
if (searchRequest.source() == null) {
searchRequest.source(new SearchSourceBuilder());
}
if (restRequest.paramAsBoolean(TOTAL_HITS_AS_INT_PARAM, false)) {
Integer trackTotalHitsUpTo = searchRequest.source().trackTotalHitsUpTo();
if (trackTotalHitsUpTo == null) {
searchRequest.source().trackTotalHits(true);
} else if (trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_ACCURATE
&& trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_DISABLED) {
throw new IllegalArgumentException("[" + TOTAL_HITS_AS_INT_PARAM + "] cannot be used " +
"if the tracking of total hits is not accurate, got " + trackTotalHitsUpTo);
}
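The control flow of `checkRestTotalHits` after this change can be summarised as a pure function over the parsed value. A sketch with hypothetical names, assuming `Integer.MAX_VALUE` and `-1` as the accurate and disabled sentinels and `null` for "not set":

```java
/** Hypothetical restatement of the rest_total_hits_as_int adjustment. */
final class RestTotalHitsAsInt {

    // Assumed sentinels for accurate and disabled tracking.
    static final int ACCURATE = Integer.MAX_VALUE;
    static final int DISABLED = -1;

    private RestTotalHitsAsInt() {}

    /**
     * @param totalHitsAsInt     whether rest_total_hits_as_int is set
     * @param trackTotalHitsUpTo parsed track_total_hits, or null if unset
     * @return the value to use for the search
     */
    static Integer adjust(boolean totalHitsAsInt, Integer trackTotalHitsUpTo) {
        if (!totalHitsAsInt) {
            return trackTotalHitsUpTo;           // nothing to change
        }
        if (trackTotalHitsUpTo == null) {
            return ACCURATE;                     // default back to an accurate count
        }
        if (trackTotalHitsUpTo != ACCURATE && trackTotalHitsUpTo != DISABLED) {
            // A lower bound cannot be rendered as a plain integer total.
            throw new IllegalArgumentException("[rest_total_hits_as_int] cannot be used "
                + "if the tracking of total hits is not accurate, got " + trackTotalHitsUpTo);
        }
        return trackTotalHitsUpTo;
    }

    public static void main(String[] args) {
        System.out.println(adjust(true, null)); // 2147483647
        System.out.println(adjust(false, 100)); // 100
    }
}
```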
@@ -811,10 +811,14 @@ private void parseSource(DefaultSearchContext context, SearchSourceBuilder sourc
}
}
context.trackScores(source.trackScores());
if (source.trackTotalHits() == false && context.scrollContext() != null) {
if (source.trackTotalHitsUpTo() != null
&& source.trackTotalHitsUpTo() != SearchContext.TRACK_TOTAL_HITS_ACCURATE
&& context.scrollContext() != null) {
throw new SearchContextException(context, "disabling [track_total_hits] is not allowed in a scroll context");
}
context.trackTotalHitsUpTo(source.trackTotalHitsUpTo());
if (source.trackTotalHitsUpTo() != null) {
context.trackTotalHitsUpTo(source.trackTotalHitsUpTo());
}
if (source.minScore() != null) {
context.minimumScore(source.minScore());
}