Add heuristics to compute pre_filter_shard_size when unspecified #53873
This commit changes the `pre_filter_shard_size` default from 128 to unspecified. This allows heuristics based on the request and the target indices to be applied when deciding whether the can match phase should run. When unspecified, this PR runs the can match phase automatically if one of these conditions is met:

* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.

Users can opt out of this behavior by setting `pre_filter_shard_size` to a static value.

Closes elastic#39835
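
For readers skimming the description, the decision it outlines can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the class name `PreFilterHeuristicSketch`, the method `shouldPreFilter`, and its boolean parameters are hypothetical, and the explicit-value branch assumes the existing threshold semantics (run the phase only when the shard count exceeds the configured value).

```java
import java.util.OptionalInt;

/** Hypothetical sketch of the heuristic; names are illustrative, not Elasticsearch classes. */
final class PreFilterHeuristicSketch {

    /** Shard-count threshold named in the PR description. */
    static final int DEFAULT_SHARD_THRESHOLD = 128;

    /**
     * Decides whether the can match (pre-filter) phase should run.
     *
     * @param preFilterShardSize        user-supplied value, empty when unspecified
     * @param targetShardCount          number of shards the request targets
     * @param hasReadOnlyIndices        whether any target index is read-only
     * @param primarySortOnIndexedField whether the primary sort targets an indexed field
     */
    static boolean shouldPreFilter(OptionalInt preFilterShardSize,
                                   int targetShardCount,
                                   boolean hasReadOnlyIndices,
                                   boolean primarySortOnIndexedField) {
        if (preFilterShardSize.isPresent()) {
            // Assumption: a static value opts out of the heuristics,
            // leaving only the plain threshold comparison.
            return targetShardCount > preFilterShardSize.getAsInt();
        }
        // Unspecified: any one of the three conditions triggers the phase.
        return targetShardCount > DEFAULT_SHARD_THRESHOLD
            || hasReadOnlyIndices
            || primarySortOnIndexedField;
    }

    public static void main(String[] args) {
        // Few shards, but a read-only index is targeted: the phase runs.
        System.out.println(shouldPreFilter(OptionalInt.empty(), 10, true, false));
        // Same request with a large static value: the phase is skipped.
        System.out.println(shouldPreFilter(OptionalInt.of(1000), 10, true, false));
    }
}
```

Modelling the unspecified default as an empty `OptionalInt` keeps "no value given" distinct from any real threshold, which is what lets the heuristics apply only when the user has not chosen a value.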
Pinging @elastic/es-search (:Search/Search)
left some questions and nits but nothing major. This change makes so much sense that I now wonder "why didn't we do this before?" :)
rest-api-spec/src/main/resources/rest-api-spec/api/msearch.json
When unspecified, the pre-filter phase is executed if any of these conditions is met:
- The request targets more than `128` shards.
- The request contains read-only indices.
- The primary sort of the query targets an indexed field.
thinking out loud: this could hit people who use CCS without minimizing roundtrips and experience latency, but given that we minimize roundtrips by default, it should be ok. and they can always increase the parameter manually should they see their CCS search slow down.
a threshold that, when exceeded, will enforce a round-trip to pre-filter search shards that cannot possibly match.
This filter phase can limit the number of shards significantly. For instance, if a date range filter is applied, then all indices (frozen or unfrozen) that do not contain documents within the date range can be skipped efficiently.
The default value for `pre_filter_shard_size` is `128` but it's recommended to set it to `1` when searching frozen indices. There is no
significant overhead associated with this pre-filter phase.
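
To illustrate the date range example in the snippet above, here is a minimal sketch of how a pre-filter step might skip shards. It assumes each shard exposes the minimum and maximum timestamp of its documents; the `ShardRange` type, its field names, and the method names are hypothetical and do not reflect Elasticsearch's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of date-range shard skipping; not Elasticsearch's real code. */
final class CanMatchSketch {

    /** Assumed per-shard metadata: the timestamp bounds of the documents it holds. */
    static final class ShardRange {
        final String shard;
        final long minTimestamp;
        final long maxTimestamp;

        ShardRange(String shard, long minTimestamp, long maxTimestamp) {
            this.shard = shard;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** A shard can match only if its range overlaps the queried range [from, to]. */
    static boolean canMatch(ShardRange s, long from, long to) {
        return s.maxTimestamp >= from && s.minTimestamp <= to;
    }

    /** Returns the names of the shards that still need to be queried. */
    static List<String> shardsToQuery(List<ShardRange> shards, long from, long to) {
        List<String> remaining = new ArrayList<>();
        for (ShardRange s : shards) {
            if (canMatch(s, from, to)) {
                remaining.add(s.shard);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<ShardRange> shards = new ArrayList<>();
        shards.add(new ShardRange("logs-2020-01", 0L, 99L));
        shards.add(new ShardRange("logs-2020-02", 100L, 199L));
        shards.add(new ShardRange("logs-2020-03", 200L, 299L));
        // Only the shard whose range overlaps [120, 150] is kept.
        System.out.println(shardsToQuery(shards, 120L, 150L));
    }
}
```

The check reads only per-shard bounds rather than documents, which is in line with the snippet's statement that the phase carries no significant overhead.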
would it make sense to still explain and make sure that users don't mess with pre_filter_shard_size at this point? I would not want them to end up setting it when searching frozen indices. It sounds like there is never a good reason to do so?
LGTM thanks @jimczi
) (#54007)

This commit changes the pre_filter_shard_size default from 128 to unspecified. This allows to apply heuristics based on the request and the target indices when deciding whether the can match phase should run or not. When unspecified, this pr runs the can match phase automatically if one of these conditions is met:

* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.

Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.

Closes #39835