Reindex: Remove ability to sort #47567
Comments
Pinging @elastic/es-distributed (:Distributed/Reindex)
We discussed this in our FixitFriday meeting and concluded that we think removing the ability to sort in reindex is the right path forward. Following is a summary of the use cases discussed that might be affected and their possible workarounds:
The plan is to move forward with deprecating the ability to sort during reindex. We will give this another 2-3 weeks here to gather input before taking concrete actions. Our current plan after that is to:
Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to elastic#47567
Relates: #4341 Deprecate sorting in reindex elastic/elasticsearch#49458 (issue: elastic/elasticsearch#47567) Closes #4356
Relates: #4341 Deprecate sorting in reindex elastic/elasticsearch#49458 (issue: elastic/elasticsearch#47567) Closes #4356 (cherry picked from commit 20a2133)
Maybe I'm missing something, but isn't a sorted reindex necessary for time series data? If I can no longer apply sorting here, it sounds like the time series could end up spread randomly over all the resulting indices. That doesn't sound optimal for query plans. Since time-based queries usually cover consecutive time frames, it should be better if data from the same time period is close together, shouldn't it? And from what I understand, this would not be possible without sorting.
As part of the reindex job specification, sorting can be specified. The documentation describes that this can be used in combination with `max_docs` to extract either a specific or a random subset of data. However, specifying sorting is not compatible with the upcoming resilient reindex mechanism, since this relies on sorting by seq_no. Any reindex request that sorts by anything but seq_no first will not be resilient.
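To illustrate why a fixed seq_no order helps with resilience, here is a toy model in plain Python. It is not Elasticsearch's implementation; the function name `copy_resumable`, the dict-based documents, and the `fail_after` crash switch are all hypothetical. The only idea taken from this issue is that the copy proceeds in seq_no order, so a single checkpoint value is enough to resume after a failure.

```python
def copy_resumable(source, dest, checkpoint, batch_size, fail_after=None):
    """Copy docs with seq_no > checkpoint into dest, in seq_no order.

    Returns the highest seq_no that was fully copied. `fail_after`
    simulates a crash after that many batches (hypothetical test hook).
    """
    # Only documents past the checkpoint still need to be copied.
    pending = sorted(
        (doc for doc in source if doc["seq_no"] > checkpoint),
        key=lambda doc: doc["seq_no"],
    )
    batches_done = 0
    for start in range(0, len(pending), batch_size):
        if fail_after is not None and batches_done == fail_after:
            return checkpoint  # simulated crash: only the checkpoint survives
        batch = pending[start:start + batch_size]
        for doc in batch:
            dest[doc["id"]] = doc
        # Everything up to this seq_no is now safely in dest.
        checkpoint = batch[-1]["seq_no"]
        batches_done += 1
    return checkpoint


source = [{"id": f"doc{n}", "seq_no": n} for n in (3, 0, 4, 1, 2)]
dest = {}
cp = copy_resumable(source, dest, checkpoint=-1, batch_size=2, fail_after=1)
cp = copy_resumable(source, dest, checkpoint=cp, batch_size=2)  # resume
```

With any other sort order, the last copied document would not tell you which documents are still missing, so a restart would have to begin from scratch.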
When copying the full data set, sorting does not really make a difference: the net result will be the same. Extracting subsets of data can likely be done by adding queries instead. To avoid having cases where reindex is not resilient, I propose to deprecate sorting in reindex in 7.x and remove sorting from reindex in 8.0.
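As a rough sketch of the query-based workaround, the two helpers below build `_reindex` request bodies as Python dicts: one in the deprecated sort plus `max_docs` style, one that selects the subset with a range query. The index names, the `@timestamp` field, and the `now-7d/d` cutoff are assumptions for illustration only. The two requests are not equivalent: the sorted form takes exactly N documents, while the query form takes whatever matches the cutoff.

```python
import json


def sorted_subset_body(source_index, dest_index, sort_field, max_docs):
    """Deprecated style: take the top `max_docs` documents by sort order."""
    return {
        "max_docs": max_docs,
        "source": {"index": source_index, "sort": {sort_field: "desc"}},
        "dest": {"index": dest_index},
    }


def query_subset_body(source_index, dest_index, range_field, gte):
    """Sort-free style: select the subset with a range query instead."""
    return {
        "source": {
            "index": source_index,
            "query": {"range": {range_field: {"gte": gte}}},
        },
        "dest": {"index": dest_index},
    }


# Hypothetical index and field names; send the body to POST _reindex.
body = query_subset_body("logs", "logs-recent", "@timestamp", "now-7d/d")
print(json.dumps(body, indent=2))
```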
This issue is created to gather feedback on this proposal. If you rely on being able to sort while reindexing, please let us know here.