Introduce point in time APIs in x-pack basic #61872

dnhatn · 2020-09-02T17:27:21Z

This commit introduces a new API that manages point-in-times in x-pack
basic. An Elasticsearch point in time (pit) is a lightweight view into
the state of the data as it existed when the pit was opened. By default,
a search request executes against the most recent point in time. In some
cases, it is preferable to run multiple search requests against the same
point in time. For example, if refreshes happen between search_after
requests, the results of those requests might not be consistent, because
changes made between searches are only visible to the more recent point
in time.

A point in time must be opened before being used in search requests. The
keep_alive parameter tells Elasticsearch how long it should keep a
point in time around.

POST /my_index/_pit?keep_alive=1m

The response from the above request includes an id, which should be
passed as the id of the pit parameter in search requests.

POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
        "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
        "keep_alive": "1m"
    }
}
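To illustrate the search_after case mentioned above, here is a minimal sketch of how a client might build successive search bodies against one point in time. The helper names (`pit_search_body`, `last_sort_values`) and the sort field in the usage note are hypothetical and not part of this change; the sketch only assembles request bodies and does not talk to a cluster.

```python
from typing import Any, Dict, List, Optional


def pit_search_body(pit_id: str,
                    query: Dict[str, Any],
                    sort: List[Dict[str, str]],
                    size: int = 100,
                    search_after: Optional[List[Any]] = None,
                    keep_alive: Optional[str] = None) -> Dict[str, Any]:
    """Build the body of a `POST /_search` that reads from one point in time.

    Hypothetical helper: every page reuses the same pit id, so refreshes
    that happen between pages do not change what the pages see.
    """
    pit: Dict[str, Any] = {"id": pit_id}
    if keep_alive is not None:
        # Extending the keep_alive on each search is optional.
        pit["keep_alive"] = keep_alive
    body: Dict[str, Any] = {"size": size, "query": query, "sort": sort, "pit": pit}
    if search_after is not None:
        # Resume right after the last hit of the previous page.
        body["search_after"] = search_after
    return body


def last_sort_values(response: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the sort values of the last hit, or None when the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None
```

A caller would send `pit_search_body(pit_id, query, sort)` first, then keep passing `last_sort_values(previous_response)` as `search_after` until it returns `None`. The sort (for example `[{"timestamp": "asc"}]`, an assumed field) must stay the same across pages.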

Point-in-times are automatically closed when the keep_alive has
elapsed. However, keeping point-in-times open has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
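The open, search, close sequence above could be wrapped on the client side so that the point in time is always closed, even when a search fails. The following is a sketch under stated assumptions: `send` is a hypothetical transport callable `(method, path, body) -> parsed JSON response` supplied by the caller, and `point_in_time` is an illustrative name, not an API added by this change.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


@contextmanager
def point_in_time(send: Send, index: str, keep_alive: str = "1m") -> Iterator[str]:
    """Open a point in time on `index` and close it when the block exits."""
    # POST /<index>/_pit?keep_alive=<duration>; the response includes an id.
    response = send("POST", f"/{index}/_pit?keep_alive={keep_alive}", None)
    pit_id = response["id"]
    try:
        yield pit_id
    finally:
        # DELETE /_pit with the id in the body, so the pit does not
        # linger until its keep_alive elapses.
        send("DELETE", "/_pit", {"id": pit_id})
```

Inside a `with point_in_time(send, "my_index") as pit_id:` block, each search passes `{"pit": {"id": pit_id}}` in its body. Closing in `finally` follows the advice above rather than relying on the keep_alive timeout.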

Notable works in this change:

Move the search state to the coordinating node: Move states of search to coordinating node #52741
Allow searches with a specific reader context: Allow searches with specific reader contexts #53989
Add the ability to acquire readers in IndexShard: Adds the ability to acquire readers in IndexShard #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>

Backport of #61062

javanna · 2020-09-08T14:13:21Z

watching this PR so I don't forget to backport #62080 when it gets merged.

javanna · 2020-09-09T11:48:17Z

heads up we crossed streams a bit, so once this gets merged to 7.x, #62080 needs to be backported, together with 2288b72 and 8d3b239. Also the point in time getDescription needs to be adapted according to #62057, which I already backported.

PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object. This commit fixes this and adds tests for it.

This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext`) before any other operation and that it is correctly recycled or removed at the end of the operation. This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once. Relates elastic#61446 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>

Today some uncaught shard failures, such as RejectedExecutionException, skip the release of the shard context and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced, this behavior can lead to missing data since scrolls always move forward. In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure occurs locally. The shard search context will no longer exist in subsequent scroll requests, which will lead to consistent shard failures in the responses. This change also modifies the retry tests of the reindex feature. Reindex retries scroll search requests that contain a shard failure and moves on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above. That means that reindex will now report scroll failures when search rejections happen during the operation instead of skipping documents silently. Finally, this change removes an old TODO that was fulfilled with elastic#61062.

…c#61658) If shards are relocated to new nodes, then searches with a point in time will fail, although a pit keeps search contexts open. This commit solves this problem by reducing info used by SearchShardIterator and always including the matching nodes when resolving a point in time. Closes elastic#61627

dnhatn · 2020-09-10T23:27:57Z

I've backported all these commits locally to preserve the history.

dnhatn changed the title ~~Introduce point in time APIs in x-pack basic (#61062)~~ Introduce point in time APIs in x-pack basic Sep 2, 2020

dnhatn added the backport label Sep 2, 2020

dnhatn marked this pull request as ready for review September 3, 2020 02:03

dnhatn mentioned this pull request Sep 9, 2020

[7.x] [DOCS] Add PIT to search after docs (#61593) #62101

Merged

dnhatn force-pushed the pit-7x branch from 7ba1498 to 0f62c6e Compare September 10, 2020 17:22

dnhatn and others added 9 commits September 10, 2020 17:07

Make keep alive of point in time optional in search (elastic#62184)

13b7a87

A search request should not be required to extend the keep_alive of a point in time. This change makes that parameter optional.

Release search context when scroll keep_alive is too large (elastic#6…

28a3480

…2179) Previously, we closed related search contexts if the keep_alive of a scroll was too large. But we accidentally changed this behavior in elastic#62061.

Support point in time in async_search (elastic#61560)

61d66fe

This commit integrates point in time into async search and ensures that it works correctly with security enabled. Relates elastic#61062

Support point in time cross cluster search (elastic#61827)

58b498e

This commit integrates point in time into cross cluster search. Relates elastic#61062 Closes elastic#61790

dnhatn force-pushed the pit-7x branch from bd8438c to 58b498e Compare September 10, 2020 21:08

dnhatn added a commit that referenced this pull request Sep 10, 2020

Disable BWC to backport point in time to 7.10

eaf4ce2

Relates #61062 Relates #61872

dnhatn closed this Sep 10, 2020

dnhatn deleted the pit-7x branch September 10, 2020 23:28

dnhatn mentioned this pull request Sep 11, 2020

Adjust BWC rest version for point in time #62264

Merged

dnhatn added a commit that referenced this pull request Sep 11, 2020

Adjust BWC rest version for point in time (#62264)

b118697

Relates #61872
