Adds the ability to acquire readers in IndexShard #54966

jimczi · 2020-04-08T16:34:37Z

This change adds the ability to acquire a point-in-time reader on an engine.
This is needed for frozen indices, which lazily load the reader on every
phase of a search request.
Acquiring a reader on a frozen index ensures that the engine will not be closed until the reader is released, while leaving the directory reader unopened until a searcher is acquired.
When the searcher is closed, the underlying directory reader is closed too, unless another request on the same frozen shard is in flight.
This ensures that the directory reader of a frozen index is opened only while requests are being
executed (each of which consumes a thread in the search throttled pool).
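The lifecycle described above can be modelled with a small Java sketch. It is hypothetical and is not Elasticsearch's `Engine` code: `ClosableReader` stands in for a Lucene `DirectoryReader`, and `LazyReader`, `acquire`, and `release` are illustrative names. Holding a `LazyReader` keeps nothing open; the underlying reader is opened on the first `acquire()` and closed when the last in-flight searcher calls `release()`.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive resource such as a Lucene DirectoryReader. */
interface ClosableReader extends AutoCloseable {
    @Override
    void close(); // narrowed: no checked exception
}

/**
 * Hypothetical sketch (not the Elasticsearch Engine API): the reader is opened
 * lazily on the first acquire() and closed when the last in-flight searcher
 * calls release().
 */
final class LazyReader {
    private final Supplier<ClosableReader> opener;
    private ClosableReader current; // null while no request is in flight
    private int inFlight;           // number of searchers currently holding the reader

    LazyReader(Supplier<ClosableReader> opener) {
        this.opener = opener;
    }

    /** Opens the reader if it is closed and registers one more in-flight searcher. */
    synchronized ClosableReader acquire() {
        if (inFlight == 0) {
            current = opener.get(); // lazy open: only when a request actually runs
        }
        inFlight++;
        return current;
    }

    /** Releases one searcher; closes the reader if no other request is in flight. */
    synchronized void release() {
        if (inFlight == 0) {
            throw new IllegalStateException("release() called without acquire()");
        }
        inFlight--;
        if (inFlight == 0) {
            ClosableReader toClose = current;
            current = null;
            toClose.close();
        }
    }

    synchronized boolean isOpen() {
        return current != null;
    }
}
```

Callers in this sketch should pair `acquire()` with `release()` in a `try`/`finally` block so that an exception during a search phase cannot leave the reader open.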

elasticmachine · 2020-04-08T19:50:44Z

Pinging @elastic/es-distributed (:Distributed/Engine)

dnhatn

Thanks, Jim! I like the fact that we have significantly simplified SearchOperationListener in this change. However, I wonder if we can avoid introducing a new wrapper (i.e., Engine#Reader). Could we introduce unload/reload methods to Engine.Searcher and call them from ReaderContext and LegacyReaderContext instead? I might be missing something here.

x-pack/plugin/security/src/test/java/org/elasticsearch/integration/FieldLevelSecurityTests.java

.../plugin/security/src/test/java/org/elasticsearch/integration/DocumentLevelSecurityTests.java

jimczi · 2020-04-24T13:22:57Z

I pushed a change that restricts each reader context to the user who created it. @dnhatn, can you take another look?
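A per-user restriction like this can be sketched in Java. The sketch is hypothetical and does not reproduce the actual change: `OwnedReaderContext`, `ReaderContextRegistry`, and the plain `String` user name are illustrative assumptions. Each context records its creator, and a lookup by any other user fails.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical reader context that remembers which user created it. */
final class OwnedReaderContext {
    final long id;
    final String owner;

    OwnedReaderContext(long id, String owner) {
        this.id = id;
        this.owner = Objects.requireNonNull(owner, "owner");
    }
}

/** Hypothetical registry: only the creating user may look a context up again. */
final class ReaderContextRegistry {
    private final Map<Long, OwnedReaderContext> contexts = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    long create(String user) {
        long id = nextId.getAndIncrement();
        contexts.put(id, new OwnedReaderContext(id, user));
        return id;
    }

    OwnedReaderContext get(long id, String user) {
        OwnedReaderContext ctx = contexts.get(id);
        // In this sketch a context owned by someone else is reported exactly like
        // a missing one, so a caller cannot probe which ids exist for other users.
        if (ctx == null || !ctx.owner.equals(user)) {
            throw new IllegalArgumentException("No reader context found for id [" + id + "]");
        }
        return ctx;
    }
}
```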

dnhatn

I've left some comments, but this looks great. Thanks Jim!

server/src/main/java/org/elasticsearch/search/SearchService.java

server/src/main/java/org/elasticsearch/action/OriginalIndices.java

test/framework/src/main/java/org/elasticsearch/test/engine/MockInternalEngine.java

x-pack/plugin/frozen-indices/src/main/java/org/elasticsearch/xpack/frozen/FrozenIndices.java

...ty/src/main/java/org/elasticsearch/xpack/security/authz/SecuritySearchOperationListener.java

dnhatn

LGTM

...ty/src/main/java/org/elasticsearch/xpack/security/authz/SecuritySearchOperationListener.java

x-pack/plugin/frozen-indices/src/test/java/org/elasticsearch/index/engine/FrozenIndexTests.java

jimczi · 2020-04-27T22:11:58Z

Thanks @dnhatn

This commit introduces a new API that manages points in time in x-pack basic. An Elasticsearch PIT (point in time) is a lightweight view into the state of the data as it existed when the PIT was initiated. By default, a search request executes against the most recent point in time. In some cases, it is preferable to run multiple search requests against the same point in time. For example, if refreshes happen between `search_after` requests, the results of those requests might be inconsistent, because changes made between searches are visible only to the more recent point in time.

A point in time must be opened before it can be used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep the point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response to the above request includes an `id`, which should be passed as the `id` of the `pit` parameter of search requests.

```
POST /_search
{
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
```

Points in time are closed automatically once the `keep_alive` has elapsed. However, keeping points in time open has a cost, so they should be closed as soon as they are no longer used in search requests.

```
DELETE /_pit
{
  "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable work in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
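The `keep_alive` behaviour described in the commit message can be modelled with a small Java sketch. It is hypothetical and is not the x-pack implementation: `PitKeepAlive`, `open`, `use`, `close`, and the injected millisecond clock are illustrative names. The sketch assumes, as the `pit.keep_alive` field in the search example suggests, that each search using a PIT pushes its expiry forward by the given keep-alive, and that an expired PIT is dropped automatically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of keep_alive bookkeeping for point-in-time ids.
 * Each use renews the lease; expired ids are reaped on access.
 */
final class PitKeepAlive {
    private final LongSupplier clockMillis;              // injected so time can be simulated
    private final Map<String, Long> expiresAt = new HashMap<>();

    PitKeepAlive(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Opens a point in time that lives for keepAliveMillis unless renewed. */
    synchronized void open(String id, long keepAliveMillis) {
        expiresAt.put(id, clockMillis.getAsLong() + keepAliveMillis);
    }

    /** Called when a search uses the PIT; returns false if it has already expired. */
    synchronized boolean use(String id, long keepAliveMillis) {
        reap();
        if (!expiresAt.containsKey(id)) {
            return false;
        }
        expiresAt.put(id, clockMillis.getAsLong() + keepAliveMillis); // renew the lease
        return true;
    }

    /** Explicit close, like DELETE /_pit; returns false if the PIT was not open. */
    synchronized boolean close(String id) {
        return expiresAt.remove(id) != null;
    }

    synchronized boolean isOpen(String id) {
        reap();
        return expiresAt.containsKey(id);
    }

    /** Drops every PIT whose deadline has passed. */
    private void reap() {
        long now = clockMillis.getAsLong();
        expiresAt.values().removeIf(deadline -> deadline <= now);
    }
}
```

Closing a PIT explicitly frees its resources immediately instead of waiting for the reaper, which is why the commit message recommends closing points in time as soon as they are no longer needed.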

jimczi requested a review from dnhatn April 8, 2020 16:34

jimczi added 4 commits April 8, 2020 19:34

checkstyle

90cd41b

avoid illegal access error

e2bb5b8

cleanup

d4afa8c

unused import

403f2da

gwbrown added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label Apr 8, 2020

jimczi added 3 commits April 8, 2020 23:34

handle scroll correctly

1a69b48

add more tests

1286a95

Merge branch 'feature/reader-context' into reader_context_engine

aae7ce2

dnhatn reviewed Apr 14, 2020

x-pack/plugin/security/src/test/java/org/elasticsearch/integration/FieldLevelSecurityTests.java

.../plugin/security/src/test/java/org/elasticsearch/integration/DocumentLevelSecurityTests.java

jimczi added 2 commits April 22, 2020 13:06

Merge remote-tracking branch 'origin/reader-context' into reader_context_engine

058d663

Make reader context per-user

f81d651

jimczi force-pushed the reader_context_engine branch from 4bd17b1 to f81d651 April 24, 2020 13:22

jimczi added 5 commits April 24, 2020 15:25

address review comment

2fb5143

fix javadoc

7e7d757

fix serialization of original indices in search shard target

e1335d6

fix serialization of original indices

5dceccb

add missing change

22cab68

dnhatn self-requested a review April 27, 2020 01:59

Merge branch 'feature/reader-context' into jimczi/searcher-supplier

a07c0e1

dnhatn reviewed Apr 27, 2020

jimczi added 4 commits April 27, 2020 13:41

address feedback

0ec84ab

revert unwanted change

4f46988

fix failing tests

23f2d2a

fix precommit

4750a48

dnhatn approved these changes Apr 27, 2020

...ty/src/main/java/org/elasticsearch/xpack/security/authz/SecuritySearchOperationListener.java

x-pack/plugin/frozen-indices/src/test/java/org/elasticsearch/index/engine/FrozenIndexTests.java

remove TODO and fix precommit

7036c10

jimczi added 2 commits April 27, 2020 22:48

remove reader context on exception

8a3ad75

replace with a try/finally

752b559

jimczi merged commit 51f9542 into elastic:reader-context Apr 27, 2020

jimczi deleted the reader_context_engine branch April 27, 2020 22:11

dnhatn mentioned this pull request May 9, 2020

Introduce search context - point in time view of indices #56480

Closed

dnhatn mentioned this pull request Aug 12, 2020

Introduce point in time APIs in x-pack basic #61062

Merged

dnhatn mentioned this pull request Sep 2, 2020

Introduce point in time APIs in x-pack basic #61872

Closed
