Point in time fails for deleted indices and fails differently in comparison to scroll #81256

Closed
hendrikmuhs opened this issue Dec 2, 2021 · 4 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@hendrikmuhs

Using a point in time reader I can create a consistent state to query for data with multiple searches. The data in the index might change: documents might be updated, added, or deleted, but the pit returns the state as of its creation.

While investigating #81252, which is about a problem hit by one user of pit (transform), I found some inconsistencies in the behavior of pit. This is a collection of the issues found. I decided to keep them in one issue; however, this is a story, and the problems are complex and not easy to solve.

pit and deleted indices

However, if pit is used together with a set of indices specified by a wildcard, e.g. metricbeat-*, the results with respect to indices might be a bit surprising. Indices added after pit creation are not part of the pit, whereas deleted indices result in an index not found exception.

This is no surprise from the technical side, however for a user we document:

Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated.

Sub-Issue: It would be good to document the limitation regarding deleted indices.

A user might expect that we softly delete the index, meaning it stays available for the point in time reader until the destruction of the context.

pit returns a 404

The default behavior of Elasticsearch is to return results even if the result set is incomplete. This is the nature of a search engine, where we care more about recall than about correctness. With allow_partial_search_results you can choose to fail instead, or you can check for shard failures in the response.
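To illustrate the "check for shard failures in the response" option, here is a minimal sketch in Python. It assumes the usual `_shards` section of a search response (`failed` count plus a `failures` array); the helper names and the sample document are made up for illustration, and the sample is hand-written, modelled on the scroll failure quoted in this issue.

```python
from typing import Any, Dict, List


def shard_failures(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the shard failure entries of a search response, or [] if none."""
    return list(response.get("_shards", {}).get("failures", []))


def is_partial(response: Dict[str, Any]) -> bool:
    """True if at least one shard failed, i.e. the hits may be incomplete."""
    return response.get("_shards", {}).get("failed", 0) > 0


# Hand-written sample (not captured from a cluster); the shard counts are
# placeholders, the failure entry mirrors the one quoted in this issue.
sample = {
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 1,
        "failures": [
            {
                "shard": -1,
                "index": None,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [110100]",
                },
            }
        ],
    },
    "hits": {"hits": []},
}

if is_partial(sample):
    for failure in shard_failures(sample):
        # The index name can be null, so callers cannot rely on it.
        print(failure["index"], failure["reason"]["type"])
```

A caller that wants all-or-nothing semantics can either set allow_partial_search_results to false or treat `is_partial(...)` as an error on the client side.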

Sub-Issue: A pit search fails completely if one of its indices got deleted. This is in contrast to the default Elasticsearch behavior.

Note:

  • with pit I can still specify allow_partial_search_results; if it is true, a shard failure does not fail the search, but a deleted index does
  • a scroll returns results, the missing index causes a shard failure:
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [110100]"
        }
      }
  • the scroll error message does not contain the name of the deleted index
  • it's not possible to combine allow_partial_search_results with scroll; the parameter seems to be ignored. I haven't tried the cluster setting, however

Therefore scroll and pit behave differently for the same scenario.

pit should allow a filter query

The scenario behind a wildcard search or data stream is to easily query time series data. In #81252 we do not even care about the index that gets deleted, because its data is out of scope: the pit searches use a range query. The pit context is therefore invalidated by data in an index that we are not going to retrieve.

Sub-Issue: If I could specify a filter query at pit creation, I could reduce the pit context and the search would not stumble upon the deleted index.

@hendrikmuhs hendrikmuhs added >bug :Search/Search Search-related issues that do not fall into other categories labels Dec 2, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 2, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@dnhatn
Member

dnhatn commented Dec 3, 2021

@hendrikmuhs Thanks for writing a detailed report. We will prioritize addressing these points.

@dnhatn dnhatn self-assigned this Dec 3, 2021
@hendrikmuhs
Author

Thanks @dnhatn!

One more thing I realized, which might help me handle the high-level issue:

Once the pit is invalidated due to a deleted index, it throws an index not found exception. I wonder if that's the right exception type. Index not found should be thrown at pit creation, but when the pit is invalidated during usage, this exception type might not be the right one. As explained above, scroll uses a search context missing exception, which seems to me a better fit.

This distinction is important for me, because I need to differentiate between a broken user request and a pit invalidation.

(Nitpicking: At the REST level, index not found translates to 404, and search context missing translates to 404, too. A 410 (GONE) seems to be a better fit for search context missing?)
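Because both errors map to 404, a client has to look at the error type in the response body to tell them apart. The following Python sketch shows one way to do that, assuming the standard error body shape with an `error.type` field. The function name, the category labels, and the rule "index not found while using a pit means the pit was invalidated" are assumptions made for illustration, not documented behavior.

```python
from typing import Any, Dict


def classify_failure(error_body: Dict[str, Any], used_pit: bool) -> str:
    """Map an error response body to a coarse category.

    Returns "context_lost" when the search context (pit or scroll) is gone,
    "bad_request" when the request itself names a missing index, and
    "other" for everything else.
    """
    error = error_body.get("error")
    error_type = error.get("type") if isinstance(error, dict) else None

    if error_type == "search_context_missing_exception":
        return "context_lost"
    if error_type == "index_not_found_exception":
        # Assumption: a pit search does not name indices itself, so this
        # error during a pit search points at an index deleted after creation.
        return "context_lost" if used_pit else "bad_request"
    return "other"


pit_error = {"error": {"type": "index_not_found_exception"}, "status": 404}
print(classify_failure(pit_error, used_pit=True))
print(classify_failure(pit_error, used_pit=False))
```

With a dedicated exception type (or a distinct status code such as 410) the `used_pit` flag would not be needed.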

hendrikmuhs pushed a commit that referenced this issue Dec 8, 2021
Do not fail the transform if pit search fails with index not found as a result of an index that got deleted via ILM, 
if that index is part of a search that selects indices using a wildcard, e.g. logs-*. If pit search fails, the search 
is retried using search without a pit context. The 2nd search might fail if the source targets an explicit index. 
In addition the usage of the pit API can not be disabled by transform.

fixes #81252
relates #81256
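The retry described in this commit message can be sketched roughly as follows. This is not the transform code (which is Java); it is a minimal Python illustration in which `IndexNotFound`, `search_with_fallback`, and the injected `search` callable are all hypothetical names.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class IndexNotFound(Exception):
    """Hypothetical stand-in for the client's index-not-found error."""


# A search callable takes a request body and an optional pit id.
SearchFn = Callable[[Dict[str, Any], Optional[str]], Dict[str, Any]]


def search_with_fallback(
    search: SearchFn, body: Dict[str, Any], pit_id: Optional[str]
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Run the search with the pit; on index-not-found retry once without it.

    Returns the response and the pit id that is still usable (None after a
    fallback). The second search may still raise, e.g. if the source targets
    an explicit index that no longer exists.
    """
    if pit_id is None:
        return search(body, None), None
    try:
        return search(body, pit_id), pit_id
    except IndexNotFound:
        # The pit references a deleted index; drop it and search directly.
        return search(body, None), None


def fake_search(body: Dict[str, Any], pit_id: Optional[str]) -> Dict[str, Any]:
    """Test double: any search that uses a pit fails with index not found."""
    if pit_id is not None:
        raise IndexNotFound("no such index")
    return {"hits": {"hits": []}, "used_pit": False}


response, remaining_pit = search_with_fallback(fake_search, {"query": {}}, "abc")
print(response["used_pit"], remaining_pit)
```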
elasticsearchmachine pushed a commit that referenced this issue Dec 8, 2021
* [Transform] handle pit index not found error (#81368)

Do not fail the transform if pit search fails with index not found as a result of an index that got deleted via ILM, 
if that index is part of a search that selects indices using a wildcard, e.g. logs-*. If pit search fails, the search 
is retried using search without a pit context. The 2nd search might fail if the source targets an explicit index. 
In addition the usage of the pit API can not be disabled by transform.

fixes #81252
relates #81256

* Update SettingsConfig.java

adapt version

* Update build.gradle

re-enable BWC tests

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
dnhatn added a commit that referenced this issue Dec 8, 2021
Today, a search request with PIT would fail immediately if any 
associated indices or nodes are gone, which is inconsistent when
allow_partial_search_results is true.

Relates #81256
elasticsearchmachine pushed a commit that referenced this issue Dec 22, 2021
* Handle partial search result with point in time (#81349)

Today, a search request with PIT would fail immediately if any 
associated indices or nodes are gone, which is inconsistent when
allow_partial_search_results is true.

Relates #81256

* fix tests

* Remove duplicated point-in-time doc

* fix test
@dnhatn dnhatn removed their assignment May 1, 2024
@benwtrent
Member

PIT allows for partial results now. I don't know what else is required here, so I'm closing this.
