[Search] Search cancelation is fragile, searches might be running for days #106395

Closed
Dosant opened this issue Jul 21, 2021 · 4 comments
Labels: bug, Feature:Search Sessions, Feature:Search, impact:high, loe:large, performance

Comments

@Dosant
Contributor

Dosant commented Jul 21, 2021

TL;DR: The default search expiration is 7 days. If Kibana fails to delete an async search for any reason, the search can keep running for days, putting needless load on the cluster.


Version: since 7.12

This came up during a quick investigation of how we handle cancellation, after a customer noticed unexpected hanging async searches in their cluster. There appear to be multiple scenarios in which Kibana can start a long-running search with a 7-day expiration and never clean it up itself.

As I understand, this is how it currently works when search sessions are enabled (default):

Start a search without saving a search session. The search's expiration is set to 7 days, which means Elasticsearch can keep the search running for up to 7 days, or until Kibana deletes it.

Kibana deletes searches in the following scenarios:

  • A delete request sent from the browser when the user navigates away within Kibana, starts a new search, or hits the client-side timeout (`search:timeout`).
  • The search session monitoring task, which should clean up abandoned searches and searches from unsaved sessions about 1 minute after the user has left the search page.

The problem with this setup:

If the browser doesn't act on `search:timeout` (because of a bug, or because the user has navigated away from Kibana) and the Kibana session monitoring task also fails to clean up (because of a bug like #105726, because the task is turned off, or because Kibana is simply not running), then the search might continue for days.
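
To make the failure mode concrete, here is a small TypeScript model of it. This is only an illustration: the function and field names are hypothetical and not taken from Kibana, and the one-minute figure is the approximate monitoring delay mentioned above. It computes an upper bound on how long an unsaved search can keep running, depending on which cleanup paths actually work.

```typescript
// Illustrative model only; names are hypothetical, not Kibana code.
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

interface CleanupState {
  // The browser sent a delete request (navigation within Kibana,
  // a new search, or the client-side search:timeout).
  browserDeleteSent: boolean;
  // The search session monitoring task is enabled, running, and not hitting a bug.
  monitoringTaskWorking: boolean;
}

// Upper bound on how long an unsaved async search may keep running
// after the user has left the search page.
function maxLifetimeMs(state: CleanupState, expirationMs: number = 7 * DAY_MS): number {
  if (state.browserDeleteSent) {
    return 0; // cancelled right away by the browser
  }
  if (state.monitoringTaskWorking) {
    return Math.min(MINUTE_MS, expirationMs); // cleaned up about a minute later
  }
  return expirationMs; // nothing cleans up: the search lives until it expires
}

// Both cleanup paths failing leaves the full 7-day expiration as the only limit.
const worstCaseDays =
  maxLifetimeMs({ browserDeleteSent: false, monitoringTaskWorking: false }) / DAY_MS; // 7
```

The point of the model is that the expiration is the only backstop once both cleanup paths fail, so its value directly determines the worst case.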

Possible solution:

Avoid setting such a long expiration (7d)? Or approach it another way, for example by having Kibana's monitoring task extend searches only for persisted sessions?
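
A rough sketch of what that policy could look like, in TypeScript. Everything here is an assumption for illustration: the names are hypothetical, and the one-minute initial expiration is an arbitrary placeholder rather than a proposed value. The idea is that every search starts with a short expiration and only searches belonging to a persisted session get extended by the monitoring task.

```typescript
// Hypothetical policy sketch; constants and names are assumptions, not Kibana code.
const SHORT_KEEP_ALIVE_MS = 60 * 1000; // placeholder short initial expiration
const PERSISTED_KEEP_ALIVE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, today's default

interface TrackedSearch {
  id: string;
  sessionPersisted: boolean; // true once the user has saved the search session
  expiresAt: number; // epoch milliseconds
}

// Every search starts with a short expiration.
function initialExpiration(now: number): number {
  return now + SHORT_KEEP_ALIVE_MS;
}

// Called periodically by a monitoring task: only persisted sessions are extended.
function extendedExpiration(search: TrackedSearch, now: number): number {
  if (!search.sessionPersisted) {
    return search.expiresAt; // unsaved searches are left to expire on their own
  }
  return Math.max(search.expiresAt, now + PERSISTED_KEEP_ALIVE_MS);
}
```

With this shape, a broken or disabled monitoring task means searches expire early instead of running for days. The sketch leaves out one case: a search the user is still actively waiting on would also need to be kept alive somehow, for example from the client's polling.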

cc @lizozom @lukasolson @elastic-jb I hope my understanding of the current setup is correct.

@Dosant added the bug, Feature:Search, and Team:AppServices labels on Jul 21, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

@exalate-issue-sync bot added the impact:low and loe:small labels on Jul 22, 2021
@exalate-issue-sync bot added impact:medium and removed impact:low on Jul 29, 2021
@Dosant
Contributor Author

Dosant commented Aug 9, 2021

We also need to check the logic around bfetch, where we might have another bug: when a single bfetch request contains two requests and only one of them is aborted, will the client send a DELETE .. request to cancel the corresponding async search?
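
For reference, the behavior one would expect here can be written down as a small TypeScript sketch. This is not the bfetch plugin's API and says nothing about what the client currently does; the types and function name are hypothetical. It only states the expectation to check against: each aborted item with a known async search id gets its own delete, while the shared HTTP request is aborted only when every item in the batch was aborted.

```typescript
// Hypothetical model of a batched request; not the bfetch plugin's API.
interface BatchedSearch {
  asyncSearchId?: string; // known only once Elasticsearch has returned an id
  aborted: boolean;
}

interface CancellationPlan {
  deleteIds: string[]; // async searches that should be deleted individually
  abortHttpRequest: boolean; // whether the shared HTTP request can be aborted
}

function planCancellation(batch: BatchedSearch[]): CancellationPlan {
  const deleteIds: string[] = [];
  for (const item of batch) {
    // An aborted item still needs its own delete, even if its siblings keep running.
    if (item.aborted && item.asyncSearchId !== undefined) {
      deleteIds.push(item.asyncSearchId);
    }
  }
  // The shared request may only be dropped when nobody is waiting on it anymore.
  const abortHttpRequest = batch.length > 0 && batch.every((item) => item.aborted);
  return { deleteIds, abortHttpRequest };
}
```

The case raised above is a batch of two with one aborted item: the expected plan deletes that one search and leaves the shared request open.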

@exalate-issue-sync bot added loe:medium and impact:high, and removed loe:small and impact:medium, on Aug 9, 2021
@exalate-issue-sync bot added loe:x-large and loe:large, and removed loe:medium and loe:x-large, on Nov 19, 2021
@Dosant
Contributor Author

Dosant commented Jan 13, 2022

One more case that reveals how fragile this is: #122955

@ppisljar
Member

Thank you for contributing to this issue, however, we are closing this issue due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen with a comment.

@ppisljar closed this as not planned on Aug 11, 2022