Ability to associate a search task ID #23250

Closed
eskibars opened this issue Feb 19, 2017 · 14 comments
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement help wanted adoptme

Comments

@eskibars
Contributor

Describe the feature:
When you fire off a search request to Elasticsearch, you're stuck waiting until the result comes back. Normally, that's very, very fast. But occasionally an egregious search/dataset can take a while to get through, so we added the ability to kill them through the task manager. That's great, but it's difficult to use from the UI that executes the search.

Consider:

  1. User enters a search that will be slow into an external UI
  2. UI executes search to Elasticsearch
  3. User wants to cancel before results come back

How does the UI match up the search that was executed with the list of tasks that are in the system? The UI could try to match the search descriptions in the task manager against the original request, but such heuristics are unreliable: Elasticsearch will have rewritten the query, and multiple running searches could match the same heuristic.

It'd be nice if you could associate some ID with a request at search time and have that ID show up in the task manager. That way, when the UI executes the request, it could specify an ID it could reference later if it needs to kill the request.

@eskibars eskibars added the :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. label Feb 19, 2017
@evanvolgas

evanvolgas commented Mar 15, 2017

I assume you are talking about something like generating a unique request ID along the lines of https://blog.ryandlane.com/2014/12/11/using-lua-in-nginx-for-unique-request-ids-and-millisecond-times-in-logs/. If so, I am hugely in favor of this idea, especially if the search ID were carried through to the slow query logs. If it were, that would be extremely helpful for the efforts around improving slow query logging (e.g. #9172 and #12187 (comment)). It could also potentially lend itself to @PhaedrusTheGreek's idea of breaking down API response time (#21073) or even logging it outside of just the Profile API.

@nik9000
Member

nik9000 commented Mar 20, 2017

We've talked about this on and off for a while.

If we do this I think it'd be easier if this were a thing for tasks in general rather than just searches. It might work like task status. It is a general thing but each request has to "opt in" to it. There should be a "standard" way to opt into it.

I think it'd be hard if we wanted to force these IDs to be unique because we don't have a good place for that.

I'm thinking of a task metadata URL parameter which could be searched using the list tasks API. Or something like that. @imotov, what do you think?

@imotov
Contributor

imotov commented Mar 20, 2017

@nik9000 maybe we can somehow expose a whitelisted subset of headers from ThreadContext at the moment of task creation. This way it would be possible to add stuff at the REST layer in a general way to all requests. Otherwise, each request that wants to "opt in" will have to add a place to "stash" the information it wants to expose via the task manager API.
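
The whitelisting idea can be illustrated with a small conceptual sketch. This is not Elasticsearch code: `ALLOWED_TASK_HEADERS` and `capture_task_headers` are hypothetical names, the header name is borrowed from the commit referenced later in this thread, and the case-insensitive matching is an assumption based on how HTTP header names generally behave.

```python
# Conceptual sketch only; these names are hypothetical, not Elasticsearch APIs.
ALLOWED_TASK_HEADERS = ("X-Opaque-Id",)


def capture_task_headers(request_headers, allowed=ALLOWED_TASK_HEADERS):
    """Return only the allowed headers, keyed by their canonical allowed name."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    captured = {}
    for name in allowed:
        if name.lower() in lowered:
            captured[name] = lowered[name.lower()]
    return captured
```

With such a filter, a request carrying both an allowed header and an arbitrary one would expose only the allowed header on its task, so every request type gets the behavior without opting in individually.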

@nik9000
Member

nik9000 commented Mar 20, 2017

Maybe! If we can get it at the rest layer that'd be cool.

@jrubensteinsp

This is the single most important feature for our environment. We have users that will run multiple searches in a row, and some are quite large. The ability to tag their searches and cancel specific searches prior to the latest that is still running would be an incredible benefit.

@lusid

lusid commented Mar 30, 2017

+1! Ability to abort specific prior searches on demand would be huge. I would gladly manage UUIDs on our end and pass them up to just be appended to the task at search time if that means I could do it through a single REST call. The inability to abort ES tasks has been a problem we have had since 0.90.

@daedalus28

👍 This is critical for our application because we have long-running analytic reports that sometimes are canceled by users, but the es cluster keeps going until it's done - and takes down the cluster in the process because users might immediately queue up different reports now that they've "canceled" the previous one.

@cilerler

cilerler commented Mar 30, 2017

Being able to assign an ID (not necessarily a unique one) to a search and cancel it when needed is crucial for heavy-usage scenarios. 💡 Not having this causes searches to queue up and leads to search rejections; it literally ties our hands and becomes a bottleneck in our operation. 😞 Please make this issue a priority 🤗 Thanks in advance

@dshishkov

It would be really useful to have some sort of control over canceling queries for my use cases too. Thanks for considering it.

@clintongormley clintongormley added help wanted adoptme and removed discuss labels Mar 31, 2017
@Akrion

Akrion commented Mar 31, 2017

+1

@imotov
Contributor

imotov commented Mar 31, 2017

@jrubensteinsp, @lusid, @daedalus28, @cilerler, @dshishkov, @Akrion it seems that you all work for the same company. We are trying to make sure that this feature covers a variety of use cases, and it would be helpful for us to understand whether you have multiple use cases for this feature at your company or whether all these comments are essentially about the same application. If you have multiple use cases, it would really help us if you could describe what they are and how they differ from each other.

@cilerler

cilerler commented Apr 6, 2017

@imotov you are right, we are all from the same company, but we are accessing Elastic from different applications and we realized that we are all suffering from the same issue: not having the capability to cancel a query.

  • Our customer facing application generates dynamic queries on the fly based on user interaction.
  • Our ETL process is an automated system which depends on actions taken in other applications.

Put simply, the common desired implementation would be

assigning a key on our end (no round trip) and being able to cancel related queries based on that key

Thank you for your time and attention!

@evanvolgas

evanvolgas commented Apr 7, 2017

I would echo the need for being able to cancel long running queries. In MySQL land, a lot of times you'll have a daemon running pt-kill on the server https://www.percona.com/doc/percona-toolkit/2.1/pt-kill.html. One of the nicest features of pt-kill is it can kill queries that match a certain pattern while leaving others alone.

If you could associate a task ID (especially one you have some control over assigning, or at least prefixing) with a search, it would be very straightforward to write a similar tool for ES -- one that looks for long-running queries and, assuming the IDs associated with them are identified as killable, kills them.

On hot-warm deployments of ES especially, this would be pretty helpful for us. Our warm nodes tend to have much more data on them than our hot ones, and Kibana queries against those warm nodes occasionally knock the warm nodes offline (which in turn puts pressure on the masters in the form of recovery tasks, which in turn slows the whole cluster down). If we could stalk the running search tasks and kill anything that's taking too long and doesn't have a prefix on its ID to mark it as non-killable, that'd really do a lot for our overall cluster stability. Of the last 10 times ES has required manual intervention, all of them had to do with inefficient queries running against our warms and knocking them offline. I suspect stalking and killing problem queries would have prevented all or most of those issues from happening.
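
The selection step of such a pt-kill-style watchdog might look roughly like the sketch below. It is only an illustration under several assumptions: `select_killable` and the `keep-` prefix are hypothetical, the input is a mapping of task IDs to task info shaped like the `_tasks` output shown later in this thread, the ID is carried in the `X-Opaque-Id` header introduced there, and search tasks are recognized by an action name starting with `indices:data/read/search`.

```python
def select_killable(tasks, max_seconds, protected_prefix="keep-"):
    """Return IDs of cancellable search tasks that ran too long and are not protected."""
    limit_nanos = max_seconds * 1_000_000_000
    victims = []
    for task_id, info in tasks.items():
        # Tasks without an ID are treated as unprotected in this sketch.
        opaque_id = info.get("headers", {}).get("X-Opaque-Id", "")
        if not info.get("cancellable", False):
            continue
        # Assumption: search tasks carry an action name with this prefix.
        if not info.get("action", "").startswith("indices:data/read/search"):
            continue
        if info.get("running_time_in_nanos", 0) <= limit_nanos:
            continue
        if opaque_id.startswith(protected_prefix):
            continue
        victims.append(task_id)
    return sorted(victims)
```

Keeping the selection a pure function over the task listing makes the policy easy to test separately from whatever code actually issues the cancel requests.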

imotov added a commit that referenced this issue Apr 13, 2017
TaskInfo is stored as a part of TaskResult and therefore can be read by nodes with an older version. If we add any additional information to TaskInfo (for #23250, for example), nodes with an older version should be able to ignore it, otherwise they will not be able to read TaskResults stored by newer nodes.
imotov added a commit that referenced this issue Apr 13, 2017
@AndreKR
Contributor

AndreKR commented Sep 20, 2017

@imotov I'm not from that company but I also have a use case for this. :)
We are rendering images from Elasticsearch data and show them in the browser. Rendering an image, depending on the query parameters, can involve quite a few aggregations and it can take half a minute or so to gather the data.
When the user resizes the window or navigates to a different page, the browser cancels the HTTP request by closing the connection. We already propagate this through the load balancers to the render services, which will then cancel their work. However, currently it seems Elasticsearch will continue to work on those canceled queries.
It seems we could use the Task API to cancel the running search/aggregation, but only if we can find its corresponding task ID.
In our code we already know which request we just canceled. If we could add an ID (could be a UUID, or in our case we might simply use a sequence number because only one application is using the cluster) and search for it using the Task API, then we could cancel the search by making a Task API request and save valuable CPU seconds/IOPs.
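
The client-side bookkeeping for this use case could be sketched as below. `InFlightSearches` and the `render` prefix are hypothetical names; the sketch assumes a single application using the cluster, so a plain sequence number is unique enough, and that the ID would be sent in the `X-Opaque-Id` header that the later commit in this thread introduces.

```python
import itertools


class InFlightSearches:
    """Track IDs of searches that have been sent but not yet answered."""

    def __init__(self, prefix="render"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._active = set()

    def start(self):
        """Reserve a new ID to send along with the search request."""
        opaque_id = f"{self._prefix}-{next(self._counter)}"
        self._active.add(opaque_id)
        return opaque_id

    def finish(self, opaque_id):
        """Forget an ID once its response has arrived."""
        self._active.discard(opaque_id)

    def abandon(self, opaque_id):
        """Return True if the search was still running and should be cancelled."""
        if opaque_id in self._active:
            self._active.remove(opaque_id)
            return True
        return False
```

When the browser drops a connection, the render service would call `abandon` with the ID of that request and only look the task up for cancellation if it returns `True`, which avoids a needless Task API request for searches that have already completed.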

imotov added a commit to imotov/elasticsearch that referenced this issue Dec 21, 2017
Adds support for capturing the X-Opaque-Id header from a REST request and storing its value in the tasks that this request started. It works for all user-initiated tasks (not only search).

Closes elastic#23250
imotov added a commit that referenced this issue Jan 12, 2018
Adds support for capturing the X-Opaque-Id header from a REST request and storing its value in the tasks that this request started. It works for all user-initiated tasks (not only search).

Closes #23250

Usage:
```
$ curl -H "X-Opaque-Id: imotov" -H "foo:bar" "localhost:9200/_tasks?pretty&group_by=parents"
{
  "tasks" : {
    "7qrTVbiDQKiZfubUP7DPkg:6998" : {
      "node" : "7qrTVbiDQKiZfubUP7DPkg",
      "id" : 6998,
      "type" : "transport",
      "action" : "cluster:monitor/tasks/lists",
      "start_time_in_millis" : 1513029940042,
      "running_time_in_nanos" : 266794,
      "cancellable" : false,
      "headers" : {
        "X-Opaque-Id" : "imotov"
      },
      "children" : [
        {
          "node" : "V-PuCjPhRp2ryuEsNw6V1g",
          "id" : 6088,
          "type" : "netty",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 67785,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        },
        {
          "node" : "7qrTVbiDQKiZfubUP7DPkg",
          "id" : 6999,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 98754,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        }
      ]
    }
  }
}
```
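
Given a response shaped like the one above, a client could pick out the tasks it tagged and derive the endpoints to cancel them. The following is a minimal sketch, not part of the commit: `cancel_paths` is a hypothetical helper, it assumes the `group_by=parents` layout with a top-level `tasks` mapping, and it looks only at top-level tasks, skipping `children` for brevity. The `/_tasks/<task_id>/_cancel` path is the task cancellation endpoint of the Tasks API.

```python
def cancel_paths(response, opaque_id):
    """Return cancel endpoints for cancellable top-level tasks tagged with opaque_id."""
    paths = []
    for task_id, info in response.get("tasks", {}).items():
        if info.get("headers", {}).get("X-Opaque-Id") != opaque_id:
            continue
        # Non-cancellable tasks (like the list-tasks request above) are skipped.
        if not info.get("cancellable", False):
            continue
        paths.append(f"/_tasks/{task_id}/_cancel")
    return sorted(paths)
```

Each returned path would then be sent as a `POST` request to the cluster. Because the ID is not required to be unique, the function may return several paths for one ID.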
imotov added a commit that referenced this issue Jan 15, 2018
martijnvg pushed a commit to martijnvg/elasticsearch that referenced this issue Jan 31, 2018
martijnvg pushed a commit that referenced this issue Feb 5, 2018