Ability to associate a search task ID #23250

eskibars · 2017-02-19T16:11:27Z

Describe the feature:
When you fire off a search request to Elasticsearch, you're stuck waiting until the result comes back. Normally, that's very, very fast. But occasionally an egregious search/dataset can take a while to get through, so we added the ability to kill them through the task manager. That's great, but it's difficult to use from the UI that executes the search.

Consider:

1. User enters a search that will be slow into an external UI
2. UI executes search to Elasticsearch
3. User wants to cancel before results come back

How does the UI match up the search that was executed with the list of tasks that are in the system? The UI could try to match the search descriptions in the task manager to the original request, but such heuristics become very difficult given that Elasticsearch will have rewritten the query and that multiple searches could match the same heuristic.

It'd be nice if you could associate some ID with a request at search time and have that ID show up in the task manager. That way, when the UI executes the request, it could specify an ID it could reference later if it needs to kill the request.

evanvolgas · 2017-03-15T13:07:06Z

I assume you are talking about something like generating a unique request id along the lines of https://blog.ryandlane.com/2014/12/11/using-lua-in-nginx-for-unique-request-ids-and-millisecond-times-in-logs/. If so, I am hugely in favor of this idea, especially if the search id were carried through to the slow query logs. If it were, that would be extremely helpful vis a vis efforts around improving slow query logging (eg #9172 and #12187 (comment)). It could also potentially lend itself to @PhaedrusTheGreek's idea of breaking down API response time (#21073) or even logging it outside of just the Profile API.

nik9000 · 2017-03-20T19:00:40Z

We've talked about this on and off for a while.

If we do this I think it'd be easier if this were a thing for tasks in general rather than just searches. It might work like task status. It is a general thing but each request has to "opt in" to it. There should be a "standard" way to opt into it.

I think it'd be hard if we wanted to force these IDs to be unique because we don't have a good place for that.

I'm thinking of a task metadata url parameter which could be searched using the list tasks API. Or something like that. @imotov, what do you think?

imotov · 2017-03-20T21:37:34Z

@nik9000 maybe we can somehow expose a whitelisted subset of headers from ThreadContext at the moment of the task creation. This way it would be possible to add stuff on the rest layer in a general way to all requests. Otherwise, each request you would want to "opt in" will have to add a place to "stash" the information you want to expose via task manager api.
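The whitelisting idea above can be illustrated with a minimal sketch. This is not Elasticsearch code: the whitelist contents, the header name `X-Request-Id`, and the helpers `headers_for_task` and `create_task` are all hypothetical, and a plain dict stands in for the ThreadContext headers.

```python
# Hypothetical whitelist of request headers that may be exposed on tasks.
ALLOWED_TASK_HEADERS = ("X-Request-Id",)


def headers_for_task(request_headers, allowed=ALLOWED_TASK_HEADERS):
    """Return only the whitelisted headers, matched case-insensitively."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {name: lowered[name.lower()] for name in allowed if name.lower() in lowered}


def create_task(action, request_headers):
    """Build a task record that carries the whitelisted headers.

    Headers outside the whitelist (for example credentials) are dropped,
    so they never show up in a task listing.
    """
    return {"action": action, "headers": headers_for_task(request_headers)}
```

Because the copy happens once, at task creation, individual request types would not each need their own place to stash the information.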

nik9000 · 2017-03-20T21:57:06Z

Maybe! If we can get it at the rest layer that'd be cool.

jrubensteinsp · 2017-03-30T01:02:43Z

This is the single most important feature for our environment. We have users that will run multiple searches in a row, and some are quite large. The ability to tag their searches and cancel specific searches prior to the latest that is still running would be an incredible benefit.

lusid · 2017-03-30T01:36:56Z

+1! Ability to abort specific prior searches on demand would be huge. I would gladly manage UUIDs on our end and pass them up to just be appended to the task at search time if that means I could do it through a single REST call. The inability to abort ES tasks has been a problem we have had since 0.90.

daedalus28 · 2017-03-30T01:40:52Z

👍 This is critical for our application because we have long-running analytic reports that sometimes are canceled by users, but the es cluster keeps going until it's done - and takes down the cluster in the process because users might immediately queue up different reports now that they've "canceled" the previous one.

cilerler · 2017-03-30T11:46:46Z

Being able to assign an ID (not necessarily a unique one) to a search and to cancel it when it's needed is crucial for heavy usage scenarios. 💡 Not having it causes queuing and leads to search rejections; it literally ties our hands and becomes a bottleneck in our operation. 😞 Please make this issue a priority 🤗 Thanks in advance

dshishkov · 2017-03-30T13:28:13Z

It would be really useful to have some sort of control over canceling queries for my use cases too. Thanks for considering it.

Akrion · 2017-03-31T15:10:51Z

+1

imotov · 2017-03-31T15:46:37Z

@jrubensteinsp, @lusid, @daedalus28, @cilerler, @dshishkov, @Akrion it seems that you all work for the same company. We are trying to make sure that this feature covers a variety of use cases and it would be helpful for us to understand if you have multiple use cases for this feature at your company or all these comments are essentially about the same application. If you have multiple use cases, it would really help us if you could describe what they are and how they differ from each other.

cilerler · 2017-04-06T11:59:01Z

@imotov you are right, we are all from the same company, but we access Elastic from different applications and we realized that we are all suffering from the same issue: not having the capability to cancel a query.

- Our customer-facing application generates dynamic queries on the fly based on user interaction.
- Our ETL process is an automated system that depends on actions taken in other applications.

Put very simply, the common desired implementation would be:

assigning a key on our end (no round trip) and be able to cancel related queries based on that key

Thank you for your time and attention!

evanvolgas · 2017-04-07T16:10:33Z

I would echo the need for being able to cancel long running queries. In MySQL land, a lot of times you'll have a daemon running pt-kill on the server https://www.percona.com/doc/percona-toolkit/2.1/pt-kill.html. One of the nicest features of pt-kill is it can kill queries that match a certain pattern while leaving others alone.

If you could associate a task id (especially one you have some control over assigning, or at least prefixing) with a search, it would be very straightforward to write a similar tool for ES -- one that looks for long running queries and, assuming the IDs associated with them are identified as killable, kills them.

On hot-warm deployments of ES especially, this would be pretty helpful for us. Our warm nodes tend to have much more data on them than our hot ones and Kibana queries against those warm nodes occasionally knock the warm nodes offline (which in turn puts pressure on the masters in the form of recovery tasks, which in turn slows the whole cluster down). If we could stalk the running search tasks and kill anything that's taking too long and doesn't have a prefix on its id to mark it as non-killable, that'd really do a lot for our overall cluster stability. Of the last 10 times ES has required manual intervention, all of them had to do with inefficient queries running against our warms and knocking them offline. I suspect stalking and killing problem queries would have prevented all or most of those issues from happening.
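The selection step of such a pt-kill-style tool might look roughly like the sketch below. It assumes task entries shaped like the task list output (`action`, `running_time_in_nanos`, `cancellable`); the key `request_id`, the `keep-` prefix, and the function name `killable_tasks` are hypothetical stand-ins for whatever form the caller-assigned ID ends up taking.

```python
NANOS_PER_SECOND = 1_000_000_000


def killable_tasks(tasks, max_seconds, protected_prefix="keep-"):
    """Select long-running, cancellable search tasks whose ID is not protected.

    `tasks` maps "node:id" strings to task info dicts. The caller-assigned
    ID is read from the hypothetical key `request_id`; tasks without one
    are treated as unprotected.
    """
    limit = max_seconds * NANOS_PER_SECOND
    selected = []
    for task_id, info in tasks.items():
        request_id = info.get("request_id") or ""
        if not info.get("cancellable", False):
            continue  # cannot be cancelled at all
        if "search" not in info.get("action", ""):
            continue  # only search tasks are of interest here
        if request_id.startswith(protected_prefix):
            continue  # explicitly marked as non-killable
        if info.get("running_time_in_nanos", 0) > limit:
            selected.append(task_id)
    return sorted(selected)
```

A daemon would poll the task list, pass the result through a filter like this, and issue a cancel for each returned task ID.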

TaskInfo is stored as a part of TaskResult and therefore can be read by nodes with an older version. If we add any additional information to TaskInfo (for #23250, for example), nodes with an older version should be able to ignore it, otherwise they will not be able to read TaskResults stored by newer nodes.

AndreKR · 2017-09-20T14:13:48Z

@imotov I'm not from that company but I also have a use case for this. :)
We are rendering images from Elasticsearch data and show them in the browser. Rendering an image, depending on the query parameters, can involve quite a few aggregations and it can take half a minute or so to gather the data.
When the user resizes the window or navigates to a different page, the browser cancels the HTTP request by closing the connection. We already propagate this through the load balancers to the render services, which will then cancel their work. However, currently it seems Elasticsearch will continue to work on those canceled queries.
It seems we could use the Task API to cancel the running search/aggregation, but only if we can find its corresponding task ID.
In our code we already know which request we just canceled. If we could add an ID (could be a UUID, or in our case we might simply use a sequence number because only one application is using the cluster) and search for it using the Task API, then we could cancel the search by making a Task API request and save valuable CPU seconds/IOPs.

Adds support for capturing the X-Opaque-Id header from a REST request and storing its value in the tasks that this request started. It works for all user-initiated tasks (not only search). Closes #23250

Usage:

```
$ curl -H "X-Opaque-Id: imotov" -H "foo:bar" "localhost:9200/_tasks?pretty&group_by=parents"
{
  "tasks" : {
    "7qrTVbiDQKiZfubUP7DPkg:6998" : {
      "node" : "7qrTVbiDQKiZfubUP7DPkg",
      "id" : 6998,
      "type" : "transport",
      "action" : "cluster:monitor/tasks/lists",
      "start_time_in_millis" : 1513029940042,
      "running_time_in_nanos" : 266794,
      "cancellable" : false,
      "headers" : {
        "X-Opaque-Id" : "imotov"
      },
      "children" : [
        {
          "node" : "V-PuCjPhRp2ryuEsNw6V1g",
          "id" : 6088,
          "type" : "netty",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 67785,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        },
        {
          "node" : "7qrTVbiDQKiZfubUP7DPkg",
          "id" : 6999,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 98754,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        }
      ]
    }
  }
}
```
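For the UI use case from the original request, a client could tag its search with an ID, later look for tasks carrying that ID, and cancel them. The following is a minimal sketch under stated assumptions, not part of the PR: it parses the `group_by=parents` response shape shown in the usage example, the helper names (`tagged_search_request`, `find_task_ids`, `cancel_request`) and the `localhost:9200` address are illustrative, and the cancel call uses the task management API's `_tasks/<task_id>/_cancel` endpoint. Requests are only constructed here, not sent.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumed local node, as in the curl example


def tagged_search_request(index, query, opaque_id):
    """Build (but do not send) a search request tagged with X-Opaque-Id."""
    return urllib.request.Request(
        f"{ES}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Opaque-Id": opaque_id},
        method="POST",
    )


def find_task_ids(tasks_response, opaque_id):
    """Return "node:id" for every task (parent or child) carrying this ID."""
    found = []

    def visit(task):
        if task.get("headers", {}).get("X-Opaque-Id") == opaque_id:
            found.append(f"{task['node']}:{task['id']}")
        for child in task.get("children", []):
            visit(child)

    for task in tasks_response.get("tasks", {}).values():
        visit(task)
    return found


def cancel_request(task_id):
    """Build (but do not send) a cancel request for one task."""
    return urllib.request.Request(f"{ES}/_tasks/{task_id}/_cancel", method="POST")
```

A caller would send the tagged search, fetch `_tasks?group_by=parents` when the user cancels, pass the parsed JSON to `find_task_ids`, and send a `cancel_request` for each match. Since the ID is not required to be unique, several tasks may match one ID.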

eskibars added the :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. label Feb 19, 2017

clintongormley added discuss >enhancement labels Feb 20, 2017

clintongormley added help wanted adoptme and removed discuss labels Mar 31, 2017

imotov mentioned this issue Apr 12, 2017

Task Management: Make TaskInfo parsing forwards compatible #24073

Merged

imotov mentioned this issue Dec 11, 2017

Add ability to associate an ID with tasks #27764

Merged

imotov closed this as completed in #27764 Jan 12, 2018

martijnvg pushed a commit to martijnvg/elasticsearch that referenced this issue Jan 31, 2018

Add adding ability to associate an ID with tasks.

41071e4

Persistent tasks portion of elastic#23250

martijnvg pushed a commit that referenced this issue Feb 5, 2018

Add adding ability to associate an ID with tasks.

d879df1

Persistent tasks portion of #23250

lanerjo mentioned this issue Apr 4, 2019

Kibana should cancel previous requests before initiating new request from same session elastic/kibana#34569

Closed
