Ability to associate a search task ID #23250
Comments
I assume you are talking about something like generating a unique request ID along the lines of https://blog.ryandlane.com/2014/12/11/using-lua-in-nginx-for-unique-request-ids-and-millisecond-times-in-logs/. If so, I am hugely in favor of this idea, especially if the search ID were carried through to the slow query logs. That would be extremely helpful for the efforts to improve slow query logging (e.g. #9172 and #12187 (comment)). It could also lend itself to @PhaedrusTheGreek's idea of breaking down API response time (#21073), or even to logging it outside of just the Profile API.
We've talked about this on and off for a while. If we do this, I think it'd be easier if it were a thing for tasks in general rather than just searches. It might work like task status: it is a general thing, but each request has to "opt in" to it. There should be a "standard" way to opt in. I think it'd be hard to force these IDs to be unique because we don't have a good place to enforce that. I'm thinking of a task metadata URL parameter which could be searched using the list tasks API. Or something like that. @imotov, what do you think?
@nik9000 maybe we can somehow expose a whitelisted subset of headers from ThreadContext at the moment of task creation. This way it would be possible to add stuff at the REST layer in a general way for all requests. Otherwise, each request that wants to "opt in" will have to add a place to "stash" the information it wants to expose via the task manager API.
Maybe! If we can get it at the REST layer that'd be cool.
This is the single most important feature for our environment. We have users that will run multiple searches in a row, and some are quite large. The ability to tag their searches and cancel specific searches prior to the latest that is still running would be an incredible benefit.
+1! Ability to abort specific prior searches on demand would be huge. I would gladly manage UUIDs on our end and pass them up to just be appended to the task at search time if that means I could do it through a single REST call. The inability to abort ES tasks has been a problem we have had since 0.90.
👍 This is critical for our application because we have long-running analytic reports that are sometimes canceled by users, but the ES cluster keeps going until it's done, and takes down the cluster in the process because users might immediately queue up different reports now that they've "canceled" the previous one.
Being able to assign an ID (not necessarily a unique one) to a search and cancel it when needed is crucial for heavy usage scenarios. 💡 Not having it causes requests to queue up and leads to search rejections; it literally ties our hands and becomes a bottleneck in our operation. 😞 Please make this issue a priority 🤗 Thanks in advance
It would be really useful to have some sort of control over canceling queries for my use cases too. Thanks for considering it.
+1
@jrubensteinsp, @lusid, @daedalus28, @cilerler, @dshishkov, @Akrion it seems that you all work for the same company. We are trying to make sure that this feature covers a variety of use cases, and it would be helpful for us to understand whether you have multiple use cases for this feature at your company or whether all these comments are essentially about the same application. If you have multiple use cases, it would really help us if you could describe what they are and how they differ from each other.
@imotov you are right, we are all from the same company, but we access Elastic from different applications and realized that we all suffer from the same issue: not having the ability to cancel a query.
In a very simple way, common desired implementation would be
Thank you for your time and attention!
I would echo the need for being able to cancel long-running queries. In MySQL land, a lot of times you'll have a daemon running If you could associate a task ID (especially one you have some control over assigning, or at least prefixing) with a search, it would be very straightforward to write a similar tool for ES: one that looks for long-running queries and, assuming the IDs associated with them are identified as killable, kills them. On
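The selection step of such a tool can be sketched roughly as follows. This is only an illustration under assumptions: it uses the `X-Opaque-Id` header that appears later in this thread as the caller-assigned ID, it expects the tasks as a flat list of dicts with the fields shown in the task list output further down, and the function name `select_killable` and the prefix convention are hypothetical.

```python
def select_killable(tasks, max_running_seconds, killable_prefix):
    """Return "node:id" strings for tasks that look safe to cancel.

    `tasks` is assumed to be a flat list of task dicts carrying the
    fields seen in the task list output ("node", "id", "cancellable",
    "running_time_in_nanos", "headers").
    """
    limit_nanos = max_running_seconds * 1_000_000_000
    selected = []
    for task in tasks:
        # Tasks started without the header simply have no opaque ID.
        opaque_id = task.get("headers", {}).get("X-Opaque-Id", "")
        if (
            task.get("cancellable")
            and opaque_id.startswith(killable_prefix)
            and task.get("running_time_in_nanos", 0) > limit_nanos
        ):
            selected.append(f'{task["node"]}:{task["id"]}')
    return selected
```

Each returned ID could then be passed to the task cancel endpoint (`POST _tasks/<task_id>/_cancel`). Requiring a prefix keeps the tool from touching tasks that never opted in.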
TaskInfo is stored as a part of TaskResult and therefore can be read by nodes with an older version. If we add any additional information to TaskInfo (for #23250, for example), nodes with an older version should be able to ignore it, otherwise they will not be able to read TaskResults stored by newer nodes.
@imotov I'm not from that company but I also have a use case for this. :)
Adds support for capturing the X-Opaque-Id header from a REST request and storing its value in the tasks that this request started. It works for all user-initiated tasks (not only search). Closes elastic#23250
Adds support for capturing the X-Opaque-Id header from a REST request and storing its value in the tasks that this request started. It works for all user-initiated tasks (not only search).

Closes #23250

Usage:

```
$ curl -H "X-Opaque-Id: imotov" -H "foo:bar" "localhost:9200/_tasks?pretty&group_by=parents"
{
  "tasks" : {
    "7qrTVbiDQKiZfubUP7DPkg:6998" : {
      "node" : "7qrTVbiDQKiZfubUP7DPkg",
      "id" : 6998,
      "type" : "transport",
      "action" : "cluster:monitor/tasks/lists",
      "start_time_in_millis" : 1513029940042,
      "running_time_in_nanos" : 266794,
      "cancellable" : false,
      "headers" : {
        "X-Opaque-Id" : "imotov"
      },
      "children" : [
        {
          "node" : "V-PuCjPhRp2ryuEsNw6V1g",
          "id" : 6088,
          "type" : "netty",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 67785,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        },
        {
          "node" : "7qrTVbiDQKiZfubUP7DPkg",
          "id" : 6999,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1513029940043,
          "running_time_in_nanos" : 98754,
          "cancellable" : false,
          "parent_task_id" : "7qrTVbiDQKiZfubUP7DPkg:6998",
          "headers" : {
            "X-Opaque-Id" : "imotov"
          }
        }
      ]
    }
  }
}
```
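For a client that wants to locate its own tasks in a response like the one above, a minimal sketch might look like this. It assumes the `group_by=parents` response shape shown in the usage example; the function name `find_tasks_by_opaque_id` and the trimmed `SAMPLE` data are illustrative, not part of Elasticsearch.

```python
def find_tasks_by_opaque_id(response, opaque_id):
    """Collect "node:id" for every task (parent or child) whose
    X-Opaque-Id header equals `opaque_id`."""
    matches = []

    def visit(task):
        if task.get("headers", {}).get("X-Opaque-Id") == opaque_id:
            matches.append(f'{task["node"]}:{task["id"]}')
        # Child tasks are nested under "children" when group_by=parents.
        for child in task.get("children", []):
            visit(child)

    for task in response.get("tasks", {}).values():
        visit(task)
    return matches


# Trimmed copy of the response from the usage example above.
SAMPLE = {
    "tasks": {
        "7qrTVbiDQKiZfubUP7DPkg:6998": {
            "node": "7qrTVbiDQKiZfubUP7DPkg",
            "id": 6998,
            "headers": {"X-Opaque-Id": "imotov"},
            "children": [
                {"node": "V-PuCjPhRp2ryuEsNw6V1g", "id": 6088,
                 "headers": {"X-Opaque-Id": "imotov"}},
                {"node": "7qrTVbiDQKiZfubUP7DPkg", "id": 6999,
                 "headers": {"X-Opaque-Id": "imotov"}},
            ],
        }
    }
}

print(find_tasks_by_opaque_id(SAMPLE, "imotov"))
```

Because the header value is not required to be unique, the lookup returns a list rather than a single task.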
Persistent tasks portion of elastic#23250
Describe the feature:
When you fire off a search request to Elasticsearch, you're stuck waiting until the result comes back. Normally, that's very, very fast. But occasionally an egregious search/dataset can take a while to get through, so we added the ability to kill them through the task manager. That's great, but it's difficult to use from the UI that executes the search.
Consider:
How does the UI match up the search that was executed with the list of tasks that are in the system? The UI could try to match the search descriptions in the task manager to the original request, but such heuristics become very difficult because Elasticsearch will have rewritten the query and multiple searches could match the same heuristic.
It'd be nice if you could associate some ID with a request at search time and have that ID show up in the task manager. That way, when the UI executes the request, it could specify an ID it could reference later if it needs to kill the request.
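From the UI's side, the flow could be sketched as below. This is a rough illustration, not a client implementation: it assumes the ID travels in the `X-Opaque-Id` header described in the linked change, it only builds request descriptions without sending them, and the names `PlannedRequest`, `tagged_search`, and `cancel_task` are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PlannedRequest:
    """Plain description of an HTTP request (nothing is sent here)."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def tagged_search(base_url, index, body, opaque_id=None):
    """Describe a search request tagged with a caller-chosen ID.

    Returns the ID so the UI can remember it for later cancellation.
    """
    opaque_id = opaque_id or uuid.uuid4().hex
    request = PlannedRequest(
        method="POST",
        url=f"{base_url}/{index}/_search",
        headers={"Content-Type": "application/json",
                 "X-Opaque-Id": opaque_id},
        body=body,
    )
    return opaque_id, request


def cancel_task(base_url, task_id):
    """Describe a cancel call for a task ID of the form "node:id"."""
    return PlannedRequest(method="POST",
                          url=f"{base_url}/_tasks/{task_id}/_cancel")
```

The UI would keep the returned ID, look up the matching task through the task list, and then issue the cancel call for that task.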