Runaway query control based on resource group #43691
Labels
type/enhancement
Connor1996 changed the title from "Runaway task control" to "Runaway query control based on resource group" on Jun 9, 2023
Motivation
Runaway queries are queries that consume more resources than the user expects. They can be caused by an improper SQL statement or a suboptimal plan.
Runaway queries can impact overall performance if they are not managed properly, so we need to manage them effectively: long-running operations should be identified and aborted.
We already have a deadline mechanism pushed down to the TiKV layer: by default, a single coprocessor request cannot execute in TiKV for more than 60s. But a runaway query may not spend much time on any single coprocessor request, so the deadline mechanism can't prevent runaway queries. Meanwhile, the deadline can't be set too low, otherwise normal requests would be aborted too quickly.
How to identify runaway queries?
Resource manager can take action when a query exceeds a specified amount of elapsed time. The elapsed time is the time spent being processed, which excludes waiting time.
Differentiating runaway queries from queries that really need to perform a full table/index scan is hard, and there is no absolute rule. So we let users define the rule that identifies runaway queries and tweak it to their own needs. For now, the only criterion is execution time; more dimensions may be added later.
TiKV sends back the scan detail in coprocessor responses. If the total elapsed time of the query exceeds the threshold, the query (statement) is recognized as a runaway query.
Task Breakdown
GetResourceGroup function tikv/pd#6515
information_schema.resource_groups
ddl, I_S: support runaway attribute in resource group #43877
override_priority
resource control: Add runaway settings kvproto#1114
information_schema.runaway_queries
information_schema.qurantined_watch
domain: record runaway and quarantine query #44654
query watch
stmt for manual management of runaway watch #45500
query watch
#45465
Misc