-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
domain: support GC runaway record #44784
domain: support GC runaway record #44784
Conversation
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
domain/domain.go
Outdated
} | ||
} | ||
gcRunawayRecords := func() { | ||
sql := "DELETE FROM mysql.tidb_runaway_queries WHERE time < CONVERT_TZ(%?, '+00:00', @@TIME_ZONE)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the runaway records should be kept for a period of time. And, btw, it is possible that there are too many outdated records, then delete all of them in a single txn can exceed tidb's txn max size, so I suggest manually select and delete the records in batch.
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
cc @nolouch |
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
/retest |
domain/domain.go
Outdated
func (do *Domain) runawayRecordFlushLoop() { | ||
defer util.Recover(metrics.LabelDomain, "runawayRecordFlushLoop", nil, false) | ||
|
||
quarantineRecordGCInterval := time.Second * 15 //time.Minute * 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And don't forget to update doc as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add necessary test cases to cover the new logic
domain/domain.go
Outdated
var leftRows [][]types.Datum | ||
for { | ||
sql := "" | ||
limit := int(variable.TTLScanBatchSize.Load()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better not depend on variable of other feature, you can define a const instead
domain/domain.go
Outdated
tbCIStr := model.NewCIStr(tableName) | ||
tbl, err := do.InfoSchema().TableByName(systemSchemaCIStr, tbCIStr) | ||
if err != nil { | ||
logutil.BgLogger().Info("delete system table failed", zap.String("table", tableName), zap.Error(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logutil.BgLogger().Info("delete system table failed", zap.String("table", tableName), zap.Error(err)) | |
logutil.BgLogger().Error("delete system table failed", zap.String("table", tableName), zap.Error(err)) |
domain/domain.go
Outdated
func (do *Domain) runawayRecordFlushLoop() { | ||
defer util.Recover(metrics.LabelDomain, "runawayRecordFlushLoop", nil, false) | ||
|
||
quarantineRecordGCInterval := time.Second * 15 //time.Minute * 15 | ||
runawayRecordGCInterval := time.Second * 15 //time.Hour * 24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this small duration?
domain/domain.go
Outdated
@@ -1294,6 +1389,10 @@ func (do *Domain) runawayRecordFlushLoop() { | |||
// flush as soon as possible. | |||
if len(quarantineRecordCh) == 0 || len(quarantineRecords) >= flushThrehold { | |||
flushQuarantineRecords() | |||
if owner.IsOwner() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why need gc here, don't get the point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because when the watch is updated, it may be a new identify, at this time, I think I can delete the old one immediately.
domain/domain.go
Outdated
runawayRecordFluashTimer.Reset(time.Second) | ||
} | ||
case <-runawayRecordGCTicker.C: | ||
if owner.IsOwner() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to move this check into gcSystemTable
to avoid this check everywhere
domain/domain.go
Outdated
} | ||
case <-runawayRecordGCTicker.C: | ||
if owner.IsOwner() { | ||
do.gcSystemTable("tidb_runaway_queries", "time", runawayRecordExpiredDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As gcSystemTable
can be very slow, to avoid blocking handling other events, better spawn a new routine to do this job.
domain/domain.go
Outdated
} | ||
|
||
for len(leftRows) > 0 { | ||
maxBatch := 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the previous select is already paged, why do paging again here?
/retest |
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
/test build |
/test unit-test |
@CabinfeverB: The specified target(s) for
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test unit-test |
/retest |
@@ -1248,11 +1251,116 @@ func (do *Domain) SetOnClose(onClose func()) { | |||
do.onClose = onClose | |||
} | |||
|
|||
const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move it to resource group package?
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: glorv, nolouch, qw4990 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
In response to a cherrypick label: new pull request created to branch |
What problem does this PR solve?
Issue Number: ref #43691 close #44804
As RFC shown, tidb_runaway_quarantined_watch only keeps active quarantined queries. So we should support gc.
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.