-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ttl: support to split TTL tasks for a table without binary primary key #55663
Conversation
Hi @lcwangchao. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #55663 +/- ##
================================================
+ Coverage 72.9214% 74.7450% +1.8235%
================================================
Files 1581 1584 +3
Lines 442336 442645 +309
================================================
+ Hits 322558 330855 +8297
+ Misses 99970 91525 -8445
- Partials 19808 20265 +457
Flags with carried forward coverage won't be shown. Click here to find out more.
|
pkg/ttl/cache/table.go
Outdated
if mysql.HasBinaryFlag(ft.GetFlag()) { | ||
return t.splitCommonHandleRanges(ctx, tikvStore, splitCnt, false, false) | ||
} | ||
return t.splitCommonHandleRanges(ctx, tikvStore, splitCnt, false, false, mysql.HasBinaryFlag(ft.GetFlag())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it only accept utf8mb4_bin
(or other binary collations)?
For utf8mb4_unicode_ci
(or other UCA collations), there is no direct connection between the character and the weight. It's meaningless to get the ASCII prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the ASCII string a
has weight 0x0E33
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL again
9617653
to
8037809
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Do we need to update the doc, too? |
Yes, I'll update the doc after this PR merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bb7133, YangKeao The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold
|
8037809
to
c363ec2
Compare
/unhold |
c363ec2
to
47444dd
Compare
/retest |
@lcwangchao: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What problem does this PR solve?
Issue Number: close #55660
Problem Summary:
When running a TTL job, the job will be split into several tasks to delete rows of concurrency. However, if a table's primary key is a string type without a binary flag, the task cannot be split. For example:
When table t starts a TTL job, it only has one task even if it is a large table with many regions.
The reason is that we split tasks according to the region edges in the current implementation. However, the region is a conception in TiKV which will not ensure it will have a legal encoding with the table record. If use the raw bytes decoded from the region edge as the SQL condition, it will report an error.
For example, for the above table
t
, the below SQL will report an error:To avoid this, we did not do split works for this type of table.
But in practice, using a string column as the primary key is very normal, such as to store uuid, phone number, etc... Though we can use binary charset to walk around it to make TTL split tasks, it is some what strange and not easy to use.
What changed and how does it work?
We support to split TTL tasks for tables with string columns without binary flag. In this PR, we only parse the visible ASCII prefix of each region edge bytes. For example, if a table has some regions:
3 tasks in TTL job will be created with primary ranges:
id < 'abc'
id >= 'abc' AND id < 'abd'
id > 'abd'
Though it still has some limitations such as it still cannot work well when most keys does not contain ASCII prefix or share the same ASCII prefix, it can fit most scenes.
Check List
Tests
tested in a real cluster:
insert several rows, we can see the TTL tasks are splitter to several tasks:
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.