-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: fix 'select for update' on partitioned table again #30732
Conversation
[REVIEW NOTIFICATION] This pull request has not been approved. To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
cols := pt.VisibleCols() | ||
offsets := make([]int, 0, len(cols)) | ||
for _, colInfo := range cols { | ||
colName := fmt.Sprintf("%s.%s.%s", dbInfo.Name.L, tblInfo.Name.L, colInfo.Name.L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be some colume names not in this format like _tidb_rowid
that it could not be found in the schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some curious questions :)
|
||
for _, pt := range v.PartitionedTable { | ||
tblInfo := pt.Meta() | ||
e.partitionedTable[tblInfo.ID] = pt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the tblInfo.ID
the logical table or the physical table (partition) ID here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tblInfo.ID
is the logical table ID here
|
||
cols := pt.VisibleCols() | ||
offsets := make([]int, 0, len(cols)) | ||
for _, colInfo := range cols { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create this double loop for each partition?
Are not all the partitions the same?
I wonder if we cannot use the HandleCols from tblID2Handle
instead.
I think I managed something like that in my hack here.
} | ||
return selectLock, nil | ||
} | ||
|
||
func addExtraPIDColumnToDataSource(p LogicalPlan, info *extraPIDInfo) error { | ||
func collectPartitionTable(p LogicalPlan, input []table.PartitionedTable) []table.PartitionedTable { | ||
switch raw := p.(type) { | ||
case *DataSource: | ||
// Fix issue 26250, do not add extra pid column to normal table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this comment.
e.tblID2PIDColumnIndex[tblID] = offset | ||
if len(v.PartitionedTable) > 0 { | ||
e.partitionedTable = make(map[int64]table.PartitionedTable) | ||
e.tblID2PtColsOffsets = make(map[int64][]int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this? Is not the e.tblID2Handle
enough?
if err != nil { | ||
return err | ||
} | ||
input = collectPartitionTable(child, input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you still collect the partition table of the aggregation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also, you don't need to collect the things after a subquery.
For the two following queries
select * from t, (select * from t1) t1 where t.c=t1.id for update;
select * from t, t1 where t.c=t1.id for update;
The first would not lock the t1, only the second one would.
Maybe you can refer to the handleHelper
of the PlanBuilder
. It maintains a stack to record the information. Though it does some redundant work. But it's easy to add new things to it I think.
// handleHelper records the handle column position for tables. Delete/Update/SelectLock/UnionScan may need this information.
// It collects the information by the following procedure:
// Since we build the plan tree from bottom to top, we maintain a stack to record the current handle information.
// If it's a dataSource/tableDual node, we create a new map.
// If it's an aggregation, we pop the map and push a nil map since no handle information left.
// If it's a union, we pop all children's and push a nil map.
// If it's a join, we pop its children out then merge them and push the new map to stack.
// If we meet a subquery, it's clear that it's an independent problem so we just pop one map out when we finish building the subquery.
handleHelper *handleColHelper
@tiancaiamao: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Use another strategy to fix the locking on partitioned table issue #31634 |
What problem does this PR solve?
Issue Number: close #30382 #28073
Problem Summary:
This is a rework of #21148, that old fix introduce a extra partition ID column to the schema.
In theory, the performance is better, but it's error-prone. After PR 21148 it has introduced numerous bugs.
For example, #28073 #28292 #30489 ... and maybe many more
If we use this one, #28666 is unnecessary. It's a fix on the old code, while this one is a totally rework.
What is changed and how it works?
Now I avoid using the extra partition ID column.
The problem of partition ID column is, many of the executor need to fill the extra partition ID column, and if we forget to do it, then there would be bugs:
extraPIDColumn
will be missingThere are so many places involved, it's hard to fix all of them.
In this PR, I just avoid column pruning on the SelectLock, and the using partition pruning again to get the partition ID.
For #30382, the select for update lock will not lock the table of the subquery, although the plan is transformed into Join, just one side is locked.
Check List
Tests
Side effects
Documentation
Release note