executor: fill extra partition ID column in UnionScan executor #28666
@@ -931,6 +931,15 @@ func (e *SelectLockExec) Next(ctx context.Context, req *chunk.Chunk) error {
// The partition ID is returned as an extra column from the table reader.
if offset, ok := e.tblID2PIDColumnIndex[id]; ok {
physicalID = row.GetInt64(offset)

if physicalID == 0 {
Unexpected errors could occur if physicalID is zero, and this condition is a bit confusing. Is there another way to check for the left join case?
Yes, I agree it's confusing here. physicalID == 0 may be caused by a left join, or it may be caused by a bug.
Distinguishing these cases is unrealistic, because left join has several implementations (hash join, merge join, nested loop join, index join, etc.), and left join is only one of the cases we found that generate an empty or NULL row; there might be other cases that fill empty rows... It's hard to find them all.
So... let's take a step back.
In the past, we had a bug with locking on partitions... (that's bad)
Then, we fixed it... #14921
Then, we found more bugs (that's bad)
Then, we tried to fix them... #21148
And that solution caused more serious problems and introduced more critical bugs... (wow! worse)
With this change, we come back from worse to bad, and that's big progress!
I mean, we fix some problems and make the solution (at least) no worse than before.
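To illustrate why `physicalID == 0` is ambiguous: in Go, an `int64` slot that nobody writes reads back as 0, so a row padded by an outer join for an unmatched tuple and a row whose PID column a buggy executor forgot to fill look identical to SelectLock. This is a minimal sketch assuming a hypothetical `row` struct with an extra PID field; it is not TiDB's `chunk.Row` API.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a chunk row: the handle plus the
// extra partition ID column appended by the table reader.
type row struct {
	Handle int64
	PID    int64 // extra partition ID column; its zero value is 0
}

// outerJoinPad mimics an outer join emitting a row for an unmatched outer
// tuple: the inner side, including the PID column, is left empty.
func outerJoinPad() row {
	return row{}
}

// forgotToFill mimics a buggy executor that produces the row but never
// writes the PID column.
func forgotToFill(handle int64) row {
	return row{Handle: handle}
}

// ambiguous reports whether the two rows are indistinguishable when only
// the PID column is inspected.
func ambiguous(a, b row) bool {
	return a.PID == b.PID
}

func main() {
	padded := outerJoinPad()
	buggy := forgotToFill(42)
	fmt.Println(padded.PID, buggy.PID, ambiguous(padded, buggy))
}
```

Both rows carry PID 0, so a check on the PID column alone cannot tell "no row to lock" from "executor bug".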
@qw4990
Do you have any ideas about this?
What I'm worried about is this: although the former bugs made the query panic, they had no lasting impact on the data in storage. If we cannot verify which case is expected in a write statement, wrong data could be written into storage, as in the issue listed above, where an invalid key is locked and the lock record is persisted.
Yes... I have the same worry that bugs could corrupt the data.
If there is a better way to fix this problem, I'd prefer that solution, but I can't come up with a better idea.
So we have to fix the current problem and add tests to cover more scenarios.
It's necessary to add more tests now, since there could be more unknown issues. BTW, do we or our QA team have bandwidth for improving the test coverage?
Can we make all OuterJoins explicitly set this column to a specified value (e.g. -1) when there is no match?
Then we can define pid=0 as the uninitialized state, which we know must be caused by some bug,
and return an error like "pid is not initialized" in that case.
We don't have to find all OuterJoins at once; we can find them on a best-effort basis this time, then just wait for the "uninitialized" error to show up and fix the remaining ones.
}()

// Give chance for the goroutines to run first.
time.Sleep(80 * time.Millisecond)
This may be unstable in the CI environment.
It should be stable.
The test checks that 2 and 3 are blocked by 1.
Here we give 2 and 3 a chance to run first, while 1 sleeps for a while.
The purpose is to verify that 2 and 3 are blocked and cannot proceed; then we check that the final order is 1 2 3 or 1 3 2, which achieves the test goal: 2 and 3 are blocked by 1.
2 and 3 being blocked by 1 means the partition pessimistic lock works as expected, i.e. the partition key is constructed correctly.
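The ordering argument can be modeled with a `sync.Mutex` standing in for the pessimistic lock on the partition row key. This is a simplified, hypothetical model (the name `runOrderTest` is illustrative), not the actual TiDB test, which drives real sessions with `SELECT ... FOR UPDATE`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runOrderTest models three transactions contending on one row lock.
// Transaction 1 takes the lock first and holds it for a while; 2 and 3
// can only finish after 1 releases it.
func runOrderTest() []int {
	var rowLock sync.Mutex // stand-in for the pessimistic lock on the partition key
	var mu sync.Mutex      // protects order
	var order []int
	record := func(id int) {
		mu.Lock()
		order = append(order, id)
		mu.Unlock()
	}

	rowLock.Lock() // transaction 1 acquires the lock before 2 and 3 start
	var wg sync.WaitGroup
	for _, id := range []int{2, 3} {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			rowLock.Lock() // blocks until transaction 1 releases the lock
			record(id)
			rowLock.Unlock()
		}(id)
	}

	// Give 2 and 3 a chance to run, so they are actually waiting on the lock.
	time.Sleep(80 * time.Millisecond)
	record(1) // transaction 1 finishes while still holding the lock
	rowLock.Unlock()

	wg.Wait()
	return order
}

func main() {
	fmt.Println(runOrderTest())
}
```

In this model the order assertion cannot fail spuriously: 1 records itself while still holding the lock, so it always comes first no matter how the scheduler behaves. The sleep only raises the chance that 2 and 3 are already waiting, which is what makes the check meaningful rather than what makes it pass.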
@tiancaiamao: PR needs rebase.
Fixed in another way; see #30732.
What problem does this PR solve?
Issue Number: close #28073
Problem Summary:
What is changed and how it works?
Before this commit, the UnionScan executor did not fill the extra partition ID column in the chunk.
The extra PID column was therefore left as 0, and the lock key was incorrect.
As a result, cases like #28073 went wrong.
A typical case is
begin; insert into pt values (...); select * from pt for update
where the key modified in the transaction is not locked correctly.
What's Changed:
How it Works:
The extra PID column of the chunk data is now set correctly, so SelectLock can use it to construct the lock key.
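A minimal sketch of the idea, under simplified assumptions: rows are a hypothetical `row` struct rather than `chunk.Chunk`, `unionScan` only appends rows written earlier in the transaction (the membuffer) to the snapshot rows, and `lockKey` builds a plain `t{pid}_r{handle}` string instead of TiDB's real binary row-key encoding. All of these names are illustrative. The point is only that once the union scan writes the partition's physical ID into the extra column, the lock key targets that partition instead of physical ID 0.

```go
package main

import "fmt"

// row is a hypothetical chunk row: the handle plus the extra PID column.
type row struct {
	Handle int64
	PID    int64
}

// dirtyRow is a row written earlier in the same transaction (membuffer).
type dirtyRow struct {
	Handle      int64
	PartitionID int64 // physical ID of the partition the row was inserted into
}

// unionScan merges snapshot rows with membuffer rows. When fillPID is true
// it sets the extra PID column for membuffer rows; when false it models the
// old behaviour that left the column at 0.
func unionScan(snapshot []row, dirty []dirtyRow, fillPID bool) []row {
	out := append([]row(nil), snapshot...)
	for _, d := range dirty {
		r := row{Handle: d.Handle}
		if fillPID {
			r.PID = d.PartitionID
		}
		out = append(out, r)
	}
	return out
}

// lockKey is a simplified string stand-in for the row key SelectLock locks.
func lockKey(r row) string {
	return fmt.Sprintf("t%d_r%d", r.PID, r.Handle)
}

func main() {
	snapshot := []row{{Handle: 1, PID: 101}}
	dirty := []dirtyRow{{Handle: 2, PartitionID: 102}} // row inserted earlier in this transaction
	for _, fill := range []bool{false, true} {
		for _, r := range unionScan(snapshot, dirty, fill) {
			fmt.Print(lockKey(r), " ")
		}
		fmt.Println()
	}
}
```

With `fillPID` false the inserted row gets the key `t0_r2`, which names no real partition; with `fillPID` true it gets `t102_r2`.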
Check List
Tests
Side effects
Documentation
Release note