executor: fill extra partition ID column in UnionScan executor #28666

tiancaiamao · 2021-10-08T14:41:09Z

What problem does this PR solve?

Issue Number: close #28073

Problem Summary:

What is changed and how it works?

Before this commit, the UnionScan executor did not fill the extra partition ID column of the chunk.
As a result, the extra PID column was 0 and the lock key was incorrect,
which is why cases like #28073 go wrong.

A typical case is begin; insert into pt values (...); select * from pt for update:
the keys modified in the transaction are not locked correctly.

What's Changed:

Modify the UnionScan executor so that it fills the extra PID column.
Add more tests for the left join case.

How it Works:

The extra PID column of the chunk data is now set correctly, so SelectLock can use it to construct the lock key.
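
To make the failure mode concrete, here is a minimal Go sketch of why a zero value in the extra PID column produces the wrong lock key. It is an illustration only: `rowKey` and its `t<id>_r<handle>` text format are hypothetical stand-ins, not TiDB's actual key encoding or API, and the IDs are made up.

```go
package main

import "fmt"

// rowKey is a hypothetical, simplified stand-in for a row-key encoder.
// The only property that matters here is that the key depends on the
// physical (partition) ID as well as on the row handle.
func rowKey(physicalID, handle int64) string {
	return fmt.Sprintf("t%d_r%d", physicalID, handle)
}

func main() {
	const partitionID, handle = 53, 1 // made-up IDs for illustration

	// Extra PID column left at its zero value: the key points at
	// physical ID 0, which is not the partition that holds the row.
	fmt.Println("unfilled:", rowKey(0, handle))

	// Extra PID column filled in: the key points at the real partition.
	fmt.Println("filled:  ", rowKey(partitionID, handle))
}
```

Because the two keys differ, a lock taken on the first one does not protect the row that the transaction actually modified.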

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Fix the incorrect lock key when using 'select for update' on partitioned tables inside a transaction that has already modified data

ti-chi-bot · 2021-10-08T14:41:11Z

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

cfzjywxk · 2021-10-11T13:17:04Z

executor/executor.go

@@ -931,6 +931,15 @@ func (e *SelectLockExec) Next(ctx context.Context, req *chunk.Chunk) error {
 					// The partition ID is returned as an extra column from the table reader.
 					if offset, ok := e.tblID2PIDColumnIndex[id]; ok {
 						physicalID = row.GetInt64(offset)
+
+						if physicalID == 0 {


There could be some unexpected errors if the physicalID is zero, and this condition is a bit confusing. Do we have another way to check for the left join result situation?

Yes, I agree it's confusing here. physicalID == 0 may be caused by a left join, or it may be caused by bugs.

Distinguishing those cases is unrealistic, because left join has several implementations (hash join, merge join, nested loop join, index join, etc.), and left join is only one of the cases we found that generate an empty or null row. There might be other cases that fill in an empty row, and it's hard to find them all.

So ... let's take a step back.
In the past, we had a bug with locking on partitions ... (that's bad)
Then, we fixed it ... #14921
Then, we found more bugs (that's bad)
Then, we tried to fix them ... #21148
And that solution caused more serious problems and introduced more critical bugs ... (wow! worse)

After this change, we come back from worse to bad, and that's big progress!
I mean, we fix some problems and make the solution (at least) no worse than before.

@qw4990
Do you have any ideas about this?

The thing I'm worried about is that although the former bugs make the query panic, they have no further impact on the data in storage. If we cannot verify which value is expected in a write statement, wrong data could be written into storage, just like in the issue listed above, where an invalid key is locked and the lock record is persisted.

Yes... I have the same worry that bugs could corrupt the data.
If there is a better way to fix this problem, I'd like to choose that solution. But I can't come up with a better idea.
So we have to fix the current problem and add tests to cover more scenarios.

It is quite necessary to add more tests now, since there could be more unknown issues. BTW, do we or our QA team have bandwidth for the coverage enhancement?

Can we let all OuterJoins explicitly set this column to a specified value (e.g. -1) when a row has no match?
Then we can define pid=0 as the uninitialized state, and we know it must be caused by some bug;
we can then return an error like "pid is uninitialized" in this case.
We don't have to find all OuterJoins at once. We can find as many as we can this time, and then just wait for the uninitialized error to reveal the rest and fix them.
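
The proposal above could be sketched roughly as follows. This is a hypothetical illustration of the idea, not code from this PR or from TiDB: the constant names, the `resolvePID` helper, and the exact error text are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	// pidUninitialized: no executor wrote the column, so this is treated as a bug.
	pidUninitialized int64 = 0
	// pidNoMatch: hypothetical sentinel an outer join would write for an unmatched row.
	pidNoMatch int64 = -1
)

var errPIDUninitialized = errors.New("pid is uninitialized")

// resolvePID classifies the value read from the extra PID column.
// It returns the physical ID to lock, whether the row needs a lock at all,
// and an error when the column was never filled in.
func resolvePID(pid int64) (physicalID int64, needLock bool, err error) {
	switch pid {
	case pidUninitialized:
		return 0, false, errPIDUninitialized
	case pidNoMatch:
		return 0, false, nil // unmatched outer-join row: nothing to lock
	default:
		return pid, true, nil
	}
}

func main() {
	for _, pid := range []int64{53, pidNoMatch, pidUninitialized} {
		id, needLock, err := resolvePID(pid)
		fmt.Println(pid, "->", id, needLock, err)
	}
}
```

With this split, a zero value can no longer be mistaken for a legitimate unmatched row, so any executor that forgets to fill the column fails loudly instead of locking a wrong key.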

cfzjywxk · 2021-10-14T02:58:11Z

executor/partition_table_test.go

+	}()
+
+	// Give chance for the goroutines to run first.
+	time.Sleep(80 * time.Millisecond)


This may be unstable in the CI environment.

It should be stable.
The test wants to check that 2 and 3 are blocked by 1.
Here we give 2 and 3 a chance to run first and let 1 sleep for a while.
The purpose is to verify that 2 and 3 are blocked and can't run; then we check that the final order is
1 2 3 or 1 3 2, which achieves the test goal: 2 and 3 are blocked by 1.

2 and 3 being blocked by 1 means the partition pessimistic lock works as expected and the partition key is constructed correctly.
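
The shape of that argument can be sketched in plain Go. This is a simplified model, not the test from this PR: a `sync.Mutex` stands in for the pessimistic lock on one row key, and `runBlocked` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBlocked models the test: worker 1 holds the lock, workers 2 and 3 try
// to take the same lock while it is held, and the completion order is recorded.
func runBlocked() []int {
	var (
		lock  sync.Mutex // stands in for the pessimistic lock on one row key
		mu    sync.Mutex // guards order
		order []int
		wg    sync.WaitGroup
	)
	record := func(id int) {
		mu.Lock()
		order = append(order, id)
		mu.Unlock()
	}

	lock.Lock() // worker 1 acquires the lock before the others start
	for _, id := range []int{2, 3} {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			lock.Lock() // blocks until worker 1 releases the lock
			record(id)
			lock.Unlock()
		}(id)
	}

	// Give 2 and 3 a chance to run first; with a working lock they cannot proceed.
	time.Sleep(80 * time.Millisecond)
	record(1)
	lock.Unlock()

	wg.Wait()
	return order
}

func main() {
	fmt.Println(runBlocked()) // 1 comes first, then 2 and 3 in either order
}
```

In this model the sleep only widens the window in which a broken lock would let 2 or 3 finish early. With a working lock, 1 is recorded first no matter how the goroutines are scheduled, which is the sense in which such a check does not depend on timing.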

ti-chi-bot · 2021-11-11T18:22:36Z

@tiancaiamao: PR needs rebase.

tiancaiamao · 2021-12-15T02:11:12Z

Fix in another way, see #30732

executor: fill extra partition ID column in UnionScan executor

ec2b347

tiancaiamao requested review from cfzjywxk, qw4990 and ichn-hu October 8, 2021 14:41

ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 8, 2021

cfzjywxk reviewed Oct 11, 2021

cfzjywxk reviewed Oct 14, 2021

mjonss mentioned this pull request Nov 1, 2021

runtime error: index out of range [5] with length 5 #28745

Closed

ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 11, 2021

tiancaiamao mentioned this pull request Dec 15, 2021

*: fix 'select for update' on partitioned table again #30732

Closed

12 tasks

tiancaiamao closed this Dec 15, 2021
