-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor: refactor joinResultGenerator
to handle the unmatched outer records
#7288
Conversation
/run-all-tests |
joinResultGenerator
to handle the unmatched outer records
executor/join_result_generators.go
Outdated
// } | ||
// | ||
// NOTE: This interface is **not** thread-safe. | ||
type recordJoiner interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be joinRecorder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record
means data record, it should be recordJoiner
or tupleJoiner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record
have both verb and noun, and not used very often in executor package. What about rowJoiner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct name can be discussed later, we can firstly focus on the code logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the original name joinResultGenerator
and do the name refactor later?
Because there are too many places to change.
/run-unit-test |
executor/index_lookup_join.go
Outdated
@@ -60,7 +60,7 @@ type IndexLookUpJoin struct { | |||
joinResult *chunk.Chunk | |||
innerIter chunk.Iterator | |||
|
|||
resultGenerator joinResultGenerator | |||
resultGenerator recordJoiner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name needs to be updated accordingly?
executor/join_result_generators.go
Outdated
@@ -32,21 +32,51 @@ var ( | |||
_ joinResultGenerator = &innerJoinResultGenerator{} | |||
) | |||
|
|||
// joinResultGenerator is used to generate join results according the join type, see every implementor for detailed information. | |||
// joinResultGenerator is used to generate join results according the join type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
joinResultGenerator is used to generate join results according to the join type.
executor/join_result_generators.go
Outdated
// onMissMatch operates on the unmatched outer row according to the join | ||
// type. An outer row can be considered miss matched if: | ||
// 1. it can not pass the filter on the outer table side. | ||
// 2. there does not exist an inner row with the same join key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can use "there is no inner row with the same join key." Just for your reference.
executor/join_result_generators.go
Outdated
// unmatched outer rows according to the current join type: | ||
// 1. 'SemiJoin': ignores the unmatched outer row. | ||
// 2. 'AntiSemiJoin': appends the unmatched outer row to the result buffer. | ||
// 3. 'LeftOuterSemiJoin': concates the unmatched outer row with 0 and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concates -> concats
executor/join_result_generators.go
Outdated
// 1. 'SemiJoin': ignores the unmatched outer row. | ||
// 2. 'AntiSemiJoin': appends the unmatched outer row to the result buffer. | ||
// 3. 'LeftOuterSemiJoin': concates the unmatched outer row with 0 and | ||
// appended it to the result buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appended -> appends
executor/join_result_generators.go
Outdated
// 2. 'AntiSemiJoin': appends the unmatched outer row to the result buffer. | ||
// 3. 'LeftOuterSemiJoin': concates the unmatched outer row with 0 and | ||
// appended it to the result buffer. | ||
// 4. 'AntiLeftOuterSemiJoin': concates the unmatched outer row with 0 and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concates -> concats
executor/join_result_generators.go
Outdated
// appended it to the result buffer. | ||
// 4. 'AntiLeftOuterSemiJoin': concates the unmatched outer row with 0 and | ||
// appended to the result buffer. | ||
// 5. 'LeftOuterJoin': concates the unmatched outer row with a row of NULLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concates -> concats
executor/join_result_generators.go
Outdated
// 4. 'AntiLeftOuterSemiJoin': concates the unmatched outer row with 0 and | ||
// appended to the result buffer. | ||
// 5. 'LeftOuterJoin': concates the unmatched outer row with a row of NULLs | ||
// and appended to the result buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appended -> appends it
executor/join_result_generators.go
Outdated
// appended to the result buffer. | ||
// 5. 'LeftOuterJoin': concates the unmatched outer row with a row of NULLs | ||
// and appended to the result buffer. | ||
// 6. 'RightOuterJoin': concates the unmatched outer row with a row of NULLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concates -> concats
executor/join_result_generators.go
Outdated
// 5. 'LeftOuterJoin': concates the unmatched outer row with a row of NULLs | ||
// and appended to the result buffer. | ||
// 6. 'RightOuterJoin': concates the unmatched outer row with a row of NULLs | ||
// and appended to the result buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appended -> appends it
executor/join_result_generators.go
Outdated
// tryToMatch tries to join an outer row with a batch of inner rows. When | ||
// 'inners.Len != 0' but all the joined rows are filtered, the outer row is | ||
// considered unmatched. Otherwise, the outer row is matched and some joined | ||
// rows is appended to `chk`. The size of `chk` is limited to MaxChunkSize. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rows are appended to chk
. The size of chk
is limited to MaxChunkSize.
@CaitinChen Done, PTAL again, thanks! |
executor/join_result_generators.go
Outdated
@@ -60,21 +60,21 @@ type joinResultGenerator interface { | |||
// onMissMatch operates on the unmatched outer row according to the join | |||
// type. An outer row can be considered miss matched if: | |||
// 1. it can not pass the filter on the outer table side. | |||
// 2. there does not exist an inner row with the same join key. | |||
// 2. there is no inner row with the same join key.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete the extra period at the end of this sentence.
executor/join_result_generators.go
Outdated
if !(err == nil && matched) { | ||
return | ||
} | ||
// here we handle the matched outer. | ||
chk.AppendPartialRow(0, outer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think do the main work in defer function is too tricky.
How about just
defer func() {
if matched {
innters.ReachEnd()
}
}
And AppendPartialRow
before return true, nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, innters.ReachEnd()
should be called together when we have a matched inner row and do the join work. So I chose the defer
style to reduce the code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not use defer
to reducing code duplication.
It splits the execution flow, makes the code harder to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can extract functions to reducing code duplication.
executor/join_result_generators.go
Outdated
chk.AppendPartialRow(0, outer) | ||
chk.AppendInt64(outer.Len(), 1) | ||
return nil | ||
inners.ReachEnd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
executor/join_result_generators.go
Outdated
chk.AppendPartialRow(0, outer) | ||
chk.AppendInt64(outer.Len(), 0) | ||
return nil | ||
inners.ReachEnd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@coocood @CaitinChen @winoros PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
What have you changed? (mandatory)
refactor the
joinResultGenerator
:onMissMatch
to handle the unmatched outer recordsemit
method totryToMatch
, which returns a boolean value to indicate whether the outer record can be joined with any inner record.This PR is a bugfix, we should not handle the unmatched outer records in the old
emit
function: this function only joins a batch of inner rows, not all of them. We only know that the outer records can not be joined with any inner record on a serials ofemit
function calls, and then decide whether the outer record is unmatched.After this PR, a typical function call to the
recordJoiner
is:What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
No
Does this PR affect tidb-ansible update? (mandatory)
No
Does this PR need to be added to the release notes? (mandatory)
Add a few positive/negative examples (optional)
An negative case, which is already added in the test:
Before this PR:
and the wrong result is:
This PR fixes the above issue: