-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor: add batch copy to inner join, left and right outer join. #7493
Conversation
/run-all-tests |
6c30d15
to
39cde9c
Compare
/run-all-tests |
/run-common-test |
/run-common-test |
/run-all-tests |
executor/joiner.go
Outdated
@@ -158,19 +158,12 @@ func (j *baseJoiner) makeShallowJoinRow(isRightJoin bool, inner, outer chunk.Row | |||
j.shallowRow.ShallowCopyPartialRow(inner.Len(), outer) | |||
} | |||
|
|||
func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) { | |||
func (j *baseJoiner) filter(input, output *chunk.Chunk) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make j.selected
as a return value for this function? The function caller can directly use the returned []bool
, no need to dig into this function to know that the result is stored in j.selected
.
util/chunk/chunk.go
Outdated
@@ -232,6 +232,77 @@ func (c *Chunk) AppendPartialRow(colIdx int, row Row) { | |||
} | |||
} | |||
|
|||
// BatchCopyJoinRowToChunk uses for join to batch copy inner rows and outer row to chunk. | |||
func (c *Chunk) BatchCopyJoinRowToChunk(isRight bool, chkForJoin *Chunk, outer Row, selected []bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this function be simplified to
func CopySelectedRows(src *Chunk, selected []bool, dst *Chunk)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we cann't ...
There is a special optimizer for copy outer row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, chkForJoin
shares the same schema with c
029afe3
to
0cfa549
Compare
0cfa549
to
37902a2
Compare
util/chunk/chunk_util.go
Outdated
} | ||
|
||
// appendInnerRows appends different inner rows to the chunk. | ||
func appendInnerRows(innerColOffset, outerColOffset int, src *Chunk, selected []bool, dst *Chunk) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
s/appendInnerRows/copySelectedInnerRows/
s/columns/srcCols/
s/rowCol/srcCol/
s/chkCol/dstCol/
util/chunk/chunk_util.go
Outdated
return false | ||
} | ||
|
||
selectedRowNum := appendInnerRows(innerColOffset, outerColOffset, src, selected, dst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about s/selectedRowNum/numSelected/
?
util/chunk/chunk_util.go
Outdated
chkCol.appendNullBitmap(!rowCol.isNull(i)) | ||
chkCol.length++ | ||
|
||
if rowCol.isFixed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to move this branch predicate outside the for
loop.
569b86d
to
c4556b4
Compare
util/chunk/chunk_util.go
Outdated
start, end := srcCol.offsets[row.idx], srcCol.offsets[row.idx+numRows] | ||
dstCol.data = append(dstCol.data, srcCol.data[start:end]...) | ||
offsets := dstCol.offsets | ||
l := srcCol.offsets[row.idx+1] - srcCol.offsets[row.idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about `s/l/elemLen/?
util/chunk/chunk_util.go
Outdated
return dst.columns[innerColOffset].length - oldLen | ||
} | ||
|
||
// copyOuterRows appends same outer row to the chunk with `numRows` times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
// copyOuterRows copies the continuous 'numRows' outer rows in the source Chunk
// to the destination Chunk.
util/chunk/chunk_util.go
Outdated
return numSelected > 0 | ||
} | ||
|
||
// copySelectedInnerRows appends different inner rows to the chunk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
// copySelectedInnerRows copies the selected inner rows from the source Chunk
// to the destination Chunk.
util/chunk/chunk_util.go
Outdated
|
||
package chunk | ||
|
||
// CopySelectedJoinRows uses for join to batch copy inner rows and outer row to chunk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
// CopySelectedJoinRows copies the selected joined rows from the source Chunk
// to the destination Chunk.
//
// NOTE: All the outer rows in the source Chunk should be the same.
As the file and function name already have the join
key word, I think it's not necessary reiterate that this function is only used for the join operator.
util/chunk/chunk_util_test.go
Outdated
return srcChk, dstChk, selected | ||
} | ||
|
||
func TestBatchCopyJoinRowToChunk(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/TestBatchCopyJoinRowToChunk/TestCopySelectedJoinRows/
util/chunk/chunk_util_test.go
Outdated
} | ||
} | ||
|
||
func BenchmarkChunkBatchCopyJoinRow(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the benchmark name needs to be updated.
util/chunk/chunk_util_test.go
Outdated
} | ||
} | ||
|
||
func BenchmarkChunkAppendRow(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about BenchmarkAppendSelectedRow
?
util/chunk/column.go
Outdated
@@ -85,6 +85,34 @@ func (c *column) appendNullBitmap(on bool) { | |||
} | |||
} | |||
|
|||
func (c *column) appendMultiSameNullBitmap(on bool, num int) { | |||
l := ((c.length + num - 1) >> 3) - len(c.nullBitmap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be:
l := ((c.length + num + 7) >> 3) - len(c.nullBitmap)
how about:
s/l/numNewBytes/
util/chunk/column.go
Outdated
func (c *column) appendMultiSameNullBitmap(on bool, num int) { | ||
l := ((c.length + num - 1) >> 3) - len(c.nullBitmap) | ||
for i := 0; i <= l; i++ { | ||
c.nullBitmap = append(c.nullBitmap, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's much clear and easier to understand if we change the copy strategy to:
- set all the higher
x
bits ofc.nullBitmap[len(c.nullBitmap)-1]
to0
or1
according to the value ofon
. - memset the new bytes to
0xFF
or0x00
according to the value ofon
.
util/chunk/column.go
Outdated
} | ||
} | ||
} else { | ||
c.nullCount += num |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also need to set all the existing bits to zero even if on
is set to false, because this Chunk maybe is truncated, and the null bitmap is not reset in that scenario.
@XuHuaiyu PTAL |
executor/joiner.go
Outdated
} | ||
matched = true | ||
output.AppendRow(input.GetRow(i)) | ||
// batch copy selected row to output chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Batch copies selected rows to output chunk.
- Add a
.
at the end of this comment.
util/chunk/chunk_util.go
Outdated
|
||
// copySelectedInnerRows copies the selected inner rows from the source Chunk | ||
// to the destination Chunk. | ||
func copySelectedInnerRows(innerColOffset, outerColOffset int, src *Chunk, selected []bool, dst *Chunk) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment for the return value.
start, end := srcCol.offsets[row.idx], srcCol.offsets[row.idx+numRows] | ||
dstCol.data = append(dstCol.data, srcCol.data[start:end]...) | ||
offsets := dstCol.offsets | ||
elemLen := srcCol.offsets[row.idx+1] - srcCol.offsets[row.idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be possible that row.Idx+1
out of range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the normal case, it will not out of range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be possible that src.NumRows
equals 1, if so, it seems this will be out of range? @crazycs520
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if src.NumRows
equals 1, the lengh of offsets
should be 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
util/chunk/column.go
Outdated
@@ -85,6 +85,29 @@ func (c *column) appendNullBitmap(on bool) { | |||
} | |||
} | |||
|
|||
func (c *column) appendMultiSameNullBitmap(on bool, num int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ on/ notNull ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see appendNullBitmap
also use on
, so, both use on
or notNull
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to both use on
.
util/chunk/column.go
Outdated
@@ -85,6 +85,29 @@ func (c *column) appendNullBitmap(on bool) { | |||
} | |||
} | |||
|
|||
func (c *column) appendMultiSameNullBitmap(on bool, num int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment for this function and its parameters.
util/chunk/column.go
Outdated
for i := 0; i < numNewBytes; i++ { | ||
c.nullBitmap = append(c.nullBitmap, b) | ||
} | ||
if on { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !on{
c.nulCount += num
return
}
xxxx
This can help avoiding the else.
eea2182
to
301a0db
Compare
/run-all-tests |
301a0db
to
c5aab73
Compare
c5aab73
to
026be7f
Compare
util/chunk/column.go
Outdated
pos := uint(c.length) & 7 | ||
c.nullBitmap[idx] |= byte(1 << pos) | ||
} else { | ||
c.nullCount++ | ||
} | ||
} | ||
|
||
func (c *column) appendMultiSameNullBitmap(on bool, num int) { | ||
// appendMultiSameNullBitmap appends multiple same bit value to `nullBitMap`. | ||
// notNull mean not not null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ mean/ means
remove redundant not
util/chunk/column.go
Outdated
func (c *column) appendMultiSameNullBitmap(on bool, num int) { | ||
// appendMultiSameNullBitmap appends multiple same bit value to `nullBitMap`. | ||
// notNull mean not not null. | ||
// num mean appends `num` bit value to `nullBitMap`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num means the number of bits that should be appended.
util/chunk/column.go
Outdated
} | ||
// 1. Set all the higher 8-'numOldBits' bits in the last old byte to 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Set all the remained bits in the last slot of old
c.numBitMap
to 1. - Set all the redundant bits in the last slot of new
c.numBitMap
to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, I ask @CaitinChen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crazycs520 I prefer "the remaining bits"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
@XuHuaiyu PTAL |
…into only-batch-copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What problem does this PR solve?
Use vectorize filter and batch copy.
here is benchmark of batch copy VS copy one by one:
//on my Mac, only TiDB , chunk_size=1024
//on server, TiDB + PD + TiKV , chunk_size=1024