-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor/join : use shallow copy for join. #7433
Conversation
@crazycs520 Please rebase your commit first, then push it to github. |
below are pprof svg of columnbycolumn and fieldbyfield. |
Seems like the improvement from the cache locality is counteracted by the overhead of some extra cpu instructions. |
At present, Most of time is consume by the Another idea is don't copy, store |
util/chunk/chunk.go
Outdated
@@ -160,6 +160,77 @@ func (c *Chunk) AppendPartialRow(colIdx int, row Row) { | |||
} | |||
} | |||
|
|||
func (c *Chunk) AppendPartialRows(colIdx int, rowIt Iterator, maxLen int) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the exported method, you must comment. It is not optional.
util/chunk/chunk.go
Outdated
return c.columns[colIdx+0].length - oldRowLen | ||
} | ||
|
||
func (c *Chunk) AppendPartialSameRows(colIdx int, row Row, rowsLen int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same reason.
util/chunk/iterator.go
Outdated
@@ -32,6 +32,8 @@ type Iterator interface { | |||
// Next returns the next Row. | |||
Next() Row | |||
|
|||
PreRows(i int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for this method?
util/chunk/iterator.go
Outdated
@@ -75,6 +77,11 @@ func (it *iterator4Slice) Next() Row { | |||
return row | |||
} | |||
|
|||
// PreRows implements the Iterator interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment basically says nothing. You have to be more explicitly.
/rebuild |
/run-all-tests |
1 similar comment
/run-all-tests |
ab33e54
to
71c54a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change all the shadow
to shallow
@@ -125,6 +126,7 @@ type baseJoiner struct { | |||
defaultInner chunk.Row | |||
outerIsRight bool | |||
chk *chunk.Chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Inner join, left out join and right out join will use chk
to do deep copy. deep copy + vectorize filter + batch copy have better performance.
executor/joiner.go
Outdated
@@ -142,6 +144,16 @@ func (j *baseJoiner) makeJoinRowToChunk(chk *chunk.Chunk, lhs, rhs chunk.Row) { | |||
chk.AppendPartialRow(lhs.Len(), rhs) | |||
} | |||
|
|||
// makeJoinRow combines inner, outer row into shadowRow. | |||
// combines will uses shadow copy inner and outer row data to shadowRow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makeJoinRow shallow copies `inner` and `outer` into `shallowRow`.
9102e19
to
0aadbf6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change all the shadow
to shallow
rest LGTM
@zz-jason PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
util/chunk/mutrow.go
Outdated
for i, rowCol := range row.c.columns { | ||
chkCol := chk.columns[colIdx+i] | ||
if !rowCol.isNull(row.idx) { | ||
chkCol.nullBitmap[0] = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to add some comments about why we set the whole byte to 1
/0
util/chunk/mutrow.go
Outdated
// ShallowCopyPartialRow shallow copies the data of `row` to MutRow. | ||
func (mr MutRow) ShallowCopyPartialRow(colIdx int, row Row) { | ||
chk := mr.c | ||
for i, rowCol := range row.c.columns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
s/rowCol/srcCol/
s/chkCol/dstCol/
util/chunk/mutrow.go
Outdated
|
||
// ShallowCopyPartialRow shallow copies the data of `row` to MutRow. | ||
func (mr MutRow) ShallowCopyPartialRow(colIdx int, row Row) { | ||
chk := mr.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chk
is only used once, I think we can remove this variable and use mr.c
directly.
executor/joiner.go
Outdated
@@ -142,6 +149,15 @@ func (j *baseJoiner) makeJoinRowToChunk(chk *chunk.Chunk, lhs, rhs chunk.Row) { | |||
chk.AppendPartialRow(lhs.Len(), rhs) | |||
} | |||
|
|||
// makeJoinRow shallow copies `inner` and `outer` into `shallowRow`. | |||
func (j *baseJoiner) makeJoinRow(isRightJoin bool, inner, outer chunk.Row) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about s/makeJoinRow/makeShallowJoinRow/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -102,18 +101,25 @@ func newJoiner(ctx sessionctx.Context, joinType plan.JoinType, | |||
} | |||
switch joinType { | |||
case plan.SemiJoin: | |||
base.shallowRow = chunk.MutRowFromTypes(colTypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What not change it on line 94 directly? So we don't repeat it multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only SemiJoin, AntiSemiJoin, LeftOuterSemiJoin, AntiLeftOuterSemiJoin
need shallowRow
.
/run-all-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
749e72b
to
1e8a9f0
Compare
What problem does this PR solve?
This is for #7420
In joiner.go, function
tryToMatch
is act like below:makeJoinRowToChunk
will copyinner
andouter
row to temporary chunk( j.chk ).There's no need to do deep copy, use shadow copy will be ok, Specifically,
chunk.column.data
use shadow copy will be ok.Another thing worth to say is that function call overhead in
makeJoinRowToChunk
is very expensive,Maybe more expensive than the actual memory cory. such ascolumn.isNull
,column.appendNullBitmap
. see here pprof svg:fieldbyfield.pdfThis PR will use shadow copy for
chunk.column.data
, and removecolumn.appendNullBitmap
funcation call.Below is a benchmarck:
As you can see, if have
t1.addr > t2.course
condition always false, the result ofcount(*) = 0
, temporary deep copy in old is spend more time.After remove
t1.addr > t2.course
condition , the result of new is still have a little advantage than old.If both join column is int , shadow copy will have no advantage, and maybe slow than old version.