executor/join : use shallow copy for join. #7433

Merged: 49 commits, merged Aug 29, 2018
Diff view: changes from 34 commits

Commits (49):
a576cc0 iterator copy init (crazycs520, Aug 19, 2018)
79b74e7 fix iterator bug (crazycs520, Aug 19, 2018)
d9fe98b fix nullmap index out of range and add test (crazycs520, Aug 19, 2018)
8fa2ad8 refine code (crazycs520, Aug 19, 2018)
33ae5a0 refine code (crazycs520, Aug 19, 2018)
255c64b remove iterator copy and use back to pre rows (crazycs520, Aug 19, 2018)
0973373 checkout joiner.go file (crazycs520, Aug 19, 2018)
055a41a add check to bench (crazycs520, Aug 20, 2018)
c5eeff9 Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 20, 2018)
e7c6c64 iterator only once (crazycs520, Aug 21, 2018)
ce8eb4f add appendMultiSameNullBitmap (crazycs520, Aug 21, 2018)
499b6f9 field by field only one line 2X (crazycs520, Aug 22, 2018)
2a295a1 refine column copy (crazycs520, Aug 22, 2018)
9d82447 refine column copy (crazycs520, Aug 22, 2018)
ef85948 add shadow copy to join and move code (crazycs520, Aug 22, 2018)
45b4631 Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 22, 2018)
5bf279f rename function (crazycs520, Aug 22, 2018)
9690506 add comment (crazycs520, Aug 22, 2018)
8db639f add shadow copy to inner join (crazycs520, Aug 22, 2018)
7a55ff5 refine code (crazycs520, Aug 22, 2018)
05c1273 add shadow copy to all join (crazycs520, Aug 22, 2018)
66a133c remove redundancy code (crazycs520, Aug 22, 2018)
dadb047 Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 22, 2018)
b4192e4 remove column copy and redundancy code (crazycs520, Aug 23, 2018)
4096997 address comment (crazycs520, Aug 23, 2018)
b802941 add mutchunk (crazycs520, Aug 23, 2018)
c5cfdf1 address comment (crazycs520, Aug 23, 2018)
947f9d4 use mutRow instead of mut chunk. (crazycs520, Aug 23, 2018)
24ab90e address comment (crazycs520, Aug 23, 2018)
2b8d896 refine code (crazycs520, Aug 23, 2018)
3f82d2b address comment (crazycs520, Aug 23, 2018)
604e49d Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 23, 2018)
e5f4cbe address comment (crazycs520, Aug 23, 2018)
abbc2c9 address comment (crazycs520, Aug 24, 2018)
e1dd31d address comment and add test to mutRow_test (crazycs520, Aug 24, 2018)
593b31c remove chunk_copy_test.go (crazycs520, Aug 24, 2018)
3a6fbb7 refine code (crazycs520, Aug 24, 2018)
600fdc3 refine test (crazycs520, Aug 24, 2018)
f4fbd70 refine test and code (crazycs520, Aug 24, 2018)
23eaf1e Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 24, 2018)
e9ef7dd optimize append num (crazycs520, Aug 24, 2018)
21b5417 remove shadown copy on inner, leftOut, rightOut join, vectorized filt… (crazycs520, Aug 27, 2018)
0aadbf6 address comment (crazycs520, Aug 27, 2018)
0de2063 address comment (crazycs520, Aug 27, 2018)
c681658 address comment (crazycs520, Aug 28, 2018)
f8ccdf2 Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 28, 2018)
c7b2301 Merge branch 'master' of https://github.com/pingcap/tidb into column-… (crazycs520, Aug 29, 2018)
1e8a9f0 update test after merge (crazycs520, Aug 29, 2018)
b939b3b Merge branch 'master' into column-copy (XuHuaiyu, Aug 29, 2018)

executor/joiner.go: 120 changes (43 additions, 77 deletions)
@@ -91,7 +91,7 @@ func newJoiner(ctx sessionctx.Context, joinType plan.JoinType,
colTypes := make([]*types.FieldType, 0, len(lhsColTypes)+len(rhsColTypes))
colTypes = append(colTypes, lhsColTypes...)
colTypes = append(colTypes, rhsColTypes...)
base.chk = chunk.NewChunkWithCapacity(colTypes, ctx.GetSessionVars().MaxChunkSize)
base.mutRow = chunk.MutRowFromTypes(colTypes)
base.selected = make([]bool, 0, chunk.InitialCapacity)
if joinType == plan.LeftOuterJoin || joinType == plan.RightOuterJoin {
innerColTypes := lhsColTypes
@@ -124,7 +124,7 @@ type baseJoiner struct {
conditions []expression.Expression
defaultInner chunk.Row
outerIsRight bool
chk *chunk.Chunk
mutRow chunk.MutRow
Review (Member): How about s/mutRow/shadowRow/?

selected []bool
maxChunkSize int
}
@@ -142,6 +142,16 @@ func (j *baseJoiner) makeJoinRowToChunk(chk *chunk.Chunk, lhs, rhs chunk.Row) {
chk.AppendPartialRow(lhs.Len(), rhs)
}

// makeJoinRow combines the inner and outer rows into mutRow.
// It shallow-copies the inner and outer row data into mutRow instead of deep-copying it.
func (j *baseJoiner) makeJoinRow(isRightJoin bool, inner, outer chunk.Row) {
Review (Member): How about s/makeJoinRow/makeShallowJoinRow/?
Reply (Contributor Author): Done.

if !isRightJoin {
inner, outer = outer, inner
}
j.mutRow.ShadowCopyPartialRow(0, inner)
j.mutRow.ShadowCopyPartialRow(inner.Len(), outer)
}
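The point of makeJoinRow is that building the joined row no longer copies any column data. A minimal Go sketch of the difference between a shallow and a deep partial-row copy follows; `row`, `mutRow`, `shallowCopyPartialRow`, `deepCopyPartialRow`, and `makeShallowJoinRow` are hypothetical simplifications (each column is just a `[]byte`), not TiDB's chunk API. The column order follows the diff: whichever input is on the left is copied first.

```go
package main

import "fmt"

// row is a hypothetical stand-in for chunk.Row: one []byte per column,
// pointing into some chunk's column memory.
type row [][]byte

// mutRow is a hypothetical stand-in for chunk.MutRow: a reusable
// single-row buffer wide enough for the joined (left ++ right) row.
type mutRow [][]byte

// shallowCopyPartialRow points columns [colIdx, colIdx+len(src)) of m at
// src's data. Only slice headers are copied, never the bytes themselves.
func (m mutRow) shallowCopyPartialRow(colIdx int, src row) {
	for i, col := range src {
		m[colIdx+i] = col
	}
}

// deepCopyPartialRow is the copying alternative: every column's bytes are
// duplicated, which costs an allocation plus a memcpy per column.
func (m mutRow) deepCopyPartialRow(colIdx int, src row) {
	for i, col := range src {
		m[colIdx+i] = append([]byte(nil), col...)
	}
}

// makeShallowJoinRow mirrors the diff's makeJoinRow: the left input goes
// first, so when the outer side is on the left, its columns come first.
func (m mutRow) makeShallowJoinRow(outerIsRight bool, inner, outer row) {
	left, right := outer, inner
	if outerIsRight {
		left, right = inner, outer
	}
	m.shallowCopyPartialRow(0, left)
	m.shallowCopyPartialRow(len(left), right)
}

func main() {
	outer := row{[]byte("o1")}
	inner := row{[]byte("i1"), []byte("i2")}
	joined := make(mutRow, len(outer)+len(inner))

	joined.makeShallowJoinRow(false, inner, outer)
	fmt.Printf("%s %s %s\n", joined[0], joined[1], joined[2]) // o1 i1 i2

	// The joined row aliases the source memory, so a change to the source shows through.
	inner[0][1] = '9'
	fmt.Printf("%s\n", joined[1]) // i9
}
```

The aliasing is why the shallow row is only safe as a short-lived view: it is fine for evaluating join conditions, but anything that must outlive the source chunk still needs a deep copy.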

func (j *baseJoiner) filter(input, output *chunk.Chunk) (matched bool, err error) {
j.selected, err = expression.VectorizedFilter(j.ctx, j.conditions, chunk.NewIterator4Chunk(input), j.selected)
if err != nil {
@@ -173,14 +183,9 @@ func (j *semiJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, chk *chu
}

for inner := inners.Current(); inner != inners.End(); inner = inners.Next() {
j.chk.Reset()
if j.outerIsRight {
j.makeJoinRowToChunk(j.chk, inner, outer)
} else {
j.makeJoinRowToChunk(j.chk, outer, inner)
}
j.makeJoinRow(j.outerIsRight, inner, outer)

matched, err = expression.EvalBool(j.ctx, j.conditions, j.chk.GetRow(0))
matched, err = expression.EvalBool(j.ctx, j.conditions, j.mutRow.ToRow())
if err != nil {
return false, errors.Trace(err)
}
@@ -212,14 +217,9 @@ func (j *antiSemiJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, chk
}

for inner := inners.Current(); inner != inners.End(); inner = inners.Next() {
j.chk.Reset()
if j.outerIsRight {
j.makeJoinRowToChunk(j.chk, inner, outer)
} else {
j.makeJoinRowToChunk(j.chk, outer, inner)
}
j.makeJoinRow(j.outerIsRight, inner, outer)

matched, err = expression.EvalBool(j.ctx, j.conditions, j.chk.GetRow(0))
matched, err = expression.EvalBool(j.ctx, j.conditions, j.mutRow.ToRow())
if err != nil {
return false, errors.Trace(err)
}
@@ -252,10 +252,9 @@ func (j *leftOuterSemiJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator,
}

for inner := inners.Current(); inner != inners.End(); inner = inners.Next() {
j.chk.Reset()
j.makeJoinRowToChunk(j.chk, outer, inner)
j.makeJoinRow(false, inner, outer)

matched, err = expression.EvalBool(j.ctx, j.conditions, j.chk.GetRow(0))
matched, err = expression.EvalBool(j.ctx, j.conditions, j.mutRow.ToRow())
if err != nil {
return false, errors.Trace(err)
}
@@ -295,10 +294,9 @@ func (j *antiLeftOuterSemiJoiner) tryToMatch(outer chunk.Row, inners chunk.Itera
}

for inner := inners.Current(); inner != inners.End(); inner = inners.Next() {
j.chk.Reset()
j.makeJoinRowToChunk(j.chk, outer, inner)
matched, err := expression.EvalBool(j.ctx, j.conditions, j.chk.GetRow(0))
j.makeJoinRow(false, inner, outer)

matched, err := expression.EvalBool(j.ctx, j.conditions, j.mutRow.ToRow())
if err != nil {
return false, errors.Trace(err)
}
@@ -330,25 +328,7 @@ func (j *leftOuterJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, chk
if inners.Len() == 0 {
return false, nil
}

j.chk.Reset()
chkForJoin := j.chk
if len(j.conditions) == 0 {
chkForJoin = chk
}

numToAppend := j.maxChunkSize - chk.NumRows()
for ; inners.Current() != inners.End() && numToAppend > 0; numToAppend-- {
j.makeJoinRowToChunk(chkForJoin, outer, inners.Current())
inners.Next()
}
if len(j.conditions) == 0 {
return true, nil
}

// reach here, chkForJoin is j.chk
matched, err := j.filter(chkForJoin, chk)
return matched, errors.Trace(err)
return j.tryToMatchInnerAndOuter(false, outer, inners, chk)
}

func (j *leftOuterJoiner) onMissMatch(outer chunk.Row, chk *chunk.Chunk) {
@@ -366,24 +346,7 @@ func (j *rightOuterJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, ch
return false, nil
}

j.chk.Reset()
chkForJoin := j.chk
if len(j.conditions) == 0 {
chkForJoin = chk
}

numToAppend := j.maxChunkSize - chk.NumRows()
for ; inners.Current() != inners.End() && numToAppend > 0; numToAppend-- {
j.makeJoinRowToChunk(chkForJoin, inners.Current(), outer)
inners.Next()
}
if len(j.conditions) == 0 {
return true, nil
}

// reach here, chkForJoin is j.chk
matched, err := j.filter(chkForJoin, chk)
return matched, errors.Trace(err)
return j.tryToMatchInnerAndOuter(true, outer, inners, chk)
}
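After this change, leftOuterJoiner and rightOuterJoiner both delegate to tryToMatchInnerAndOuter and differ only in the isRight flag. The sketch below shows the resulting outer-join semantics. It assumes, since the diff does not show it, that onMissMatch pads an unmatched outer row with an all-NULL inner row, which appears to be the role of baseJoiner.defaultInner. `joinRow` and `outerJoin` are hypothetical, and a nil element stands for NULL.

```go
package main

import "fmt"

// row is a hypothetical stand-in for chunk.Row; a nil element is SQL NULL.
type row []any

// joinRow places the left input's columns first, like makeJoinRow.
func joinRow(outerIsRight bool, inner, outer row) row {
	if outerIsRight {
		return append(append(row{}, inner...), outer...)
	}
	return append(append(row{}, outer...), inner...)
}

// outerJoin sketches left (outerIsRight=false) and right (outerIsRight=true)
// outer joins with one shared matching loop. An outer row with no match is
// emitted once, padded with defaultInner (assumed: every column NULL).
func outerJoin(outers, inners []row, on func(joined row) bool, outerIsRight bool, innerWidth int) []row {
	defaultInner := make(row, innerWidth) // every column nil, i.e. NULL
	var out []row
	for _, o := range outers {
		matched := false
		for _, in := range inners {
			if j := joinRow(outerIsRight, in, o); on(j) {
				out = append(out, j)
				matched = true
			}
		}
		if !matched { // the onMissMatch path
			out = append(out, joinRow(outerIsRight, defaultInner, o))
		}
	}
	return out
}

func main() {
	outers := []row{{1}, {2}}
	inners := []row{{2, "b"}}
	// Left outer join: the outer column comes first, so the key check is j[0] == j[1].
	left := outerJoin(outers, inners, func(j row) bool { return j[0] == j[1] }, false, 2)
	fmt.Println(left) // [[1 <nil> <nil>] [2 2 b]]
}
```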

func (j *rightOuterJoiner) onMissMatch(outer chunk.Row, chk *chunk.Chunk) {
@@ -400,26 +363,29 @@ func (j *innerJoiner) tryToMatch(outer chunk.Row, inners chunk.Iterator, chk *ch
if inners.Len() == 0 {
return false, nil
}
j.chk.Reset()
chkForJoin := j.chk
if len(j.conditions) == 0 {
chkForJoin = chk
}
inner, numToAppend := inners.Current(), j.maxChunkSize-chk.NumRows()
for ; inner != inners.End() && numToAppend > 0; inner, numToAppend = inners.Next(), numToAppend-1 {
if j.outerIsRight {
j.makeJoinRowToChunk(chkForJoin, inner, outer)
} else {
j.makeJoinRowToChunk(chkForJoin, outer, inner)

return j.tryToMatchInnerAndOuter(j.outerIsRight, outer, inners, chk)
}

// tryToMatchInnerAndOuter does 2 things:
// 1. Combine outer and inner row to join row.
// 2. Check whether the join row matches the join conditions, if so, append it to the `outChk`.
func (j *baseJoiner) tryToMatchInnerAndOuter(isRight bool, outer chunk.Row, inners chunk.Iterator, outChk *chunk.Chunk) (bool, error) {
Review (Member): The execution logic of this function can be split into 2 stages:

  1. The first stage joins each inner row with the single outer row, computes the filter result of the joined row, and stores it in a bool slice. With this bool slice we know whether each inner row can be joined with the outer row.
  2. The second stage does the real deep-copy work: according to the bool slice from stage 1, it deep-copies the outer row and the matched inner rows into the destination Chunk, column by column.

Review (Contributor): Maybe we can also use this pattern in selectExec and other match-then-output executors, if it works.

Review (Member): Exactly, nice catch!

Review (Member): Maybe we should open a separate issue to track and record cache-locality enhancements in the execution engine. But it's not easy to find potential optimization opportunities without reading and analyzing the code, so it may take us a long time to finish.

match := false
numToAppend := j.maxChunkSize - outChk.NumRows()
for inner := inners.Current(); inner != inners.End() && numToAppend > 0; inner, numToAppend = inners.Next(), numToAppend-1 {
j.makeJoinRow(isRight, inner, outer)

matched, err := expression.VectorizedFilterRow(j.ctx, j.conditions, j.mutRow.ToRow())
if err != nil {
return false, errors.Trace(err)
}
if matched {
match = true
outChk.AppendRow(j.mutRow.ToRow())
}
}
if len(j.conditions) == 0 {
return true, nil
}

// reach here, chkForJoin is j.chk
matched, err := j.filter(chkForJoin, chk)
return matched, errors.Trace(err)
return match, nil
}
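In tryToMatchInnerAndOuter, numToAppend is decremented once per inner row visited, not once per row appended. So one call never pushes outChk past maxChunkSize, and it leaves the iterator wherever it stopped. The sketch below reproduces that loop shape with a hypothetical `iterator` and `tryToMatch`, assuming the caller keeps calling while the iterator still has rows and flushes the output between calls.

```go
package main

import "fmt"

// iterator is a hypothetical stand-in for chunk.Iterator. What matters here
// is that it keeps its position between calls.
type iterator struct {
	rows []int
	pos  int
}

func (it *iterator) done() bool { return it.pos >= len(it.rows) }

// pair is one joined (outer, inner) output row.
type pair [2]int

// tryToMatch mirrors the loop shape of tryToMatchInnerAndOuter: the budget
// maxChunkSize-len(*out) is spent per inner row visited, every visited row
// is filtered, and the result reports whether anything matched in this call.
func tryToMatch(outer int, inners *iterator, filter func(outer, inner int) bool, out *[]pair, maxChunkSize int) bool {
	match := false
	for numToAppend := maxChunkSize - len(*out); !inners.done() && numToAppend > 0; numToAppend-- {
		inner := inners.rows[inners.pos]
		inners.pos++
		if filter(outer, inner) {
			*out = append(*out, pair{outer, inner})
			match = true
		}
	}
	return match
}

func main() {
	inners := &iterator{rows: []int{10, 11, 12, 13, 14}}
	out := make([]pair, 0, 3)
	for !inners.done() {
		tryToMatch(1, inners, func(o, i int) bool { return true }, &out, 3)
		fmt.Println(out) // [[1 10] [1 11] [1 12]], then [[1 13] [1 14]]
		out = out[:0]    // hypothetical caller: ship the full chunk, then reuse it
	}
}
```

Counting visited rows rather than appended rows keeps the bound simple. The trade-off is that a selective filter can return a chunk that is far from full.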

func (j *innerJoiner) onMissMatch(outer chunk.Row, chk *chunk.Chunk) {

expression/chunk_executor.go: 48 changes (28 additions, 20 deletions)
@@ -236,29 +236,37 @@ func VectorizedFilter(ctx sessionctx.Context, filters []Expression, iterator *ch
for i, numRows := 0, iterator.Len(); i < numRows; i++ {
selected = append(selected, true)
}
for _, filter := range filters {
isIntType := true
if filter.GetType().EvalType() != types.ETInt {
isIntType = false
var err error
for row := iterator.Begin(); row != iterator.End(); row = iterator.Next() {
if !selected[row.Idx()] {
Review (Contributor, @alivxxx, Aug 24, 2018): It is not needed because it must be false now. And why change this function? It is not vectorized anymore.

continue
}
selected[row.Idx()], err = VectorizedFilterRow(ctx, filters, row)
if err != nil {
return nil, errors.Trace(err)
}
for row := iterator.Begin(); row != iterator.End(); row = iterator.Next() {
if !selected[row.Idx()] {
continue
}
return selected, nil
}

// VectorizedFilterRow applies a list of filters to a row.
func VectorizedFilterRow(ctx sessionctx.Context, filters []Expression, row chunk.Row) (bool, error) {
Review (Contributor): Why call it Vectorized?

selected := true
for _, filter := range filters {
isTypeInt := filter.GetType().EvalType() == types.ETInt
if isTypeInt {
Review (Contributor): Can lines 269~284 be extracted into a function, so that VectorizedFilter can reuse it?

Review (Member): No. If we reuse lines 269~284 in VectorizedFilter, its execution logic is no longer vectorized.

Review (Member, @zz-jason, Aug 24, 2018): Maybe we can extract lines 271~285 into a function named evaluateOneFilter() and reuse it in both VectorizedFilter and FilterRow.

filterResult, isNull, err := filter.EvalInt(ctx, row)
if err != nil {
return false, errors.Trace(err)
}
if isIntType {
filterResult, isNull, err := filter.EvalInt(ctx, row)
if err != nil {
return nil, errors.Trace(err)
}
selected[row.Idx()] = selected[row.Idx()] && !isNull && (filterResult != 0)
} else {
// TODO: should rewrite the filter to `cast(expr as SIGNED) != 0` and always use `EvalInt`.
bVal, err := EvalBool(ctx, []Expression{filter}, row)
if err != nil {
return nil, errors.Trace(err)
}
selected[row.Idx()] = selected[row.Idx()] && bVal
selected = selected && !isNull && (filterResult != 0)
} else {
// TODO: should rewrite the filter to `cast(expr as SIGNED) != 0` and always use `EvalInt`.
bVal, err := EvalBool(ctx, []Expression{filter}, row)
if err != nil {
return false, errors.Trace(err)
}
selected = selected && bVal
}
}
return selected, nil