executor: reorg codes for hashtable in HashJoinExec #11937
/run-all-tests |
a1df811 to 6cf30e5
Codecov Report
@@ Coverage Diff @@
## master #11937 +/- ##
===========================================
Coverage 81.4498% 81.4498%
===========================================
Files 444 444
Lines 95622 95622
===========================================
Hits 77884 77884
Misses 12238 12238
Partials 5500 5500 |
/run-all-tests |
87d6fb3 to f980b13
a57388d to 2b4808e
2b4808e to 63a0c3f
executor/hash_table.go
Outdated
type rowHashMap struct {
	entryStore entryStore
	hashTable  map[uint64]entryAddr
	length     int
}

// newRowHashMap creates a new rowHashMap.
func newRowHashMap() *rowHashMap {
func newRowHashMapWithStatCount(statCount int) *rowHashMap {
statCount -> initCap?
The current rowContainer assumes that we always use the inner table to build the hash table and the outer rows to probe it.
Should we also handle the case where the outer table is used to build the hash table?
executor/join.go
Outdated
h := fnv.New64()
chkIdx := uint32(0)
statCount := int(e.innerStatsCount / statCountDivisor)
Can we move lines 518~524 into newHashRowContainer?
if err != nil {
	return errors.Trace(err)
}
if hasNull {
Should we record the rowIdx of the rows for which hasNull is true?
If we use the outer table to build the hash table, we need to know which rows contain null values.
I'll adjust it in a further PR that brings hashRowContainer into index join.
executor/hash_table.go
Outdated
// in multiple goroutines while each goroutine should keep its own
// h and buf.
func (c *hashRowContainer) GetMatchedRows(probeRow chunk.Row, joinKeysTypes []*types.FieldType, keyColIdx []int, h hash.Hash64, buf []byte) (matched []chunk.Row, hasNull bool, err error) {
|
useless line
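The doc comment on GetMatchedRows says it may be called from multiple goroutines as long as each goroutine keeps its own h and buf. A minimal sketch of that calling convention follows; hashKey and probeConcurrently are hypothetical helpers written for illustration, not functions from this PR, and the join key is simplified to a single int64.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"sync"
)

// hashKey is a hypothetical helper: it hashes an int64 join key using a
// hasher and scratch buffer owned by the caller.
func hashKey(key int64, h hash.Hash64, buf []byte) uint64 {
	h.Reset()
	binary.LittleEndian.PutUint64(buf, uint64(key))
	h.Write(buf) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// probeConcurrently hashes all keys with the given number of workers.
// Each goroutine allocates its own h and buf, so no hasher state is shared.
func probeConcurrently(keys []int64, workers int) []uint64 {
	out := make([]uint64, len(keys))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			h := fnv.New64()       // private to this goroutine
			buf := make([]byte, 8) // private to this goroutine
			for i := w; i < len(keys); i += workers {
				out[i] = hashKey(keys[i], h, buf) // each index is written by one goroutine only
			}
		}(w)
	}
	wg.Wait()
	return out
}

func main() {
	sums := probeConcurrently([]int64{1, 2, 3, 1}, 2)
	fmt.Println(sums[0] == sums[3]) // equal keys produce equal hashes
}
```

Sharing a single hash.Hash64 between goroutines would interleave Reset and Write calls, which is why the hasher and buffer are passed in by the caller rather than stored on the container.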
executor/hash_table.go
Outdated
}
innerPtrs := c.hashTable.Get(key)
if len(innerPtrs) == 0 {
	hasNull = true
Why do we set hasNull to true?
It means that no inner rows in hashRowContainer are matched by the outer row, so we need to call onMissMatch.
The same logic in old code is at https://github.com/pingcap/tidb/blob/304619a/executor/join.go#L417 and https://github.com/pingcap/tidb/blob/304619a/executor/join.go#L434
I've addressed this by removing the ambiguous hasNull return value and using the number of matched rows to determine whether to call onMissMatch.
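To illustrate the resolution described in this thread, here is a minimal sketch in which the lookup returns only the matched rows and the caller decides on the miss path from their count. The row and container types and the probe function are simplified stand-ins assumed for the example; they are not TiDB's chunk.Row or hashRowContainer.

```go
package main

import "fmt"

// row is a simplified stand-in for a table row with a single join key.
type row struct {
	key int64
	val string
}

// container is a simplified stand-in for a hash-based row container.
type container struct {
	table map[int64][]row
}

// put adds a build-side row to the container.
func (c *container) put(r row) {
	c.table[r.key] = append(c.table[r.key], r)
}

// getMatchedRows returns the build-side rows whose key equals the probe key.
// It has no separate "has null" flag: an empty result means no match.
func (c *container) getMatchedRows(probe row) []row {
	var matched []row
	for _, r := range c.table[probe.key] {
		if r.key == probe.key { // a real table would re-check keys after a hash lookup
			matched = append(matched, r)
		}
	}
	return matched
}

// probe reports which path the caller would take for one outer row.
func probe(c *container, outer row) string {
	matched := c.getMatchedRows(outer)
	if len(matched) == 0 {
		return "onMissMatch" // no inner row matched the outer row
	}
	return fmt.Sprintf("matched %d", len(matched))
}

func main() {
	c := &container{table: make(map[int64][]row)}
	c.put(row{key: 1, val: "a"})
	c.put(row{key: 1, val: "b"})
	fmt.Println(probe(c, row{key: 1}))
	fmt.Println(probe(c, row{key: 2}))
}
```

With a single signal (the length of the result) there is no way for the flag and the slice to disagree, which is the ambiguity the reviewer raised.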
executor/hash_table.go
Outdated
	matched = append(matched, matchedRow)
}
if len(matched) == 0 { // TODO(fengliyuan): add test case
	hasNull = true
ditto
/run-all-tests |
LGTM
executor/hash_table.go
Outdated
// matchJoinKey checks if join keys of buildRow and probeRow are logically equal.
func (c *hashRowContainer) matchJoinKey(buildRow, probeRow chunk.Row, probeHCtx *hashContext) (ok bool, err error) {
|
This line is useless.
executor/hash_table.go
Outdated
)

const (
	// estCountMaxFactor defines the factor of maxStatCount with maxChunkSize.
- What does maxStatCount mean?
- What does estCountMax mean?
executor/hash_table.go
Outdated
	// Set this threshold to prevent innerEstCount being too large and causing a performance regression.
	estCountMaxFactor = 10 * 1024

	// estCountMinFactor defines the factor of statCountMin with maxChunkSize.
ditto
executor/hash_table.go
Outdated
	// Set this threshold to prevent innerEstCount being too large and causing a performance and memory regression.
	estCountMaxFactor = 10 * 1024

	// estCountMinFactor defines the factor of statCountMin with maxChunkSize.
s/statCountMin/estCountMin/ ?
LGTM
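For readers following the estCount threads above, the idea of bounding the estimated row count by multiples of maxChunkSize before using it as an initial capacity can be sketched as below. Only estCountMaxFactor = 10 * 1024 comes from the diff; the values of estCountMinFactor and estCountDivisor, and the estimateInitCap helper itself, are assumptions made for illustration.

```go
package main

import "fmt"

const (
	// estCountMaxFactor is taken from the diff under review.
	estCountMaxFactor = 10 * 1024
	// estCountMinFactor is an assumed value; the diff does not show it.
	estCountMinFactor = 8
	// estCountDivisor is an assumed value; the diff only shows that a divisor is applied.
	estCountDivisor = 8
)

// estimateInitCap is a hypothetical helper that scales the estimated row
// count down and clamps it to [maxChunkSize*estCountMinFactor,
// maxChunkSize*estCountMaxFactor].
func estimateInitCap(estCount, maxChunkSize int) int {
	c := estCount / estCountDivisor
	lo := maxChunkSize * estCountMinFactor
	hi := maxChunkSize * estCountMaxFactor
	if c < lo {
		return lo
	}
	if c > hi {
		return hi
	}
	return c
}

func main() {
	const maxChunkSize = 1024 // assumed chunk size for the example
	initCap := estimateInitCap(1000000, maxChunkSize)
	// Pre-sizing the map avoids repeated growth while the build side is loaded.
	table := make(map[uint64]int, initCap)
	fmt.Println(initCap, len(table))
}
```

The upper bound guards against a wildly overestimated row count allocating a huge map up front, and the lower bound keeps small estimates from causing early rehashing.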
/run-all-tests |
/run-all-tests |
What problem does this PR solve?
Part of #11607
It reorganizes the hash table code and introduces hashRowContainer, so that a further pull request can implement the spillOutToDisk feature for hashRowContainer.
What is changed and how it works?
- Reorganize the hash table code in HashJoinExec.
- Reduce maxEntrySliceLen; it is currently too large (8k) for OLTP queries.
- [...] rowHashMap.
- Initialize rowHashMap from the estimated row count for better performance.
Check List
Tests
Code changes
Side effects
Related changes
Release note
Benchmark
old.txt
new.txt