plan, statistics: maintain HistColl in DataSource's StatsInfo #7385
plan/stats.go
Outdated
@@ -27,6 +28,8 @@ type statsInfo struct {
	count float64
	cardinality []float64

	histColl statistics.HistColl
	idx2Columns map[int][]int
Not used?
statistics/table.go
Outdated
if rangePosition >= len(colIDs) {
	colID = -1
} else {
	colID = coll.Idx2ColumnIDs[idxID][rangePosition]
coll.Idx2ColumnIDs[idxID] -> colIDs.
statistics/table.go
Outdated
uniqueID, ok := colInfoID2UniqueID[id]
// If this column is not in datasource's schema, it won't be used in this query.
if ok {
	newColHistMap[uniqueID] = colHist
Do we need to adjust here, since the colID is the unique ID, not the column ID?
emm... Let me think a while
@lamxTyler I realized that
I prefer to put it in the
@lamxTyler
statistics/table.go
Outdated
}
for id, colHist := range coll.Columns {
	uniqueID, ok := colInfoID2UniqueID[id]
	// If this column is not in datasource's schema, it won't be used in this query.
It's better to describe the behavior directly; there is no need to explain why we don't collect the statistics of other columns:
// collect the statistics of the column in the schema of DataSource.
if len(ids) == 0 {
	continue
}
colID2IdxID[ids[0]] = idxHist.ID
If the column is a prefix of more than one composite index, do we only store the last index?
Yes, this operation is the same as the original colName2Idx map.
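For readers following the exchange above, the overwrite behavior can be illustrated with a minimal Go sketch. This is not the PR's code: the indexInfo type and the buildColID2IdxID helper are hypothetical stand-ins, and a slice is used in place of the real index collection so that the visiting order is deterministic.

```go
package main

import "fmt"

// indexInfo is a hypothetical stand-in for an index histogram: the index ID
// and the IDs of the columns it covers, in index order.
type indexInfo struct {
	ID     int64
	ColIDs []int64
}

// buildColID2IdxID maps the leading column of each index to that index's ID.
// When several indexes share the same leading column, later entries overwrite
// earlier ones, so only the last index visited is kept.
func buildColID2IdxID(indexes []indexInfo) map[int64]int64 {
	colID2IdxID := make(map[int64]int64, len(indexes))
	for _, idx := range indexes {
		if len(idx.ColIDs) == 0 {
			continue
		}
		colID2IdxID[idx.ColIDs[0]] = idx.ID
	}
	return colID2IdxID
}

func main() {
	indexes := []indexInfo{
		{ID: 1, ColIDs: []int64{10, 11}},
		{ID: 2, ColIDs: []int64{10, 12}}, // same leading column as index 1
		{ID: 3, ColIDs: []int64{13}},
	}
	m := buildColID2IdxID(indexes)
	fmt.Println(m[10], m[13]) // prints "2 3"
}
```

Column 10 leads both index 1 and index 2, and the map ends up pointing at index 2. If the indexes were instead visited by ranging over a Go map, the surviving entry would depend on iteration order, which Go does not guarantee.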
@winoros please merge master and resolve conflicts
statistics/feedback.go
Outdated
return
}
rangeString = colRangeToStr(t.Columns[t.colName2ID[colName]], &rang, -1, factor)
return
}
log.Debugf("%s index: %s, actual: %d, equality: %s, expected equality: %d, %s", prefix, idx.Info.Name.O,
this line can be removed?
Seems not?
The else branch at line 852 returns from this function, so this line is dead code.
Oh, its meaning changed after I changed the code. This log should be in the if branch now.
@lamxTyler PTAL
statistics/table.go
Outdated
func (coll *HistColl) getIndexRowCount(sc *stmtctx.StatementContext, idx *Index, indexRanges []*ranger.Range, modifyCount int64) (float64, error) {
// GenerateHistCollFromColumnInfo generates a new HistColl whose ColID2IdxID and IdxID2ColIDs is built from the given parameter.
func (coll *HistColl) GenerateHistCollFromColumnInfo(infos []*model.ColumnInfo, columns []*expression.Column) HistColl {

Remove this blank line.
@winoros PTAL this comment.
LGTM
LGTM
/run-all-tests
LGTM
I'm looking into the test failure.
…into hist-in-datasource
/run-all-tests
expression/column.go
Outdated
func FindColumnsByUniqueIDs(cols []*Column, ids []int) []*Column {
// FindColumnsByUniqueIDList will find columns by checking the unique id.
// Then return order is the same with `ids`. If one id doesn't exists in `cols`. We'll just break.
func FindColumnsByUniqueIDList(cols []*Column, ids []int64) []*Column {
how about:
s/FindColumnsByUniqueIDList/FindPrefixOfIndexCols/
s/ids/indexColIDs
I think OfGivenCols is better, since a more general name is preferable?
But it seems that the prefix is returned only when the second parameter is the column IDs of an index. In other situations, shouldn't all the matched columns be returned?
By the way, the callers of this function now only pass the column IDs of an index, except for the test code.
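The prefix semantics under discussion can be sketched as follows. This is a minimal illustration rather than the function from the PR: the column type is a hypothetical stand-in for expression.Column, and findPrefixOfIndexCols simply borrows the name proposed in the review.

```go
package main

import "fmt"

// column is a hypothetical, minimal stand-in for expression.Column.
type column struct {
	UniqueID int64
	Name     string
}

// findPrefixOfIndexCols returns the columns matching indexColIDs, in the order
// of indexColIDs, and stops at the first ID that is missing from cols. The
// result is therefore a prefix of the index columns, not every match.
func findPrefixOfIndexCols(cols []*column, indexColIDs []int64) []*column {
	byID := make(map[int64]*column, len(cols))
	for _, c := range cols {
		byID[c.UniqueID] = c
	}
	result := make([]*column, 0, len(indexColIDs))
	for _, id := range indexColIDs {
		c, ok := byID[id]
		if !ok {
			break // ID 3 below is missing, so column "d" is never reached
		}
		result = append(result, c)
	}
	return result
}

func main() {
	cols := []*column{{1, "a"}, {2, "b"}, {4, "d"}}
	for _, c := range findPrefixOfIndexCols(cols, []int64{2, 1, 3, 4}) {
		fmt.Println(c.Name) // prints "b", then "a"
	}
}
```

Column "d" is present in cols but is not returned, because the lookup stops at the missing ID 3. That is the behavior that makes a prefix-oriented name a reasonable fit when the IDs come from an index.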
LGTM
What problem does this PR solve?
Maintain HistColl in DataSource's StatsInfo to get more accurate statistics.
What is changed and how it works?
Map the column by its UniqueID rather than ID. So it can be maintained in the other plan's StatsInfo later.
And map the index and column by column's UniqueID rather than column's name.
Check List
Tests
Code changes
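
The re-keying described in the PR summary, from the table-level column ID to the planner's UniqueID, can be sketched roughly as below. The histogram type and the remapByUniqueID helper are hypothetical names introduced for illustration only; the real HistColl carries more state than a single map.

```go
package main

import "fmt"

// histogram is a hypothetical placeholder for a per-column histogram.
type histogram struct {
	NDV int64
}

// remapByUniqueID re-keys per-column statistics from the table-level column ID
// to the planner's unique ID. Columns absent from the mapping (that is, not in
// the data source's schema) are skipped.
func remapByUniqueID(byColID map[int64]*histogram, colID2UniqueID map[int64]int64) map[int64]*histogram {
	out := make(map[int64]*histogram, len(colID2UniqueID))
	for id, h := range byColID {
		if uid, ok := colID2UniqueID[id]; ok {
			out[uid] = h
		}
	}
	return out
}

func main() {
	stats := map[int64]*histogram{1: {NDV: 100}, 2: {NDV: 5}, 3: {NDV: 42}}
	mapping := map[int64]int64{1: 7, 3: 9} // column ID -> unique ID (assumed values)
	remapped := remapByUniqueID(stats, mapping)
	fmt.Println(len(remapped), remapped[7].NDV, remapped[9].NDV) // prints "2 100 42"
}
```

Keying by UniqueID means the statistics stay addressable after the columns pass through other plan operators, where the original column ID is no longer a reliable handle.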