Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plan, statistics: maintain HistColl in DataSource's StatsInfo #7385

Merged
merged 19 commits into from
Aug 30, 2018

Conversation

winoros
Copy link
Member

@winoros winoros commented Aug 14, 2018

What problem does this PR solve?

Maintain HistColl in DataSource's StatsInfo to get more accurate statistics.

What is changed and how it works?

Map the column by its UniqueID rather than ID. So it can be maintained in the other plan's StatsInfo later.
And map the index and column by column's UniqueID rather than column's name.

Check List

Tests

  • Exsiting tests.

Code changes

  • Has exported function/method change
  • Has exported variable/fields change

@winoros winoros added the sig/planner SIG: Planner label Aug 14, 2018
plan/stats.go Outdated
@@ -27,6 +28,8 @@ type statsInfo struct {
count float64
cardinality []float64

histColl statistics.HistColl
idx2Columns map[int][]int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used?

if rangePosition >= len(colIDs) {
colID = -1
} else {
colID = coll.Idx2ColumnIDs[idxID][rangePosition]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coll.Idx2ColumnIDs[idxID] -> colIDs.

uniqueID, ok := colInfoID2UniqueID[id]
// If this column is not in datasource's schema, it won't be used in this query.
if ok {
newColHistMap[uniqueID] = colHist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to adjust here, because the colID is unique id, not column id?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm... Let me think a while

@winoros
Copy link
Member Author

winoros commented Aug 22, 2018

@lamxTyler I realized that statistics.Column.Info.ID is the one to load column stats so i use it in ColumnIsInvalid. But there comes a question that only DataSource can load column. But actually in other operator it may need the column that the DataSource don't need. Maybe we need add a TableID in ColumnInfo struct?
@zz-jason What do you think?

@alivxxx
Copy link
Contributor

alivxxx commented Aug 24, 2018

I prefer to put it in the statistics.Column.

@winoros
Copy link
Member Author

winoros commented Aug 24, 2018

@lamxTyler
Reasonable, I'll change it in a separate pr.
You can review this first.
@zz-jason PTAL

}
for id, colHist := range coll.Columns {
uniqueID, ok := colInfoID2UniqueID[id]
// If this column is not in datasource's schema, it won't be used in this query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to describe the beavior directlly, there is no need to explain why we don't collect the statistics of other columns:

// collect the statistics of the column in the schema of DataSource.

if len(ids) == 0 {
continue
}
colID2IdxID[ids[0]] = idxHist.ID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the column is a prefix of more than one composite index, We only store the last index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this operation is the same with the original colName2Idx map.

@zz-jason
Copy link
Member

@winoros please merge master and resolve conflicts

return
}
rangeString = colRangeToStr(t.Columns[t.colName2ID[colName]], &rang, -1, factor)
return
}
log.Debugf("%s index: %s, actual: %d, equality: %s, expected equality: %d, %s", prefix, idx.Info.Name.O,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line can be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the else branch in line 852 returns the execution of this function. So this is a dead code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, its meaning changed after i changing the code. This log should be in the if branch now.

@winoros
Copy link
Member Author

winoros commented Aug 28, 2018

@lamxTyler PTAL

func (coll *HistColl) getIndexRowCount(sc *stmtctx.StatementContext, idx *Index, indexRanges []*ranger.Range, modifyCount int64) (float64, error) {
// GenerateHistCollFromColumnInfo generates a new HistColl whose ColID2IdxID and IdxID2ColIDs is built from the given parameter.
func (coll *HistColl) GenerateHistCollFromColumnInfo(infos []*model.ColumnInfo, columns []*expression.Column) HistColl {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this blank line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@winoros PTAL this comment.

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

alivxxx
alivxxx previously approved these changes Aug 30, 2018
Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx alivxxx added the status/LGT2 Indicates that a PR has LGTM 2. label Aug 30, 2018
@alivxxx
Copy link
Contributor

alivxxx commented Aug 30, 2018

/run-all-tests

zhexuany
zhexuany previously approved these changes Aug 30, 2018
Copy link
Contributor

@zhexuany zhexuany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@winoros
Copy link
Member Author

winoros commented Aug 30, 2018

I'm looking into the failure in test.

@winoros winoros dismissed stale reviews from zhexuany and alivxxx via 57c331c August 30, 2018 12:00
@winoros
Copy link
Member Author

winoros commented Aug 30, 2018

/run-all-tests

func FindColumnsByUniqueIDs(cols []*Column, ids []int) []*Column {
// FindColumnsByUniqueIDList will find columns by checking the unique id.
// Then return order is the same with `ids`. If one id doesn't exists in `cols`. We'll just break.
func FindColumnsByUniqueIDList(cols []*Column, ids []int64) []*Column {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

  • s/FindColumnsByUniqueIDList/FindPrefixOfIndexCols/
  • s/ids/indexColIDs

Copy link
Member Author

@winoros winoros Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OfGivenCols is better, since a more common name is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it seems that only when the second parameter is column ids of an index the prefix is returned. In other situation, all the matched columns should be returned?

By the way, the caller of this function now only pass the column ids of an index, except for the test code.

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason merged commit 87c54b2 into pingcap:master Aug 30, 2018
@winoros winoros deleted the hist-in-datasource branch September 5, 2018 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants