-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
infoschema: fill data length fields for tables #7657
Conversation
util/chunk/codec.go
Outdated
// varElemLen indicates this column is a variable length column. | ||
const varElemLen = -1 | ||
|
||
func getFixedLen(colType *types.FieldType) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the length in Chunk, but the length in the storage is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but we do not maintain the length for fixed length column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we should use a different function to estimate the average column size in the storage.
types/field_type.go
Outdated
const VarElemLen = -1 | ||
|
||
// Length is the length of value for the type. | ||
func (ft *FieldType) Length() int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's misleading because we already have a Flen field in FieldType.
infoschema/tables.go
Outdated
@@ -720,6 +776,10 @@ func dataForTables(ctx sessionctx.Context, schemas []*model.DBInfo) ([][]types.D | |||
if err != nil { | |||
return nil, errors.Trace(err) | |||
} | |||
colLengthMap, err := getColLengthAllTables(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation doesn't do any RPC.
After this change, it may take too much time to execute if we have many tables.
And this may run frequently in some application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. You can take a look at getRowCountAllTable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see.
We can cache the result in case it is called too frequently, just like the way GlobalVariableCache
does.
infoschema/tables.go
Outdated
var TableStatsCacheExpiry = 3 * time.Second | ||
|
||
func (c *statsCache) get(ctx sessionctx.Context) (map[int64]uint64, map[tableHistID]uint64, error) { | ||
c.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The global lock may be a problem, GlobalVariableCache
doesn't hold the lock while doing internal SQL query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better. If the cache is valid, then we do not wait long. If the cache is not valid, there must be one and only one doing the query, and during the query, no one would get the expired cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a problem that we do unnecessary request.
We know this issue and didn't have a chance to further optimize this when we implement GlobalVariableCache
.
Since it doesn't matter the value is a little old, we can remove the global lock, set a flag indicates there is an ongoing request trying to update the cache, then other readers can just use the expired one.
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
/run-all-tests |
/run-common-test tidb-test=pr/625 |
What problem does this PR solve?
TiDB should be able to use sql to get the disk usage.
Fix #7468
What is changed and how it works?
Use the total column size that maintained by stats to fill in the data length fields, which includes
avg_row_length
,data_length
andindex_length
.Check List
Tests
Code changes
Side effects
Related changes
PTAL @coocood @zz-jason @winoros