sql: use AOST -10s when building crdb_internal.table_row_statistics #60953
Conversation
Note: this change deviates slightly from the suggestions by @ajwerner. I don't have very good intuition about transaction contention, so this change may be suboptimal. But the reason I did it this way is that I am conscious of the memory overhead required for histograms, and generating all of the histograms up front seems like it could expand the memory utilization more than necessary. But maybe I'm just being overly cautious. I also don't think we need to add all the stats in the same transaction, which is why I split it up into several smaller transactions. But maybe you disagree, @RaduBerinde. Curious to hear your thoughts.
What would be even better than this would be to accumulate all of the writes into a single batch and commit that batch atomically. That would incur zero contention in most cases and a minimal amount in cases where the relevant stats rows spanned ranges. Is it ever the case that the sketches sum up to megabytes?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
I suppose maybe that's hard.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
I think it's rare that histograms would be multiple megabytes, but if a table were very wide with a lot of columns and indexes, it could definitely happen. When you say "accumulate all of the writes into a single batch", do you mean write a multi-row insert statement? If so, I can give that a try...
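For concreteness, here's a minimal Go sketch of the multi-row-insert idea being asked about. The table and column names are placeholders rather than the real system.table_statistics schema; the point is only to show how N per-row statements could collapse into one statement with numbered placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// buildMultiRowInsert builds a single multi-row INSERT so that all of a
// table's stats rows are written in one statement rather than one statement
// per row. Placeholders are numbered $1, $2, ... across all rows.
func buildMultiRowInsert(table string, cols []string, numRows int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	arg := 1
	for i := 0; i < numRows; i++ {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteByte('(')
		for j := range cols {
			if j > 0 {
				b.WriteString(", ")
			}
			fmt.Fprintf(&b, "$%d", arg)
			arg++
		}
		b.WriteByte(')')
	}
	return b.String()
}

func main() {
	// Illustrative column names, not the actual schema.
	fmt.Println(buildMultiRowInsert("system.table_statistics",
		[]string{`"tableID"`, `"statisticID"`, "histogram"}, 2))
}
```

All the writes then carry in one statement, which is what lets them be batched (and potentially committed via 1PC when they land in one range).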
I do feel it's unfortunate that SQL is sometimes a bad match for the desired KV API semantics. If we didn't need to support the conversational semantics exactly, we'd have more freedom to group things into batches.
Anyway, this certainly is an improvement. Better yet would be to group the delete and insert into a single statement so that we could read and then broadcast the write with a commit. As it stands, it'll have ~8 WAN global round trips per column in the worst case:
Delete reads from leaseholder (1)
Delete writes (1 to lease holder, replication pipelined)
Insert writes (1 to lease holder, replication pipelined)
Commit (1 to leaseholders, 1 to replicate, 1 to make explicitly committed, 1 to replicate, 1 to resolve)
Here the contention footprint is ~7 global RTTs.
I think the best we can hope to do with a statement per column is:
Delete reads from leaseholder (1 to leaseholders)
Delete and insert writes get propagated with a commit (1 to leaseholders, 1 to replicate, 1 to make explicitly committed, 1 to resolve)
Here the contention footprint is ~5 global RTTs.
However, if in the above case all of the writes and deletes fall in the same range (very likely), then it's 1PC and then the reads are really just blocked waiting on replication so 1 global RTT.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
I don't know the internals of how a CTE that does the delete and insert in the same statement would get planned. I doubt it will be a single batch. If it were, that would be amazing.
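For reference, the delete-plus-insert-in-one-statement shape under discussion would look roughly like the string below. This is a sketch only: the predicate and column list are placeholders, and (as noted) whether the optimizer would plan it as a single batch is an open question. Expressed as a small Go program so the statement text is easy to inspect:

```go
package main

import "fmt"

// buildUpsertCTE returns the statement shape under discussion: a
// data-modifying CTE that deletes a table's old stats rows and inserts the
// new ones in a single statement. Predicate and columns are illustrative.
func buildUpsertCTE() string {
	return `WITH deleted AS (
  DELETE FROM system.table_statistics WHERE "tableID" = $1 RETURNING "tableID"
)
INSERT INTO system.table_statistics ("tableID", "statisticID", name, "rowCount")
VALUES ($1, $2, $3, $4)`
}

func main() {
	fmt.Println(buildUpsertCTE())
}
```

If this did plan as one batch, the read and both writes could share a single commit, which is the best case sketched above.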
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
Given all of this data should be co-located, I think the min user priority thing should work pretty well. Transactions encountering the intents should be able to abort the writing transaction without leaving the local node.
On the whole, I'm coming around to this being just totally fine and a good change so long as you can tolerate the loss of atomicity.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
Do we have reason to believe that splitting up the txn is preferable?
One thing to keep in mind is that we'll need to change the gossip mechanism to a range feed soon. The range feed "listens" to changes on the system table. At that point, having a single txn would be better, otherwise we would end up refreshing the stats multiple times for the same table.
Reviewable status: complete! 0 of 0 LGTMs obtained
That gossip thing seems like a big deal. The issue here is about reducing the contention footprint of these transactions. If the table has many columns, we're in a global cluster, and we set priority low, then a loop querying the stats from the leaseholder's region would effectively prevent the stats from ever committing. Breaking the transaction into pieces at least bounds much more tightly how long that footprint lasts. It may still be multiple global RTTs, as noted above.
I do have livelock worries with PRIORITY LOW. Everything is tradeoffs, and in this case I'm not sure what's best.
Reviewable status: complete! 0 of 0 LGTMs obtained
Doesn't the priority increment each time a transaction is pushed?
I'm pretty sure it never changes class. That incrementing priority is about avoiding starvation and providing some amount of fairness among transactions of the same priority class.
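As an illustration of the distinction drawn here (a toy model, not CockroachDB's actual transaction priority code): a push can ratchet the numeric priority within a class for fairness, but it never moves the transaction to a different class.

```go
package main

import "fmt"

// PriorityClass models the coarse priority classes (low/normal/high).
type PriorityClass int

const (
	ClassLow PriorityClass = iota
	ClassNormal
	ClassHigh
)

// TxnPriority is a toy model: a fixed class plus a numeric priority that
// can be ratcheted up within the class.
type TxnPriority struct {
	Class PriorityClass
	Value int
}

// OnPush bumps the numeric priority so a repeatedly pushed transaction is
// not starved forever, but deliberately leaves the class unchanged.
func (p *TxnPriority) OnPush(pusherValue int) {
	if pusherValue+1 > p.Value {
		p.Value = pusherValue + 1
	}
}

func main() {
	p := TxnPriority{Class: ClassLow, Value: 1}
	p.OnPush(5)
	fmt.Println(p.Class == ClassLow, p.Value) // class unchanged, value bumped
}
```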
Force-pushed from 5f9c855 to 74b5c84.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/crdb_internal.go, line 415 at r1 (raw file):
// querying the system.table_statistics table when building
// crdb_internal.table_row_statistics. It is mutable for testing.
var StatsAsOfTime = 10 * time.Second
a non-public cluster setting would both make the testing injection more elegant and generally just seem better to me.
pkg/sql/crdb_internal.go, line 418 at r1 (raw file):
var crdbInternalTablesTableLastStats = virtualSchemaTable{
	comment: "the latest stats for all tables accessible by current user in current database (KV scan)",
nit: I do think it's worth a comment here too that it is stale
Force-pushed from 9332695 to 6d0bec6.
I think this is RFAL - thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)
pkg/sql/crdb_internal.go, line 415 at r1 (raw file):
Previously, ajwerner wrote…
a non-public cluster setting would both make the testing injection more elegant and generally just seem better to me.
Done.
pkg/sql/crdb_internal.go, line 418 at r1 (raw file):
Previously, ajwerner wrote…
nit: I do think it's worth a comment here too that it is stale
Done.
Force-pushed from 6d0bec6 to 6cfcecc.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @rytaft)
pkg/sql/catalog/descs/collection.go, line 325 at r2 (raw file):
Quoted 15 lines of code…
if descID == descpb.InvalidID {
	var found bool
	var err error
	found, descID, err = catalogkv.LookupObjectID(ctx, txn, codec, parentID, parentSchemaID, name)
	if err != nil || !found {
		return found, nil, err
	}
}
// Always pick up a mutable copy so it can be cached.
desc, err = catalogkv.GetAnyDescriptorByID(ctx, txn, codec, descID, catalogkv.Mutable)
if err != nil {
	return false, nil, err
} else if desc == nil {
	// Having done the namespace lookup, the descriptor must exist.
	return false, nil, errors.NewAssertionErrorWithWrappedErrf(
gosh this is gross :(
I guess we just pushed this grossness down here. Can you do something like:
isSystemDescriptor := descID != descpb.InvalidID
if !isSystemDescriptor {
	var found bool
	var err error
	// ....
}
// ..
// Always pick up a mutable copy so it can be cached.
desc, err = catalogkv.GetAnyDescriptorByID(ctx, txn, codec, descID, catalogkv.Mutable)
if err != nil {
	return false, nil, err
} else if desc == nil && isSystemDescriptor {
	// Say things about how this can happen during startup because we're not
	// actually looking up the system descriptor IDs in KV.
	// return something wrapping ErrDescriptorNotFound
} else if desc == nil {
	// Having done the namespace lookup, the descriptor must exist.
	// Assertion failure
}
This commit updates the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This will reduce contention on the system.table_statistics table and avoid delays when running SHOW TABLES, which queries crdb_internal.table_row_statistics.

Informs cockroachdb#58189

Release note (performance improvement): Updated the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This should reduce contention on the table and improve performance for transactions that rely on crdb_internal.table_row_statistics, such as SHOW TABLES.
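The core of the change can be sketched as follows. This is a simplified illustration: the real query, the cluster-setting plumbing, and the column list live in pkg/sql/crdb_internal.go, and the statsAsOfTime variable here merely mirrors the idea of a mutable 10-second staleness interval.

```go
package main

import (
	"fmt"
	"time"
)

// statsAsOfTime mirrors the PR's overridable staleness interval for reading
// system.table_statistics (10s by default).
var statsAsOfTime = 10 * time.Second

// buildStatsQuery appends an AS OF SYSTEM TIME clause so this read runs at a
// slightly stale timestamp and does not contend with concurrent writes to
// system.table_statistics. The selected columns are illustrative.
func buildStatsQuery() string {
	return fmt.Sprintf(
		`SELECT "tableID", max("rowCount") FROM system.table_statistics AS OF SYSTEM TIME '-%s' GROUP BY "tableID"`,
		statsAsOfTime)
}

func main() {
	fmt.Println(buildStatsQuery())
}
```

Because the read happens at a fixed timestamp in the past, it never sits inside the contention footprint of the transactions that refresh the stats.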
Force-pushed from 6cfcecc to 9c24267.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)
pkg/sql/catalog/descs/collection.go, line 325 at r2 (raw file):
Previously, ajwerner wrote…
if descID == descpb.InvalidID {
	var found bool
	var err error
	found, descID, err = catalogkv.LookupObjectID(ctx, txn, codec, parentID, parentSchemaID, name)
	if err != nil || !found {
		return found, nil, err
	}
}
// Always pick up a mutable copy so it can be cached.
desc, err = catalogkv.GetAnyDescriptorByID(ctx, txn, codec, descID, catalogkv.Mutable)
if err != nil {
	return false, nil, err
} else if desc == nil {
	// Having done the namespace lookup, the descriptor must exist.
	return false, nil, errors.NewAssertionErrorWithWrappedErrf(
gosh this is gross :(
I guess we just pushed this grossness down here. Can you do something like:
isSystemDescriptor := descID != descpb.InvalidID
if !isSystemDescriptor {
	var found bool
	var err error
	// ....
}
// ..
// Always pick up a mutable copy so it can be cached.
desc, err = catalogkv.GetAnyDescriptorByID(ctx, txn, codec, descID, catalogkv.Mutable)
if err != nil {
	return false, nil, err
} else if desc == nil && isSystemDescriptor {
	// Say things about how this can happen during startup because we're not
	// actually looking up the system descriptor IDs in KV.
	// return something wrapping ErrDescriptorNotFound
} else if desc == nil {
	// Having done the namespace lookup, the descriptor must exist.
	// Assertion failure
}
Done. Thanks for the code snippet!
Reviewed 5 of 6 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained
TFTR! bors r+
Reviewable status: complete! 2 of 0 LGTMs obtained
Build failed (retrying...)
Build succeeded
61215: sql: deflake TestPGPreparedQuery r=RaduBerinde,rytaft a=rafiss

A recent change to stats in SHOW TABLES made this test flaky. Stats are now read with AOST -10s by default, but for tests like this we need to modify this. See #60953 and #61191

Release justification: test-only change
Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>