
sql: use AOST -10s when building crdb_internal.table_row_statistics #60953

Merged
merged 1 commit into from
Feb 24, 2021

Conversation

rytaft
Collaborator

@rytaft rytaft commented Feb 22, 2021

This commit updates the query used to build the virtual table
crdb_internal.table_row_statistics so that it is always run at
AS OF SYSTEM TIME '-10s'. This will reduce contention on the
system.table_statistics table and avoid delays when running SHOW TABLES,
which queries crdb_internal.table_row_statistics.

Informs #58189

Release note (performance improvement): Updated the query used to build
the virtual table crdb_internal.table_row_statistics so that it is always
run at AS OF SYSTEM TIME '-10s'. This should reduce contention on
the table and improve performance for transactions that rely on
crdb_internal.table_row_statistics, such as SHOW TABLES.

@cockroach-teamcity
Member

This change is Reviewable

@rytaft
Collaborator Author

rytaft commented Feb 22, 2021

Note: this change deviates slightly from the suggestions by @ajwerner. I don't have very good intuition about transaction contention, so this change may be suboptimal. But the reason I did it this way is that I am conscious of the memory overhead required for histograms, and generating all of the histograms up front seems like it could expand the memory utilization more than necessary. But maybe I'm just being overly cautious.

I also don't think we need to add all the stats in the same transaction, which is why I split it up into several smaller transactions. But maybe you disagree, @RaduBerinde. Curious to hear your thoughts.

Contributor

@ajwerner ajwerner left a comment

What would be even better than this would be to accumulate all of the writes into a single batch and commit that batch atomically. That would incur zero contention in most cases and a minimal amount in cases where the relevant stats rows spanned ranges. Is it ever the case that the sketches sum up to megabytes?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

Contributor

@ajwerner ajwerner left a comment

I suppose maybe that's hard.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

@rytaft
Collaborator Author

rytaft commented Feb 22, 2021

I think it's rare that histograms would be multiple megabytes, but if a table were very wide with a lot of columns and indexes, it could definitely happen.

When you say "accumulate all of the writes into a single batch", do you mean write a multi-row insert statement? If so, I can give that a try...

Contributor

@ajwerner ajwerner left a comment

I do sometimes feel it's unfortunate how badly SQL matches the desired KV API semantics. If we didn't need to support the conversational semantics exactly, we'd have more freedom to group things into batches.

Anyway, this certainly is an improvement. Better yet would be to group the delete and insert into a single statement so that we could read and then broadcast the write with a commit. As it is, it'll have ~8 WAN round trips per column in the worst case.

Delete reads from leaseholder (1)
Delete writes (1 to lease holder, replication pipelined)
Insert writes (1 to lease holder, replication pipelined)
Commit (1 to leaseholders, 1 to replicate, 1 to make explicitly committed, 1 to replicate, 1 to resolve)

Here the contention footprint is ~7 global RTTs.

I think the best we can hope to do with a statement per column is:

Delete reads from leaseholder (1 to leaseholders)
Delete and insert writes get propagated with a commit (1 to leaseholders, 1 to replicate, 1 to make explicitly committed, 1 to resolve)

Here the contention footprint is ~5 global RTTs.

However, if in the above case all of the writes and deletes fall in the same range (very likely), then it's 1PC and then the reads are really just blocked waiting on replication so 1 global RTT.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
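
The tallies above can be checked with a bit of toy bookkeeping — this is not real KV code, just the step lists from the comment restated as arithmetic:

```go
package main

import "fmt"

// footprint sums the per-step round trips listed in the comment above.
func footprint(steps []int) int {
	total := 0
	for _, n := range steps {
		total += n
	}
	return total
}

func main() {
	// Statement-per-column contention footprint: delete write, insert write,
	// then the five commit steps (leaseholders, replicate, explicit commit,
	// replicate, resolve).
	perStatement := []int{1, 1, 1, 1, 1, 1, 1}
	// Grouped delete+insert: delete read, then commit (leaseholders,
	// replicate, explicit commit, resolve).
	grouped := []int{1, 1, 1, 1, 1}
	fmt.Println(footprint(perStatement), footprint(grouped)) // 7 5
}
```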

Contributor

@ajwerner ajwerner left a comment

I don't know the internals of how a CTE that does the delete and insert in the same statement would get planned. I doubt it will be a single batch. If it were, that would be amazing.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
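
For concreteness, a single-statement delete-and-insert via a CTE could look like the following sketch; the column names are placeholders, not the real system.table_statistics schema:

```go
package main

import "fmt"

// replaceStats is a hypothetical single-statement delete+insert: the CTE
// deletes the old stats rows and the INSERT writes the new ones, so both
// happen in one SQL statement. Whether such a statement plans as a single
// KV batch is exactly the open question raised in the comment above.
const replaceStats = `WITH deleted AS (
    DELETE FROM system.table_statistics WHERE "tableID" = $1 RETURNING "tableID"
)
INSERT INTO system.table_statistics ("tableID", "statisticID", "rowCount")
VALUES ($1, $2, $3)`

func main() {
	fmt.Println(replaceStats)
}
```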

Contributor

@ajwerner ajwerner left a comment

Given all of this data should be co-located, I think the min user priority thing should work pretty well. Transactions encountering the intents should be able to abort the writing transaction without leaving the local node.

On the whole, I'm coming around to this being just totally fine and a good change so long as you can tolerate the loss of atomicity.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

Member

@RaduBerinde RaduBerinde left a comment

Do we have reason to believe that splitting up the txn is preferable?

One thing to keep in mind is that we'll need to change the gossip mechanism to a range feed soon. The range feed "listens" to changes on the system table. At that point, having a single txn would be better, otherwise we would end up refreshing the stats multiple times for the same table.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Contributor

@ajwerner ajwerner left a comment

That gossip thing seems like a big deal. The issue here is about reducing the contention footprint of these transactions. If the table has many columns, we're in a global cluster, and we set priority low, then a loop querying the stats from the leaseholder's region will effectively prevent the stats from ever committing. Breaking the transaction into pieces at least bounds much more tightly how long that footprint lasts. It may still be multiple global RTTs, as noted above.

I do have livelock worries with PRIORITY LOW. Everything is tradeoffs and in this one, I'm not sure what is best.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@rytaft
Collaborator Author

rytaft commented Feb 22, 2021

Doesn't the priority increment each time a transaction is pushed?

@ajwerner
Contributor

Doesn't the priority increment each time a transaction is pushed?

I'm pretty sure it never changes class. That incrementing priority is about avoiding starvation and providing some amount of fairness among transactions of the same priority class.

@rytaft rytaft changed the title rowexec: reduce contention caused by stats writer transactions sql: use AOST -10s when building crdb_internal.table_row_statistics Feb 23, 2021
Contributor

@ajwerner ajwerner left a comment

:lgtm: take or leave the suggestion

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/crdb_internal.go, line 415 at r1 (raw file):

// querying the system.table_statistics table when building
// crdb_internal.table_row_statistics. It is mutable for testing.
var StatsAsOfTime = 10 * time.Second

a non-public cluster setting will both make the testing injection more elegant and generally just seems better to me.


pkg/sql/crdb_internal.go, line 418 at r1 (raw file):

var crdbInternalTablesTableLastStats = virtualSchemaTable{
	comment: "the latest stats for all tables accessible by current user in current database (KV scan)",

nit: I do think it's worth a comment here too that it is stale

@rytaft rytaft force-pushed the stats-show-tables branch 2 times, most recently from 9332695 to 6d0bec6 Compare February 23, 2021 23:30
Collaborator Author

@rytaft rytaft left a comment

I think this is RFAL - thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)


pkg/sql/crdb_internal.go, line 415 at r1 (raw file):

Previously, ajwerner wrote…

a non-public cluster setting will both make the testing injection more elegant and generally just seems better to me.

Done.


pkg/sql/crdb_internal.go, line 418 at r1 (raw file):

Previously, ajwerner wrote…

nit: I do think t's worth a comment here too that it is stale

Done.

Contributor

@ajwerner ajwerner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @rytaft)


pkg/sql/catalog/descs/collection.go, line 325 at r2 (raw file):

Quoted 15 lines of code…
	if descID == descpb.InvalidID {
		var found bool
		var err error
		found, descID, err = catalogkv.LookupObjectID(ctx, txn, codec, parentID, parentSchemaID, name)
		if err != nil || !found {
			return found, nil, err
		}
	}
	// Always pick up a mutable copy so it can be cached.
	desc, err = catalogkv.GetAnyDescriptorByID(ctx, txn, codec, descID, catalogkv.Mutable)
	if err != nil {
		return false, nil, err
	} else if desc == nil {
		// Having done the namespace lookup, the descriptor must exist.
		return false, nil, errors.NewAssertionErrorWithWrappedErrf(

gosh this is gross :(

I guess we just pushed this grossness down here. Can you do something like:

isSystemDescriptor := descID != descpb.InvalidID
if !isSystemDescriptor {
	var found bool
	var err error
	// ...
}
// ...
// Always pick up a mutable copy so it can be cached.
desc, err = catalogkv.GetAnyDescriptorByID(ctx, txn, codec, descID, catalogkv.Mutable)
if err != nil {
	return false, nil, err
} else if desc == nil && isSystemDescriptor {
	// Say things about how this can happen during startup because we're not
	// actually looking up the system descriptor IDs in KV.
	// Return something wrapping ErrDescriptorNotFound.
} else if desc == nil {
	// Having done the namespace lookup, the descriptor must exist.
	// Assertion failure.
}

Collaborator Author

@rytaft rytaft left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)


pkg/sql/catalog/descs/collection.go, line 325 at r2 (raw file):

Previously, ajwerner wrote…

Done. Thanks for the code snippet!

Contributor

@ajwerner ajwerner left a comment

:lgtm:

Reviewed 5 of 6 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@rytaft
Collaborator Author

rytaft commented Feb 24, 2021

TFTR!

bors r+

Member

@RaduBerinde RaduBerinde left a comment

:lgtm: Thanks!

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

@craig
Contributor

craig bot commented Feb 24, 2021

Build failed (retrying...):

@craig
Contributor

craig bot commented Feb 24, 2021

Build succeeded:

@craig craig bot merged commit b887109 into cockroachdb:master Feb 24, 2021
craig bot pushed a commit that referenced this pull request Feb 27, 2021
61215: sql: deflake TestPGPreparedQuery r=RaduBerinde,rytaft a=rafiss

A recent change to stats in SHOW TABLES made this test flaky. Stats are
now read with AOST -10s by default, but for tests like this we need to
modify this. See #60953 and #61191

Release justification: test-only change

Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>