stats: contention on stats table causes slow SHOW TABLES command #58189

rytaft · 2020-12-22T17:32:17Z

We have heard reports that contention on the stats table is causing the SHOW TABLES command to execute very slowly. In 20.2 we augmented the SHOW TABLES command to include information collected from stats collection. This turns out to be a performance regression, especially in multi-region clusters as fetching the stats requires a lookup. This alone isn't always so bad as each lookup should be quick, however, sometimes the lookup encounters contention with a job writing new stats. The process of writing new stats happens incrementally and in a long-running cluster can take seconds.

Users of CRDB who are running into this issue today on 20.2.x can use the following workarounds:

Run your SHOW TABLES at PRIORITY HIGH. This will require using an explicit transaction. For example: BEGIN PRIORITY HIGH; SHOW TABLES; COMMIT;.
Reduce the frequency of stats refreshes to make contention less common. As described here, you can do this by tuning the cluster settings sql.stats.automatic_collection.fraction_stale_rows and sql.stats.automatic_collection.min_stale_rows. There is a potential downside with this approach, though, since it could mean that the stats are more stale and therefore the optimizer might choose a worse plan.

For future releases and/or backports, we should try to reduce contention on the stats table so the above workarounds aren't necessary.
As suggested by @ajwerner, we can reduce contention when stats are written as follows:

Run the txn at low priority, it's bad for creating stats to block running transactions or show tables
Generate the encoded data above the txn
Batch the data insertion as it's probably all going to the same range anyway

Another suggestion is to support WITHOUT STATISTICS as a modifier to SHOW TABLES, which will make the query much faster regardless of contention.

Finally, we might consider using the stats cache for SHOW TABLES. We're currently querying the system table directly, but we could probably avoid some lookups if we use the cache. One possible issue with this approach is that if the cache is full, it will cause entries to be evicted by table stats that may not be useful for other queries. We could write a new function StealthyGetTableStats, which would in turn use the function StealthyGet that @RaduBerinde recently added to avoid counting a Get towards cache eviction.

The text was updated successfully, but these errors were encountered:

RaduBerinde · 2020-12-22T17:35:53Z

Could SHOW TABLES query that table using AS OF SYSTEM TIME -10s?

rytaft · 2020-12-22T17:37:26Z

Could SHOW TABLES query that table using AS OF SYSTEM TIME -10s?

That's a good idea -- that would be a simple backport, along with support for WITHOUT STATISTICS.

ajwerner · 2020-12-22T17:39:45Z

Could SHOW TABLES query that table using AS OF SYSTEM TIME -10s?

That's a good idea -- that would be a simple backport, along with support for WITHOUT STATISTICS.

We don't currently support running different portions of queries at different timestamps. SHOW TABLES is transactional and I'd be very afraid to break that property.

RaduBerinde · 2020-12-22T18:03:42Z

Ah, it's all a big query..
https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/delegate/show_tables.go#L55

ajwerner · 2020-12-23T20:14:12Z

We may want to punt part of this to @cockroachdb/sql-experience to strip the stats off of SHOW TABLES optionally and to litigate what the default should be. Putting it on their board too.

rafiss · 2020-12-23T21:36:36Z

i would be in favor of vanilla SHOW TABLES being stripped down to only include schema_name, table_name, type, owner. (postgres doesn't have a SHOW command, but this matches the output of the \d metacommand from psql).

ajwerner · 2020-12-23T22:59:55Z

i would be in favor of vanilla SHOW TABLES being stripped down to only include schema_name, table_name, type, owner. (postgres doesn't have a SHOW command, but this matches the output of the \d metacommand from psql).

I absolutely support that for the next major version but I don't think we can backport that.

SHOW TABLES requires a lookup of table stats. This lookup can experience contention with the job that writes new stats. In multi-region clusters, the stats-writing job can take seconds, so this is likely. In 21.1, we likely will remove the stats from the SHOW TABLES output by default, and introduce a WITH STATISTICS modifier. The smaller change here is meant for backporting to 20.2 touches cockroachdb#58189 Release note (performance improvement): The SHOW TABLES command could experience undesirable latency due to contention when looking up table statistics. Now the contention is reduced.

rafiss · 2021-01-26T19:54:15Z

For backport purposes, we'll probably address this by adding a cluster setting that disables showing the row count in SHOW TABLES output.

ajwerner · 2021-01-26T19:57:55Z

This suggestion feels totally backportable and wonderful. It may not solve all use cases where users have code that calls vanilla SHOW TABLES and doesn't specify the columns but then again, changing the cluster setting value also may break an application (depending on the driver etc).

otan · 2021-02-03T19:25:45Z

So for 20.2, we want to add a cluster setting that people can set to OFF to avoid displaying this.
What do we want as the behaviour for 21.1?

Same cluster setting? Default to on or off? If different default to 20.2, people migrating over may feel pain.
add syntax WITH STATISTICS? There's already WITH COMMENT, what if people want both?

cc @vy-ton

vy-ton · 2021-02-03T19:47:28Z

i would be in favor of vanilla SHOW TABLES being stripped down to only include schema_name, table_name, type, owner. (postgres doesn't have a SHOW command, but this matches the output of the \d metacommand from psql).

This would be a backwards-incomptabile change that I'm not sure is worth doing. For 20.2 and beyond, a user can set the cluster setting to OFF if they want to avoid displaying row count. For future SHOW columns, we can consider the WITH syntax and whether the default behavior should be ON vs OFF.

cc @awoods187 for awareness

otan · 2021-02-03T19:49:12Z

What would the WITH syntax look like if you want stats and comments? or maybe not allow both at the same time?

SHOW TABLES WITH STATISTICS # no comment?
SHOW TABLES WITH COMMENT, STATISTICS # this?
SHOW TABLES WITH COMMENT WITH STATISTICS # or this?

vy-ton · 2021-02-03T19:59:24Z

One guiding principle could be to only support the WITH syntax for columns with a performance cost. So we could add comment to the output for SHOW TABLES.

Unless we want to make SHOW TABLES declarative and specify every column, I treat SHOW TABLES = SELECT *. You can fetch specify columns by using SELECT _ FROM [SHOW...]

There's probably more thinking we can do around SHOW commands as part of our public API

otan · 2021-02-03T20:01:00Z

I should be clearer, we already have SHOW TABLES WITH COMMENT. The question is how COMMENT and STATISTICS should look if we are using WITH

rytaft · 2021-02-05T20:38:05Z

Hi @rafiss, thanks for the reminder. I've had it on my todo list for a while, but things have been a bit crazy...

I agree that these performance improvements would likely help, but I'm not sure they would get rid of the performance problems altogether, since stats updates write a lot of data. But I also don't fully understand how transaction priorities work or how batching changes performance. Maybe @ajwerner can give more insight here. I also don't fully understand the risks of backporting these changes, especially the prioritization change. Could running at low priority cause starvation issues for the process trying to write statistics to the DB?

Either way, I'll try to make these changes to master over the course of the next week.

otan · 2021-02-10T02:33:36Z

Since we have optimisations underway, I've taken sql-experience off this after adding the cluster setting in 21.1: #60282

I think if I read this correctly there's some optimiser changes that can happen. If we decide we still want to go down the WITH STATISTICS route, feel free to reassign over to us!

59330: sql: make internal executor streaming r=yuzefovich a=yuzefovich This commit updates the internal executor to operate in a streaming fashion by refactoring its internal logic to implement an iterator pattern. A new method `QueryInternalEx` (and its counterpart `QueryInternal`) is introduced (both not used currently) while all existing methods of `InternalExecutor` interface are implemented using the new iterator logic. The communication between the iterator goroutine (the receiver) and the connExecutor goroutine (the sender) is done via a buffered (of 32 size in non-test setting) channel. The channel is closed when the connExecutor goroutine exits its run() loop. Care needs to be taken when closing the iterator - we need to make sure to close the stmtBuf (so that there are no more commands for the connExecutor goroutine to execute) and then we need to unblockingly drain the channel (since the connExecutor goroutine might be blocked on adding a row to the channel). After that we have to wait for the connExecutor goroutine to exit so that we can finish the tracing span. For convenience purposes, if the iterator is fully exhausted, it will get closed automatically. Addresses: #48595. Release note: None 60023: kv: wait for replica initialization in closed timestamp tests r=nvanbenschoten a=nvanbenschoten Fixes #60288. This commit updates tests in closed_timestamp_test.go to wait for replica initialization before returning replicas in `replsForRange`. This was causing flakiness on master. I bisected it back to c44b357. It appears that that change allowed for calls directly to `Replica.Send` (testing only) to make it deeper into replica code and eventually hit a panic in `Replica.checkSpanInRangeRLocked`. This commit fixes this in two ways: 1. it updates the closed timestamp tests to wait for replica initialization before calling directly into `Replica.Send`. Tests shouldn't be calling `Replica.Send` on uninitialized replicas. 2. it adds extra protection in `Replica.checkExecutionCanProceed` to throw an error if the replica is no initialized. This isn't strictly necessary, but it's cheap and limits the blast radius of getting this wrong elsewhere. Release note: None 60282: delegate: gate estimated_row_count on SHOW TABLES behind cluster setting r=rafiss a=otan Refs: #58189 Release note (sql change): Introduced a cluster setting sql.show_tables.estimated_row_count.enabled, which defaults to true. If false, estimated_row_count will not display on SHOW TABLES which improves performance. 60285: geo/wkt: add support for parsing multilinestrings with Z and M dims r=otan a=andyyang890 This patch extends the capabilities of the WKT parser to include parsing of multilinestrings with Z and M dimensions. Refs: #53091 Release note: None 60305: sql: use kv_node_status to determine node liveness for multiregion r=arulajmani a=otan This was a relic before the decision was made -- node liveness was detected by gossip instead. This changes the code to use SHOW REGIONS FROM CLUSTER, which aggregates from kv_node_status. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Andy Yang <ayang@cockroachlabs.com>

rafiss · 2021-02-11T17:17:23Z

@rytaft: I just noticed your comment:

I also don't fully understand the risks of backporting these changes, especially the prioritization change.

We don't need to backport the changes you make to stats writing -- I was just wondering if the improvements would be in 21.1. The new cluster setting that @otan added is good enough for the backport.

awoods187 · 2021-02-11T17:42:01Z

I am still hopeful that we can keep stats in SHOW TABLES in 21.1 as I think they are useful

rytaft · 2021-02-11T18:45:09Z

We don't need to backport the changes you make to stats writing

Great - thanks! I think I should be able to squeeze this stuff in to 21.1

RaduBerinde · 2021-02-22T22:54:08Z

Going back to the idea of AS OF SYSTEM TIME, what would be the problem with issuing the internal query for table_row_statistics in a separate transaction, with AS OF SYSTEM TIME -10s ? There is no requirement that the virtual table reflects the statistics as available at the outer txn timestamp.

This commit updates the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This will reduce contention on the system.table_statistics table and avoid delays when running SHOW TABLES, which queries crdb_internal.table_row_statistics. Informs cockroachdb#58189 Release note (performance improvement): Updated the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This should reduce contention on the table and improve performance for transactions that rely on crdb_internal.table_row_statistics, such as SHOW TABLES.

60707: cli,storage/cloud: add CLI flag to disable outbound network storage access r=dt a=dt Multi-tenant clusters will initially restrict outbound network access. In addition to the firewall rules at deployment that will actually block the network traffic itself, this CLI flag can be used to indeed prevent even trying make such network requests and return a clearer error instead. Release note: none. Release justification: Low risk (opt-in flag), high reward (better errors for serverless users) changes to existing functionality. 60953: sql: use AOST -10s when building crdb_internal.table_row_statistics r=rytaft a=rytaft This commit updates the query used to build the virtual table `crdb_internal.table_row_statistics` so that it is always run at `AS OF SYSTEM TIME '-10s'`. This will reduce contention on the `system.table_statistics` table and avoid delays when running `SHOW TABLES`, which queries `crdb_internal.table_row_statistics`. Informs #58189 Release note (performance improvement): Updated the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This should reduce contention on the table and improve performance for transactions that rely on crdb_internal.table_row_statistics, such as SHOW TABLES. Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

rytaft · 2021-02-24T22:27:04Z

I just merged #60953 which implements @RaduBerinde's idea from the previous comment. Based on offline discussions, we decided that this idea was preferable to some of the other ideas discussed here such as running the stats writing transaction at low priority (due to the possibility of starvation) and batching of the stats writer SQL statements (it's very difficult and not ergonomic to batch the DELETE and INSERT SQL statements of the stats writer into a single statement). The stats cache idea was also rejected due to the fact that the cache may not have stats for all tables.

This commit updates the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This will reduce contention on the system.table_statistics table and avoid delays when running SHOW TABLES, which queries crdb_internal.table_row_statistics. Informs cockroachdb#58189 Release note (performance improvement): Updated the query used to build the virtual table crdb_internal.table_row_statistics so that it is always run at AS OF SYSTEM TIME '-10s'. This should reduce contention on the table and improve performance for transactions that rely on crdb_internal.table_row_statistics, such as SHOW TABLES.

rytaft · 2021-03-03T23:21:23Z

Can we close this now? Or are we still waiting on #59838 (cc @ajwerner)?

rytaft · 2021-03-16T19:01:44Z

I'm going to go ahead and close this, since I consider it fixed with #60953. Feel free to reopen if you disagree.

rytaft added C-performance Perf of queries or internals. Solution not expected to change functional behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. A-sql-table-stats Table statistics (and their automatic refresh). labels Dec 22, 2020

ajwerner mentioned this issue Dec 23, 2020

sql: add timestamp to estimated row counts in SHOW TABLES #58175

Open

rafiss mentioned this issue Jan 25, 2021

sql: use AOST for SHOW TABLES query #59366

Closed

rafiss assigned otan Jan 26, 2021

otan mentioned this issue Feb 9, 2021

delegate: gate estimated_row_count on SHOW TABLES behind cluster setting #60282

Merged

otan assigned rytaft and unassigned otan Feb 10, 2021

rytaft mentioned this issue Feb 22, 2021

sql: use AOST -10s when building crdb_internal.table_row_statistics #60953

Merged

rytaft closed this as completed Mar 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stats: contention on stats table causes slow SHOW TABLES command #58189

stats: contention on stats table causes slow SHOW TABLES command #58189

rytaft commented Dec 22, 2020

RaduBerinde commented Dec 22, 2020

rytaft commented Dec 22, 2020

ajwerner commented Dec 22, 2020

RaduBerinde commented Dec 22, 2020

ajwerner commented Dec 23, 2020

rafiss commented Dec 23, 2020

ajwerner commented Dec 23, 2020

rafiss commented Jan 26, 2021

ajwerner commented Jan 26, 2021

otan commented Feb 3, 2021

vy-ton commented Feb 3, 2021 •

edited

Loading

otan commented Feb 3, 2021 •

edited

Loading

vy-ton commented Feb 3, 2021

otan commented Feb 3, 2021

rytaft commented Feb 5, 2021

otan commented Feb 10, 2021

rafiss commented Feb 11, 2021

awoods187 commented Feb 11, 2021

rytaft commented Feb 11, 2021

RaduBerinde commented Feb 22, 2021

rytaft commented Feb 24, 2021

rytaft commented Mar 3, 2021

rytaft commented Mar 16, 2021

stats: contention on stats table causes slow SHOW TABLES command #58189

stats: contention on stats table causes slow SHOW TABLES command #58189

Comments

rytaft commented Dec 22, 2020

RaduBerinde commented Dec 22, 2020

rytaft commented Dec 22, 2020

ajwerner commented Dec 22, 2020

RaduBerinde commented Dec 22, 2020

ajwerner commented Dec 23, 2020

rafiss commented Dec 23, 2020

ajwerner commented Dec 23, 2020

rafiss commented Jan 26, 2021

ajwerner commented Jan 26, 2021

otan commented Feb 3, 2021

vy-ton commented Feb 3, 2021 • edited Loading

otan commented Feb 3, 2021 • edited Loading

vy-ton commented Feb 3, 2021

otan commented Feb 3, 2021

rytaft commented Feb 5, 2021

otan commented Feb 10, 2021

rafiss commented Feb 11, 2021

awoods187 commented Feb 11, 2021

rytaft commented Feb 11, 2021

RaduBerinde commented Feb 22, 2021

rytaft commented Feb 24, 2021

rytaft commented Mar 3, 2021

rytaft commented Mar 16, 2021

vy-ton commented Feb 3, 2021 •

edited

Loading

otan commented Feb 3, 2021 •

edited

Loading