-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: investigate performance regressions on some queries #47058
Comments
cc @RaduBerinde |
47365: sql: ignore soft limits on scan nodes for distsql planning r=yuzefovich a=yuzefovich We have added propagation of soft limits in 20.1 release, and this causes some of the queries that used to run via DistSQL with `distsql=auto` to get a "should not distribute" recommendation during distsql physical planning. However, this can cause an egregious performance regression on some queries from 19.2 version. In order to keep the decision whether to distribute scans or not the same, we will be ignoring the soft limits on scan nodes for now. Addresses: #47058. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Wrote a script to check for plan differences in different versions. Here is the output for these queries. In each query, the top plan is 19.1.8 and the bottom is 20.1.0: Some differences are due to the output changing between versions. I haven't taken a close look at them yet but wanted to upload them in case anyone saw anything interesting. |
And here is the explain analyze output: |
Looking at the explain analyze output of some queries that have window functions it seems that a lot of these regressions can be explained by the fact that we previously did not respect the work limit in the windower processor. We seem to be properly spilling to disk in 20.1 so I would classify these as expected regressions. Queries that include window functions here are: 1, 10, 11, 14, 18, 19, 24 so we can eliminate those. One thing for the optimizer team cc @RaduBerinde: Query 5 seems to use a merge join instead of a lookup join. Still needs further investigation: 0, 3, 6, 9, 12, 16, and 22. The |
thanks for taking a look at these @asubiotto! I'm curious about query 5 @rytaft or @RaduBerinde |
I've looked into this, and I'm pretty sure that query 5 changed because we've made lookup joins more expensive since 19.1. In particular, #40248 and #43003 probably had an impact. I haven't actually done a git bisect, but I'm pretty sure that's the cause. Part of the problem here is that our row count estimate of the distinct table after filtering is incorrect since we can't determine that the filter on |
This commit adds a new rule, EliminateExistsZeroRows, which converts an Exists subquery to False when it's known that the input produces zero rows. Informs cockroachdb#47058 Release note (performance improvement): The optimizer can now detect when an Exists subquery can be eliminated because the input has zero rows. This leads to better plans in some cases.
Actually -- just went ahead and submitted a PR since it was so trivial: #48162. I confirmed that the resulting plan for query 5 is much better. |
46992: sql: Add Logical Column ID field to ColumnDescriptor r=rohany a=RichardJCai The LogicalColumnID field mimics the ColumnID field however LogicalColumnID may be swapped between two columns whereas ColumnID cannot. LogicalColumnID is referenced for virtual tables (pg_catalog, information_schema) and most notably affects column ordering for SHOW COLUMNS. This LogicalColumnID field support swapping the order of two columns - currently only used for ALTER COLUMN TYPE when a shadow column is created and swapped with it's original column. Does not affect existing behaviour. Release note: None 47449: cli: add --cert-principal-map to client commands r=petermattis a=petermattis Add support for the `--cert-principal-map` flag to the certs and client commands. Anywhere we were accepting the `--certs-dir` flag, we now also accept the `--cert-principal-map` flag. Fixes #47300 Release note (cli change): Support the `--cert-principal-map` flag in the `cert *` and "client" commands such as `sql`. 48138: keys: support splitting Ranges on tenant-id prefixed keys r=nvanbenschoten a=nvanbenschoten Fixes #48122. Relates to #47903. Relates to #48123. This PR contains a series of small commits that work towards the introduction of tenant-id prefixed keyspaces and begin the removal of some `keys.TODOSQLCodec` instances. This should be the only time we need to touch C++ throughout this work. 48160: storage,libroach: Check for MaxKeys when reading from intent history r=itsbilal a=itsbilal We weren't checking for MaxKeys (or TargetBytes) being reached in the case where we read from intent history in the MVCC scanner. All other cases go through addAndAdvance(), which had these checks. Almost certainly fixes #46652. Would be very surprised if it was something else. Release note (bug fix): Fixes a bug where a read operation in a transaction would error out for exceeding the maximum count of results returned. 48162: opt: add rule to eliminate Exists when input has zero rows r=rytaft a=rytaft This commit adds a new rule, `EliminateExistsZeroRows`, which converts an `Exists` subquery to False when it's known that the input produces zero rows. Informs #47058 Release note (performance improvement): The optimizer can now detect when an Exists subquery can be eliminated because the input has zero rows. This leads to better plans in some cases. Co-authored-by: richardjcai <caioftherichard@gmail.com> Co-authored-by: Peter Mattis <petermattis@gmail.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Our friends from Apollo team at Georgia Tech have found multiple queries on which our 19.1 version performs better than 19.2 which indicates some performance regressions. I reran all reported queries on a single node setup of 19.1.8, 19.2.5, and 20.1.0.beta-4-dirty and observed the regressions on most of the queries (all times are in seconds).
Note that query 6 runs in about the same time in 20.1.0 as in 19.2.5 if we setThis has been addressed by #47365, and I updated the run times.distsql=on
. Further investigation is needed for why the behavior ofdistsql=auto
has changed between 19.2.5 and 20.1.I have not looked into any other regressions.
The repro instructions can be found in our Apollo slack channel.
The text was updated successfully, but these errors were encountered: