-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410
Labels
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-sql-queries
SQL Queries Team
Comments
msirek
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-sql-queries
SQL Queries Team
labels
Dec 12, 2022
msirek
changed the title
Propagate limit hints for empty sort enforcers
Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY
Dec 18, 2022
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Dec 18, 2022
Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Dec 20, 2022
Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Dec 20, 2022
Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
craig bot
pushed a commit
that referenced
this issue
Dec 20, 2022
93858: xform: use ordering from LIMIT as a hint for streaming group-by r=DrewKimball,rharding6373 a=msirek Fixes #93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan. 93992: cli,workload: don't hide useful workloads r=j82w a=knz Prior to this patch, at least half of the workloads were hidden from view in the output of `cockroach --help`. There was no good reason for this: most of the workloads are useful for teaching/learning and for experimentation. They all deserve more exposure, so that folk can learn about them without being told by the one person who built the workload in the first place. So this patch fixes that by exposing of all of them through the online help. One question that could remain is how much teaching value there is in letting someone experiment with a tool that was built for the benefit of one team only. One specific workload is under consideration here: `bulkingest`, used for benchmarking inside the D&R team, does not really do anything akin to what an end-user would possibly expect to do with a database. For that workload, and the benefit of future workloas akin to it, this patch adds a notice in its help text that it was developed for internal testing only. Release note: None Epic: None 94005: opt/partialidx: fix Implicator benchmark r=mgartner a=mgartner This commit removes some leftover ordinal column references in `partialidx.Implicator`'s benchmarks. These benchmarks are failing since #93754 which deprecates ordinal column references. Epic: None Release note: None Co-authored-by: Mark Sirek <sirek@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Jan 3, 2023
Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Jan 3, 2023
…setting Fixes cockroachdb#93410 This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `true`. When `true`, an exploration rule which uses the ordering specified in a `SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation. When `false`, the exploration rule is disabled. Release note: None
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Jan 5, 2023
…setting Fixes cockroachdb#93410 This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `true`. When `true`, an exploration rule which uses the ordering specified in a `SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation. When `false`, the exploration rule is disabled. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jan 5, 2023
94672: sql: add optimizer_use_limit_ordering_for_streaming_group_by session setting r=msirek a=msirek Fixes #93410 This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `true`. When `true`, an exploration rule which uses the ordering specified in a `SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation. When `false`, the exploration rule is disabled. Release note: None Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
msirek
pushed a commit
to msirek/cockroach
that referenced
this issue
Jan 5, 2023
Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-sql-queries
SQL Queries Team
Describe the problem
In support issue https://github.com/cockroachlabs/support/issues/1949, a hash group-by + TopK plan is chosen over a streaming group-by when there is a LIMIT on the query because of unique constraints on certain columns, causing the number of grouping columns to be reduced by the optimizer. When one of the unique indexes is dropped, the fast query plan is picked.
To Reproduce
Use the statement bundles in the support issue.
Jira issue: CRDB-22320
The text was updated successfully, but these errors were encountered: