Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

msirek · 2022-12-12T07:40:10Z

Describe the problem
In support issue https://github.com/cockroachlabs/support/issues/1949, a hash group-by + TopK plan is chosen over a streaming group-by when there is a LIMIT on the query because of unique constraints on certain columns, causing the number of grouping columns to be reduced by the optimizer. When one of the unique indexes is dropped, the fast query plan is picked.

To Reproduce
Use the statement bundles in the support issue.

Jira issue: CRDB-22320

Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.

93858: xform: use ordering from LIMIT as a hint for streaming group-by r=DrewKimball,rharding6373 a=msirek Fixes #93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan. 93992: cli,workload: don't hide useful workloads r=j82w a=knz Prior to this patch, at least half of the workloads were hidden from view in the output of `cockroach --help`. There was no good reason for this: most of the workloads are useful for teaching/learning and for experimentation. They all deserve more exposure, so that folk can learn about them without being told by the one person who built the workload in the first place. So this patch fixes that by exposing of all of them through the online help. One question that could remain is how much teaching value there is in letting someone experiment with a tool that was built for the benefit of one team only. One specific workload is under consideration here: `bulkingest`, used for benchmarking inside the D&R team, does not really do anything akin to what an end-user would possibly expect to do with a database. For that workload, and the benefit of future workloas akin to it, this patch adds a notice in its help text that it was developed for internal testing only. Release note: None Epic: None 94005: opt/partialidx: fix Implicator benchmark r=mgartner a=mgartner This commit removes some leftover ordinal column references in `partialidx.Implicator`'s benchmarks. These benchmarks are failing since #93754 which deprecates ordinal column references. Epic: None Release note: None Co-authored-by: Mark Sirek <sirek@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>

Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.

…setting Fixes cockroachdb#93410 This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `true`. When `true`, an exploration rule which uses the ordering specified in a `SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation. When `false`, the exploration rule is disabled. Release note: None

94672: sql: add optimizer_use_limit_ordering_for_streaming_group_by session setting r=msirek a=msirek Fixes #93410 This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `true`. When `true`, an exploration rule which uses the ordering specified in a `SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation. When `false`, the exploration rule is disabled. Release note: None Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>

Fixes cockroachdb#93410 A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan. Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided. This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible. Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.

msirek added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team labels Dec 12, 2022

msirek self-assigned this Dec 12, 2022

msirek closed this as completed Dec 13, 2022

msirek changed the title ~~Propagate limit hints for empty sort enforcers~~ Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY Dec 18, 2022

msirek reopened this Dec 18, 2022

msirek mentioned this issue Dec 18, 2022

xform: use ordering from LIMIT as a hint for streaming group-by #93858

Merged

craig bot closed this as completed in 04eb457 Dec 20, 2022

msirek mentioned this issue Jan 3, 2023

release-22.2: xform: use ordering from LIMIT as a hint for streaming group-by #94603

Merged

msirek mentioned this issue Jan 3, 2023

sql: add optimizer_use_limit_ordering_for_streaming_group_by session setting #94672

Merged

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

msirek commented Dec 12, 2022 •

edited

Loading

Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

Comments

msirek commented Dec 12, 2022 • edited Loading

msirek commented Dec 12, 2022 •

edited

Loading