Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

Closed
msirek opened this issue Dec 12, 2022 · 0 comments · Fixed by #93858 or #94672
Closed

Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY #93410

msirek opened this issue Dec 12, 2022 · 0 comments · Fixed by #93858 or #94672
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@msirek
Copy link
Contributor

msirek commented Dec 12, 2022

Describe the problem
In support issue https://github.com/cockroachlabs/support/issues/1949, a hash group-by + TopK plan is chosen over a streaming group-by when there is a LIMIT on the query because of unique constraints on certain columns, causing the number of grouping columns to be reduced by the optimizer. When one of the unique indexes is dropped, the fast query plan is picked.

To Reproduce
Use the statement bundles in the support issue.

Jira issue: CRDB-22320

@msirek msirek added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team labels Dec 12, 2022
@msirek msirek self-assigned this Dec 12, 2022
@msirek msirek closed this as completed Dec 13, 2022
@msirek msirek changed the title Propagate limit hints for empty sort enforcers Suboptimal query plan on grouped aggregation with LIMIT and ORDER BY Dec 18, 2022
@msirek msirek reopened this Dec 18, 2022
msirek pushed a commit to msirek/cockroach that referenced this issue Dec 18, 2022
Fixes cockroachdb#93410

A query with a grouped aggregation, a LIMIT and an ORDER BY
may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of
grouping columns may be reduced during normalization via rule
ReduceGroupingCols such that it no longer includes columns
present in the ORDER BY clause. This eliminates possibly cheap
plans from consideration, for example, if the input to the
aggregation is a lookup join, it may be cheaper to sort the
input to the lookup join on the ORDER BY columns if they overlap
with the grouping columns, so that a streaming group-by with no
TopK operator can be used, and a full scan of the inputs to
the join is avoided.

This fix adds a new exploration rule which ensures that a grouped
aggregation with a LIMIT and ORDER BY clause considers using
streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving
tables with unique constraints using LIMIT, GROUP BY and ORDER BY
clauses to ensure the optimizer considers streaming group-by with
no TopK operation, when possible. This is often the most efficient
query plan.
msirek pushed a commit to msirek/cockroach that referenced this issue Dec 20, 2022
Fixes cockroachdb#93410

A query with a grouped aggregation, a LIMIT and an ORDER BY
may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of
grouping columns may be reduced during normalization via rule
ReduceGroupingCols such that it no longer includes columns
present in the ORDER BY clause. This eliminates possibly cheap
plans from consideration, for example, if the input to the
aggregation is a lookup join, it may be cheaper to sort the
input to the lookup join on the ORDER BY columns if they overlap
with the grouping columns, so that a streaming group-by with no
TopK operator can be used, and a full scan of the inputs to
the join is avoided.

This fix adds a new exploration rule which ensures that a grouped
aggregation with a LIMIT and ORDER BY clause considers using
streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving
tables with unique constraints using LIMIT, GROUP BY and ORDER BY
clauses to ensure the optimizer considers streaming group-by with
no TopK operation, when possible. This is often the most efficient
query plan.
msirek pushed a commit to msirek/cockroach that referenced this issue Dec 20, 2022
Fixes cockroachdb#93410

A query with a grouped aggregation, a LIMIT and an ORDER BY
may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of
grouping columns may be reduced during normalization via rule
ReduceGroupingCols such that it no longer includes columns
present in the ORDER BY clause. This eliminates possibly cheap
plans from consideration, for example, if the input to the
aggregation is a lookup join, it may be cheaper to sort the
input to the lookup join on the ORDER BY columns if they overlap
with the grouping columns, so that a streaming group-by with no
TopK operator can be used, and a full scan of the inputs to
the join is avoided.

This fix adds a new exploration rule which ensures that a grouped
aggregation with a LIMIT and ORDER BY clause considers using
streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving
tables with unique constraints using LIMIT, GROUP BY and ORDER BY
clauses to ensure the optimizer considers streaming group-by with
no TopK operation, when possible. This is often the most efficient
query plan.
craig bot pushed a commit that referenced this issue Dec 20, 2022
93858: xform: use ordering from LIMIT as a hint for streaming group-by r=DrewKimball,rharding6373 a=msirek

Fixes #93410

A query with a grouped aggregation, a LIMIT and an ORDER BY may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of grouping columns may be reduced during normalization via rule ReduceGroupingCols such that it no longer includes columns present in the ORDER BY clause. This eliminates possibly cheap plans from consideration, for example, if the input to the aggregation is a lookup join, it may be cheaper to sort the input to the lookup join on the ORDER BY columns if they overlap with the grouping columns, so that a streaming group-by with no TopK operator can be used, and a full scan of the inputs to the join is avoided.

This fix adds a new exploration rule which ensures that a grouped aggregation with a LIMIT and ORDER BY clause considers using streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving tables with unique constraints using LIMIT, GROUP BY and ORDER BY clauses to ensure the optimizer considers streaming group-by with no TopK operation, when possible. This is often the most efficient query plan.

93992: cli,workload: don't hide useful workloads r=j82w a=knz

Prior to this patch, at least half of the workloads were hidden from view in the output of `cockroach --help`.

There was no good reason for this: most of the workloads are useful for teaching/learning and for experimentation. They all deserve more exposure, so that folk can learn about them without being told by the one person who built the workload in the first place.

So this patch fixes that by exposing of all of them through the online help.

One question that could remain is how much teaching value there is in letting someone experiment with a tool that was built for the benefit of one team only. One specific workload is under consideration here: `bulkingest`, used for benchmarking inside the D&R team, does not really do anything akin to what an end-user would possibly expect to do with a database. For that workload, and the benefit of future workloas akin to it, this patch adds a notice in its help text that it was developed for internal testing only.

Release note: None
Epic: None

94005: opt/partialidx: fix Implicator benchmark r=mgartner a=mgartner

This commit removes some leftover ordinal column references in `partialidx.Implicator`'s benchmarks. These benchmarks are failing since #93754 which deprecates ordinal column references.

Epic: None

Release note: None

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
@craig craig bot closed this as completed in 04eb457 Dec 20, 2022
msirek pushed a commit to msirek/cockroach that referenced this issue Jan 3, 2023
Fixes cockroachdb#93410

A query with a grouped aggregation, a LIMIT and an ORDER BY
may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of
grouping columns may be reduced during normalization via rule
ReduceGroupingCols such that it no longer includes columns
present in the ORDER BY clause. This eliminates possibly cheap
plans from consideration, for example, if the input to the
aggregation is a lookup join, it may be cheaper to sort the
input to the lookup join on the ORDER BY columns if they overlap
with the grouping columns, so that a streaming group-by with no
TopK operator can be used, and a full scan of the inputs to
the join is avoided.

This fix adds a new exploration rule which ensures that a grouped
aggregation with a LIMIT and ORDER BY clause considers using
streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving
tables with unique constraints using LIMIT, GROUP BY and ORDER BY
clauses to ensure the optimizer considers streaming group-by with
no TopK operation, when possible. This is often the most efficient
query plan.
msirek pushed a commit to msirek/cockroach that referenced this issue Jan 3, 2023
…setting

Fixes cockroachdb#93410

This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by`
session setting that defaults to `true`. When `true`, an exploration
rule which uses the ordering specified in a
`SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an
interesting ordering to require from the input to the group-by
expression, possibly eliminating a top-k operation.  When `false`, the
exploration rule is disabled.

Release note: None
msirek pushed a commit to msirek/cockroach that referenced this issue Jan 5, 2023
…setting

Fixes cockroachdb#93410

This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by`
session setting that defaults to `true`. When `true`, an exploration
rule which uses the ordering specified in a
`SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an
interesting ordering to require from the input to the group-by
expression, possibly eliminating a top-k operation.  When `false`, the
exploration rule is disabled.

Release note: None
craig bot pushed a commit that referenced this issue Jan 5, 2023
94672: sql: add optimizer_use_limit_ordering_for_streaming_group_by session setting r=msirek a=msirek

Fixes #93410

This commit adds the `optimizer_use_limit_ordering_for_streaming_group_by` session setting that defaults to `true`. When `true`, an exploration rule which uses the ordering specified in a
`SELECT ... GROUP BY ... ORDER BY ... LIMIT n;` statement as an interesting ordering to require from the input to the group-by expression, possibly eliminating a top-k operation.  When `false`, the exploration rule is disabled.

Release note: None

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
msirek pushed a commit to msirek/cockroach that referenced this issue Jan 5, 2023
Fixes cockroachdb#93410

A query with a grouped aggregation, a LIMIT and an ORDER BY
may not always explore the best-cost query plan.

Due to the existence of unique constraints on a table, the set of
grouping columns may be reduced during normalization via rule
ReduceGroupingCols such that it no longer includes columns
present in the ORDER BY clause. This eliminates possibly cheap
plans from consideration, for example, if the input to the
aggregation is a lookup join, it may be cheaper to sort the
input to the lookup join on the ORDER BY columns if they overlap
with the grouping columns, so that a streaming group-by with no
TopK operator can be used, and a full scan of the inputs to
the join is avoided.

This fix adds a new exploration rule which ensures that a grouped
aggregation with a LIMIT and ORDER BY clause considers using
streaming group-by with no TopK when possible.

Release note (bug fix): This patch fixes join queries involving
tables with unique constraints using LIMIT, GROUP BY and ORDER BY
clauses to ensure the optimizer considers streaming group-by with
no TopK operation, when possible. This is often the most efficient
query plan.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
1 participant