opt: prefer sorting fewer columns #60469

RaduBerinde · 2021-02-11T06:08:35Z

Currently, if we have to sort results and project a new column, there
is no cost difference between the two operator orders and we happen to
prefer the sort on top. It is preferable to sort before adding new
columns to avoid storing the extra values in memory or on disk.

This change improves the sort costing by adding a cost proportional to
the total number of values.
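
As a rough illustration of the idea (not the actual optimizer code), the sketch below adds a per-value term to a toy sort cost. It assumes "total number of values" means rows times columns carried through the sort; the function name `sortCost` and both constants are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical constants for illustration only; they are assumptions,
// not values taken from the CockroachDB cost model.
const (
	cpuCostPerCompare = 0.01  // assumed cost of comparing one key column
	costPerValue      = 0.001 // assumed cost of buffering one value
)

// sortCost is a toy estimate of sorting rowCount rows on keyCols key
// columns when each row carries totalCols columns.
func sortCost(rowCount float64, keyCols, totalCols int) float64 {
	// Per-value term: proportional to rows * total columns.
	store := rowCount * float64(totalCols) * costPerValue
	if rowCount < 2 {
		return store
	}
	// Comparison term: roughly n*log2(n) comparisons over the key columns.
	compares := rowCount * math.Log2(rowCount)
	return compares*float64(keyCols)*cpuCostPerCompare + store
}

func main() {
	const rows = 1000
	// Sorting 3 columns and projecting afterwards versus projecting a
	// 4th column first and then sorting all 4.
	sortThenProject := sortCost(rows, 1, 3)
	projectThenSort := sortCost(rows, 1, 4)
	fmt.Printf("sort below project: %.3f\n", sortThenProject)
	fmt.Printf("sort above project: %.3f\n", projectThenSort)
}
```

With the per-value term, the plan that sorts fewer columns is strictly cheaper; without it, the two plans would tie.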

Fixes #32952.

rytaft

Reviewed 30 of 30 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)

pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):

 │         ├── sort
 │         │    ├── columns: k:1!null v:2 w:3
 │         │    ├── ordering: -2

Are we losing an opportunity for a streaming groupby by removing the sort here? Or does it need to be sorted by both grouping columns?

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)

pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Are we losing an opportunity for a streaming groupby by removing the sort here? Or does it need to be sorted by both grouping columns?

This is a "build" test so this wouldn't be the final plan.

rytaft

Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):

Previously, RaduBerinde wrote…

This is a "build" test so this wouldn't be the final plan.

got it - thanks

mgartner

Nice! Seems like a high leverage change.

I noticed that the ordering of some group-by + sort operations changed, like here. I think this is ok, but I'm curious: does group-by take advantage of its input being already sorted by the group-by columns? Or could group-by produce rows in order of the group-by columns, eliminating the need for an additional sort afterwards?

Reviewed 30 of 30 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)

RaduBerinde

The example you linked to is a build test so we don't expect a good plan.

Group-by does take advantage of sorted input (in which case it preserves the ordering; hence the "old" plan you linked to). But that doesn't mean that it's always better to sort first - if the group-by reduces the number of rows a lot, it's better to sort after. (This is predicated on "unordered" groupby being cheaper than "ordered" groupby plus a sort; if it weren't, we would use ordered groupbys in all cases.)
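
To make the trade-off concrete, here is a minimal sketch comparing the two plan shapes under an assumed toy cost model. All constants and function names are hypothetical and chosen only so that a hash ("unordered") group-by costs more per row than a streaming ("ordered") one; they are not the optimizer's real numbers.

```go
package main

import (
	"fmt"
	"math"
)

// Assumed per-row costs, for illustration only.
const (
	sortCostPerCompare  = 0.01 // hypothetical cost per comparison in a sort
	streamingGroupByRow = 0.01 // hypothetical per-row cost, ordered group-by
	hashGroupByRow      = 0.03 // hypothetical per-row cost, unordered group-by
)

// sortRowsCost is a toy n*log2(n) sort estimate.
func sortRowsCost(n float64) float64 {
	if n < 2 {
		return 0
	}
	return sortCostPerCompare * n * math.Log2(n)
}

// costSortBefore: sort the input, then run a streaming group-by, which
// preserves the ordering so no further sort is needed.
func costSortBefore(inputRows float64) float64 {
	return sortRowsCost(inputRows) + streamingGroupByRow*inputRows
}

// costSortAfter: run a hash group-by on unsorted input, then sort only
// the (possibly much smaller) set of groups.
func costSortAfter(inputRows, groups float64) float64 {
	return hashGroupByRow*inputRows + sortRowsCost(groups)
}

func main() {
	const input = 1e6
	for _, groups := range []float64{10, 1e6} {
		fmt.Printf("groups=%.0f sort-before=%.0f sort-after=%.0f\n",
			groups, costSortBefore(input), costSortAfter(input, groups))
	}
}
```

Under these assumptions, sorting after wins when a million rows collapse into ten groups, while sorting before wins when the group-by does not reduce the row count at all.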

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)

mgartner

👍 Thanks for the explanation!

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)

jordanlewis · 2021-02-11T20:23:19Z

Nice - it's also worth noting that in the vectorized engine an n-column sort is done in n passes. So this helps CPU too, even without considering memory/disk.

RaduBerinde · 2021-02-11T20:46:20Z

When you say n-column sort, is n the number of key columns, or the total number of columns? (We already take the former into account when costing; this change adds a cost for the latter).

jordanlewis · 2021-02-11T21:05:57Z

Ah, I understand. Never mind, it's about the key columns.

RaduBerinde · 2021-02-11T22:00:22Z

There is an interesting point there though: the sort costing is inspired by how the row processor does it, and we should update it to better reflect the vectorized engine.

RaduBerinde · 2021-02-11T22:24:42Z

TFTRs!

bors r+

craig · 2021-02-11T23:04:39Z

Build succeeded:

GitHub CI (Cockroach)

RaduBerinde requested review from mgartner and rytaft February 11, 2021 06:08

RaduBerinde requested a review from a team as a code owner February 11, 2021 06:08

RaduBerinde force-pushed the sort-cost branch from 7b9cf1d to 3e7e0a5 Compare February 11, 2021 06:13

rytaft approved these changes Feb 11, 2021

RaduBerinde force-pushed the sort-cost branch from 3e7e0a5 to c140110 Compare February 11, 2021 16:59

rytaft approved these changes Feb 11, 2021

mgartner approved these changes Feb 11, 2021

mgartner approved these changes Feb 11, 2021

craig bot merged commit 5db2e58 into cockroachdb:master Feb 11, 2021

RaduBerinde deleted the sort-cost branch February 16, 2021 21:02
