opt: prefer sorting fewer columns #60469

Merged: 1 commit, Feb 11, 2021

Conversation

RaduBerinde
Member

Currently, if we have to sort results and project a new column, there
is no cost difference between the two orders and we happen to prefer
the sort on top. It is preferable to sort before adding new columns to
avoid storing the extra value in memory or on disk.

This change improves the sort costing by adding a cost proportional to
the total number of values.

Fixes #32952.
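
To illustrate the idea, here is a minimal sketch of a sort cost with an extra term proportional to the total number of values (rows times columns). It is not the optimizer's actual coster: the function `sortCost`, both constants, and the `n log n` comparison term are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical cost constants, chosen only for illustration.
const (
	costPerKeyComparison = 1.0 // comparing one key column of two rows
	costPerValue         = 0.1 // buffering one value (one column of one row)
)

// sortCost estimates the cost of sorting rowCount rows that have totalCols
// columns each, ordered by keyCols of those columns.
func sortCost(rowCount float64, keyCols, totalCols int) float64 {
	var compareCost float64
	if rowCount > 1 {
		// Roughly n*log(n) comparisons, each touching up to keyCols columns.
		compareCost = rowCount * math.Log2(rowCount) * float64(keyCols) * costPerKeyComparison
	}
	// Extra term: proportional to the total number of values held by the sort.
	valueCost := rowCount * float64(totalCols) * costPerValue
	return compareCost + valueCost
}

func main() {
	const rows = 1000.0
	// Plan A: sort 3 columns, then project a 4th column on top.
	sortBelow := sortCost(rows, 1, 3)
	// Plan B: project the 4th column first, then sort all 4 columns.
	sortOnTop := sortCost(rows, 1, 4)
	fmt.Printf("sort below projection: %.1f\n", sortBelow)
	fmt.Printf("sort above projection: %.1f\n", sortOnTop)
}
```

Without the per-value term the two plans would cost the same; with it, the plan that sorts before projecting is strictly cheaper, since the projection itself costs the same in either position.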

@RaduBerinde RaduBerinde requested a review from a team as a code owner February 11, 2021 06:08

Collaborator

@rytaft rytaft left a comment

:lgtm:

Reviewed 30 of 30 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):

 │         ├── sort
 │         │    ├── columns: k:1!null v:2 w:3
 │         │    ├── ordering: -2

Are we losing an opportunity for a streaming groupby by removing the sort here? Or does it need to be sorted by both grouping columns?

Member Author

@RaduBerinde RaduBerinde left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Are we losing an opportunity for a streaming groupby by removing the sort here? Or does it need to be sorted by both grouping columns?

This is a "build" test so this wouldn't be the final plan.

Collaborator

@rytaft rytaft left a comment

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):

Previously, RaduBerinde wrote…

This is a "build" test so this wouldn't be the final plan.

got it - thanks

Collaborator

@mgartner mgartner left a comment

:lgtm: Nice! Seems like a high leverage change.

I noticed that the ordering of some group-by + sort operations changed, like here. I think this is ok, but I'm curious: does group-by take advantage of its input being already sorted by the group-by columns? Or, could group-by produce rows in order of the group-by columns, eliminating the need for an additional sort afterwards?

Reviewed 30 of 30 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

Member Author

@RaduBerinde RaduBerinde left a comment

The example you linked to is a build test so we don't expect a good plan.

Group-by does take advantage of sorted input, and in that case it preserves the ordering (hence the "old" plan you linked to). But that doesn't mean it's always better to sort first: if the group-by reduces the number of rows a lot, it's better to sort afterwards. (This is predicated on an "unordered" group-by being cheaper than an "ordered" group-by plus a sort; if it weren't, we would use ordered group-bys in all cases.)
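
The two alternatives can be sketched as follows. This is a toy illustration rather than CockroachDB's execution code; the `row` and `group` types and all function names are hypothetical. `sortThenStream` sorts every input row and then aggregates in one streaming pass that preserves the ordering, while `hashThenSort` aggregates unordered input and sorts only the (possibly far fewer) resulting groups.

```go
package main

import (
	"fmt"
	"sort"
)

type row struct {
	key string
	val int
}

type group struct {
	key string
	sum int
}

// streamingGroupBy requires its input to be sorted by key. It emits one group
// per run of equal keys, so the output keeps the input ordering.
func streamingGroupBy(sorted []row) []group {
	var out []group
	for _, r := range sorted {
		if n := len(out); n > 0 && out[n-1].key == r.key {
			out[n-1].sum += r.val
			continue
		}
		out = append(out, group{key: r.key, sum: r.val})
	}
	return out
}

// sortThenStream sorts all input rows, then aggregates with a streaming pass.
func sortThenStream(rows []row) []group {
	sorted := append([]row(nil), rows...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })
	return streamingGroupBy(sorted)
}

// hashThenSort aggregates unordered input with a map, then sorts only the groups.
func hashThenSort(rows []row) []group {
	sums := make(map[string]int)
	for _, r := range rows {
		sums[r.key] += r.val
	}
	out := make([]group, 0, len(sums))
	for k, s := range sums {
		out = append(out, group{key: k, sum: s})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

func main() {
	in := []row{{"b", 1}, {"a", 2}, {"b", 3}, {"a", 4}, {"c", 5}}
	fmt.Println(sortThenStream(in))
	fmt.Println(hashThenSort(in))
}
```

Both functions return the same ordered result. The difference is how many rows the sort has to handle: all input rows in the first case, only one row per group in the second.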

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

Collaborator

@mgartner mgartner left a comment

👍 Thanks for the explanation!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

@jordanlewis
Member

Nice - it's also worth noting that in the vectorized engine an n-column sort is done in n passes. So this helps CPU too, even without considering memory/disk.

@RaduBerinde
Member Author

When you say n-column sort, is n the number of key columns, or the total number of columns? (We already take the former into account when costing; this change adds a cost for the latter).

@jordanlewis
Member

Ah, I understand. Never mind, it's about the key columns.
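
The distinction between key columns and total columns can be made concrete with a small sketch. This is not the vectorized engine's implementation; `sortByKeyColumns` is a hypothetical helper that uses one stable pass per key column (least significant key first), which is just one simple way to get a multi-pass sort. The number of passes depends only on the number of key columns; the remaining columns are never compared, only carried along with each row.

```go
package main

import (
	"fmt"
	"sort"
)

// sortByKeyColumns orders rows by the given key column indexes, in priority
// order. It runs one stable sort pass per key column, starting with the least
// significant key, so earlier passes act as tie-breakers for later ones.
func sortByKeyColumns(rows [][]int, keyCols []int) {
	for i := len(keyCols) - 1; i >= 0; i-- {
		col := keyCols[i]
		sort.SliceStable(rows, func(a, b int) bool {
			return rows[a][col] < rows[b][col]
		})
	}
}

func main() {
	// Columns 0 and 1 are key columns; column 2 is a payload column that is
	// moved with its row but never compared.
	rows := [][]int{
		{2, 1, 10},
		{1, 2, 20},
		{1, 1, 30},
		{2, 0, 40},
	}
	sortByKeyColumns(rows, []int{0, 1})
	fmt.Println(rows)
}
```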

@RaduBerinde
Member Author

There is an interesting point there though: the sort costing is inspired by how the row processor does it, and we should update it to better reflect the vectorized engine.

@RaduBerinde
Member Author

TFTRs!

bors r+

@craig
Contributor

craig bot commented Feb 11, 2021

Build succeeded:

@craig craig bot merged commit 5db2e58 into cockroachdb:master Feb 11, 2021
@RaduBerinde RaduBerinde deleted the sort-cost branch February 16, 2021 21:02
Linked issue: opt: derived projections shouldn't be pushed past sorts
5 participants