-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: prefer sorting fewer columns #60469
Conversation
7b9cf1d
to
3e7e0a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 30 of 30 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)
pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):
│ ├── sort │ │ ├── columns: k:1!null v:2 w:3 │ │ ├── ordering: -2
Are we losing an opportunity for a streaming groupby by removing the sort here? Or does it need to be sorted by both grouping columns?
Currently, if we have to sort results and project a new column, there is no cost difference between the two orders and we happen to prefer the sort on top. It is preferable to sort before adding new columns to avoid storing the extra value in memory or on disk. This change improves the sort costing by adding a cost proportional to the total number of values. Fixes cockroachdb#32952.
3e7e0a5
to
c140110
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Are we losing an opportunity for a streaming groupby by removing the sort here? Or does it need to be sorted by both grouping columns?
This is a "build" test so this wouldn't be the final plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/opt/optbuilder/testdata/limit, line 150 at r1 (raw file):
Previously, RaduBerinde wrote…
This is a "build" test so this wouldn't be the final plan.
got it - thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Seems like a high leverage change.
I noticed that ordering of some group-by + sort operations changed, like here. I think this is ok, but I'm curious: does group-by take advantage its input being already sorted by the group-by columns? Or, could group-by produce rows in order of the group by column, eliminating the need for an additional sort afterwards?
Reviewed 30 of 30 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example you linked to is a build
test so we don't expect a good plan.
Group-by does take advantage of sorted input (in which case it does preserve the ordering; hence the "old" plan you linked to). But that doesn't mean that it's always better to sort before - if the group-by reduces the number of rows a lot, it's better to sort after. (this is predicated on "unordered" groupby being cheaper than "ordered" groupby plus a sort; if it wasn't, we would use ordered groupbys in all cases)
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks for the explanation!
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)
Nice - it's also worth noting that in vectorized an n-column sort is done in n passes. So this helps CPU too even without considering memory/disk. |
When you say n-column sort, is |
Ah, I understand. Nevermind, it's about the key columns. |
There is an interesting point there though - the sort costing is inspired by how the row processor does it, we should update it to better reflect the vectorized engine. |
TFTRs! bors r+ |
Build succeeded: |
Currently, if we have to sort results and project a new column, there
is no cost difference between the two orders and we happen to prefer
the sort on top. It is preferable to sort before adding new columns to
avoid storing the extra value in memory or on disk.
This change improves the sort costing by adding a cost proportional to
the total number of values.
Fixes #32952.