-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql, opt: implements index skip scan #39668
base: master
Are you sure you want to change the base?
Conversation
140aa1f
to
68ce923
Compare
b378d23
to
8cdcbd5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that from the optimizer side, the operator should be semantically equivalent to a DistinctOn -> Scan complex, where:
- the distinct columns are a prefix of the index
- the ordering inside Distinct refers only to distinct columns
- the constraints in the scan only cover the distinct columns
Some of these could be relaxed a bit in the future if necessary, but with non-trivial work.
The distsql physical planner should be in charge of adding distinct processors as needed to guarantee these semantics in distributed cases.
Separately, we should fix the restriction of indexJoin
input nodes, now that the local execution code is gone it should be doable. I'm surprised we got this far without fixing that.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm those are good points. Right now the IndexSkipScan
is equivalent to a DistinctOn - > Scan (we add the DistinctOn when building the exec plan). Also yes, the distinct on has to be on the prefix of the index currently.
- the ordering inside Distinct refers only to distinct columns
- the constraints in the scan only cover the distinct columns
The PR doesn't restrict either of these to just the distinct columns actually. Why do we need these restrictions exactly?
There are tests I added that tests IndexSkipScan
s work sanely even when the ordering is on columns that are part of the index but not part of the distinct on. Same thing with the constraints, why do you think the restriction on the distinct cols is necessary?
The distsql physical planner should be in charge of adding distinct processors as needed to guarantee these semantics in distributed cases.
I agree with this. Until this is true though, the DistinctOn
added in the exec build phase is a workaround I'm okay with.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj)
ba76073
to
eb6cf14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR doesn't restrict either of these to just the distinct columns actually. Why do we need these restrictions exactly?
There are tests I added that tests IndexSkipScans work sanely even when the ordering is on columns that are part of the index but not part of the distinct on. Same thing with the constraints, why do you think the restriction on the distinct cols is necessary?
They're not strictly speaking necessary (I did mention that we could relax them), they're just hard to think about and get right IMO. If we have an ordering on other columns, we have to set up the distinct processors with ordered input synchronizers to select the correct row. There may be cases where we're not even selecting those other columns, but we would need to get the values over to the distinct processor.
Constraints may not be a problem, it's more that they normally come from filters which in general can't support on those other columns anyway.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj)
This change adds functionality to support index-skip-scans when only a proper subset of the output columns are part of the columns we do an index-skip scan over. Release note: None
This change introduces index skip scans. To start with, we implement index-skip scans when we have a DistinctOn on a prefix of a scan. Following this change, we can add more uses of this when the DistinctOn has an index-join as its input, when there is no DistinctOn but the prefix of the index has very few distinct values and there is a constraint on the columns after those columns (This would need some more thought but I think it's possible if we do an index skip scan and a cross product with the other bounded column values and use this cross product to do a lookup join with the original table). Release note: None
This change adds an exploration rule that allows index skip scans to be used when we have an index join underneath a `DistinctOn`. Release note: None
Previously we were planning an index skip scan as a regular scan with some other properties. This change separates it out into its own operator. This is good because even though a lot of rules that apply to scans are applicable here too, some of them are difficult to reason about and separating it into a separate operator allows us to audit those rules in isolation before applying them instead of just doing it now. This also helps when creating new rules in the future. Another important note is, we need to do some careful work with IndexSkipScan so the exploration rules works sanely when the data is distributed over multiple nodes. This is important because when each node does an `IndexSkipScan` over its own data, the rows returned by different nodes might conflict and so we need a `DistinctOn` to resolve them. and We add this DistinctOn node in the exec builder phase so the guarantees made by the operator always hold. Finally, this change tries to IndexSkipScan behaves sanely when used with various different orderings. Release note: None
eb6cf14
to
66e78b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair. We could tighten the requirements for this rule a little more than we do in this PR and then loosen it later. For now, I've added support for orderings that aren't part of the DistinctOn
columns as well and have tests exercising behavior in those situations.
I'll defer to the team from here! Thanks for letting me work on such cool projects!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj)
This processor is not used. It was implemented for cockroachdb#24584, but the PR to start planning it stalled since Ridwan left, a year ago (cockroachdb#39668). This guy is marginally in my way, and, worse, it's dead code. Unless someone promises to finish that PR imminently :). Release note: None
51178: sql/rowexec: delete indexSkipTableReader r=andreimatei a=andreimatei This processor is not used. It was implemented for #24584, but the PR to start planning it stalled since Ridwan left, a year ago (#39668). This guy is marginally in my way, and, worse, it's dead code. Unless someone promises to finish that PR imminently :). Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
#51178 deleted the |
Ridwan Sharif seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off cockroachdb#39668 they can bring it back. Release note: None
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off cockroachdb#39668 they can bring it back. Release note: None
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off cockroachdb#39668 they can bring it back. Release note: None
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off cockroachdb#39668 they can bring it back. Release note: None
110945: sql: plumb locking durability into Get/Scan/ReverseScan requests r=arulajmani,nvanbenschoten,yuzefovich a=michae2 **execinfrapb: remove IndexSkipTableReaderSpec** The `indexSkipTableReader` was deleted in #51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off #39668 they can bring it back. Release note: None **sql: plumb locking durability into Get/Scan/ReverseScan requests** Building on top of #110201, plumb locking durability from the optimizer to creation of Get/Scan/ReverseScan requests in txnKVFetcher and txnKVStreamer. Also add a version gate for usage of guaranteed-durable locking. Fixes: #100194 Release note: None Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off cockroachdb#39668 they can bring it back. Release note: None
This change plans index-skip-scan in the optimizer.
It adds exploration rules to a scan but having them use the
loose index table reader instead of the regular table reader.
This will allow the optimizer to create a far better plan (an
order of magnitude better) for some
Distinct
queries byleveraging the index the scan is on. This will also make it
possible to plan index-skip-scans more often when the
prefix of an index is known to have a small number of distinct
values.