-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add virtual table pushdown infrastructure, and add to several spots #47316
Conversation
❌ The GitHub CI (Cockroach) build has failed on 24d8b39f. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool.
It is unfortunate that we are starting to duplicate a lot of the opt code related to Scan. Ideally we should get rid of VirtualScan
and just use Scan
.. but we need to have some kind of primary index for all virtual tables in order to do that. Is there a problem with making up a bogus hidden PK column that isn't visible from queries?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)
pkg/sql/opt/xform/coster.go, line 318 at r1 (raw file):
rowCount := memo.Cost(scan.Relational().Stats.RowCount) // Add a small cost if the scan is unconstrained, so all else being equal, we
I don't think this is the right way to do this. We should instead modify the stats code for virtual scan to reduce the row count if there's a constraint:
https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L634
Can probably reuse most of the code in buildScan
.
pkg/sql/opt/xform/custom_funcs.go, line 124 at r1 (raw file):
// IsCanonicalVirtualScan returns true if the given VirtualScanPrivate is an original // unaltered primary index VirtualScan operator (i.e. unconstrained and not limited).
VirtualScan can't be limited
pkg/sql/opt/xform/custom_funcs.go, line 392 at r1 (raw file):
// done once per table, GenerateConstrainedVirtualScans should only be called on the // original unaltered primary index VirtualScan operator (i.e. not constrained or // limited).
VirtualScan can't be limited
Thanks for the review. Yeah, I agree there's too much duplication. I think I tried to unify when I started this patch but ran into trouble... I can try looking at it again. |
I can give that a shot too. |
That would be great if you could try! |
24d8b39
to
f21655e
Compare
❌ The GitHub CI (Cockroach) build has failed on f21655eb. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
f21655e
to
7007009
Compare
❌ The GitHub CI (Cockroach) build has failed on 70070091. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
7007009
to
ffae5e1
Compare
❌ The GitHub CI (Cockroach) build has failed on ffae5e16. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
ffae5e1
to
3038ae9
Compare
❌ The GitHub CI (Cockroach) build has failed on 3038ae9c. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
3038ae9
to
5360f5e
Compare
Ok, this is finally RFAL! |
Note that it still just works on the |
pkg/sql/information_schema.go
Outdated
addRow func(...tree.Datum) error) (bool, error) { | ||
// This index is on the TABLE_NAME column. | ||
name := tree.MustBeDString(constraint) | ||
fmt.Println(p.CurrentSearchPath(), p.CurrentSearchPath().Iter()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
populate: func(ctx context.Context, constraint tree.Datum, p *planner, db *DatabaseDescriptor, | ||
addRow func(...tree.Datum) error) (bool, error) { | ||
// This index is on the TABLE_NAME column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish there was a way to visually link the index's definition to the index name in the create statement.
statement ok | ||
CREATE TABLE hidden_in_vtable_index_test(a int) | ||
|
||
let $testid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
til you could do this
CREATE DATABASE other_db; SET DATABASE = other_db | ||
|
||
query O | ||
SELECT oid FROM pg_class WHERE oid=$testid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to enable searching across databases that you can check here?
pkg/sql/opt/xform/coster.go
Outdated
@@ -611,7 +611,8 @@ func (c *coster) rowScanCost(tabID opt.TableID, idxOrd int, numScannedCols int) | |||
// Adjust cost based on how well the current locality matches the index's | |||
// zone constraints. | |||
var costFactor memo.Cost = cpuCostFactor | |||
if len(c.locality.Tiers) != 0 { | |||
if tab.IsVirtualTable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the reason for setting up the if this way
pkg/sql/pg_catalog.go
Outdated
} | ||
|
||
// makeAllRelationsPGCatalogTable creates a virtual table that searches through | ||
// all table descriptors in the system. It automatically adds an index to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It automatically adds a virtual index implementation to the table id column.
I think you want to specify because the user has to actually write the index definition in the create statement themselves
pkg/sql/pg_catalog.go
Outdated
// all table descriptors in the system. It automatically adds an index to the | ||
// table id column as well. The input schema must have a single INDEX definition | ||
// with a single column, which must be the column that contains the table id. | ||
// indexesAreNotComplete should be set to true if the index constraint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to rename this to like "canKeyByIndex" or something.
pkg/sql/pg_catalog.go
Outdated
// might not be something that can be used to search by. For example, a common | ||
// case is having a table that's keyed by an OID that can either be an index | ||
// or a table OID. In this case, you must pass true for this variable, and if | ||
// the input is an index OID, the index won't be used. Another common case and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gotcha feels like an incomplete thought
for i := 0; i < idxConstraint.Spans.Count(); i++ { | ||
span := idxConstraint.Spans.Get(i) | ||
if span.HasSingleKey(p.EvalContext()) { | ||
matchedFilter = datumToFilter.Compare(p.EvalContext(), span.StartKey().Value(0)) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there not an existing helper to see if a datum lives within a span? If not it might be helpful to add and move this out of here.
pkg/sql/pg_catalog.go
Outdated
} | ||
table, err := p.LookupTableByID(ctx, id) | ||
if err != nil { | ||
// No table found, so no rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that by returning in this case we are falling back to the table scan? Also, it feels unsatisfying to have to scan the whole table when an invalid table oid is passed in, but that seems unavoidable with this oid situation.
Additionally, we are guaranteed that these index OID's don't collide with table oid's?
pkg/sql/crdb_internal.go
Outdated
var crdbInternalCreateStmtsTable = virtualSchemaTable{ | ||
comment: `CREATE and ALTER statements for all tables accessible by current user in current database (KV scan)`, | ||
schema: ` | ||
var crdbInternalCreateStmtsTable = makeAllRelationsPGCatalogTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be renamed to something like makeAllRelationsVirtualTableWithIndex
or something, since its definitely going to see use outside of the pg_catalog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opt and opt_exec_factory parts
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)
pkg/sql/opt_exec_factory.go, line 160 at r4 (raw file):
indexName := "" if indexDesc != nil {
[nit] can this ever be nil
? I don't think we can have optVirtualIndex
with nil desc. Also, could just use index.Name()
pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):
} // Check for explicit use of the dummy column. if needed.Contains(0) || indexConstraint != nil || len(reqOrdering) > 0 || reverse {
I think we should still check if any spans in the the indexConstriaint have any values for the PK column and emit the same error, just in case. I'm thinking about something like SELECT a FROM vtable WHERE a=1 AND dummyColumn=1
, which could conceivably result in a scan that only returns a
(though I don't think it happens right now).
pkg/sql/opt/xform/coster.go, line 614 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
whats the reason for setting up the if this way
+1, either put a TODO in there if we'll want to add more for that case, or make it if !tabIsVirtualTable() || len..
5360f5e
to
1a898c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @rohany from 2 discussions.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)
pkg/sql/crdb_internal.go, line 1383 at r4 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Maybe this should be renamed to something like
makeAllRelationsVirtualTableWithIndex
or something, since its definitely going to see use outside of the pg_catalog
Done.
pkg/sql/information_schema.go, line 1365 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I wish there was a way to visually link the index's definition to the index name in the create statement.
Yeah :( I'm not sure how one would fix this.
pkg/sql/information_schema.go, line 1367 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Doh. I'm the worst with this. I need a pre-push regex.
pkg/sql/opt_catalog.go, line 1627 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I don't understand this case and the below case handling of the dummy PK
This is to set up the column positions - the mapping from index ordinal to column id. The dummy PK column just has to be shunted out of the way, so I put it at the end where nobody can get at it.
pkg/sql/opt_exec_factory.go, line 160 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] can this ever be
nil
? I don't think we can haveoptVirtualIndex
with nil desc. Also, could just useindex.Name()
Yeah, this can be nil - the primary index doesn't have a descriptor, I don't think.
pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):
Previously, RaduBerinde wrote…
I think we should still check if any spans in the the indexConstriaint have any values for the PK column and emit the same error, just in case. I'm thinking about something like
SELECT a FROM vtable WHERE a=1 AND dummyColumn=1
, which could conceivably result in a scan that only returnsa
(though I don't think it happens right now).
Done.
pkg/sql/pg_catalog.go, line 993 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
- It automatically adds a virtual index implementation to the table id column.
I think you want to specify because the user has to actually write the index definition in the create statement themselves
Done.
pkg/sql/pg_catalog.go, line 996 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I think it would be clearer to rename this to like "canKeyByIndex" or something.
Done - I thought it had to be more general, but you're right.
pkg/sql/pg_catalog.go, line 1000 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
This gotcha feels like an incomplete thought
I removed this.
pkg/sql/pg_catalog.go, line 1039 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
you want to be checking if it is an UndefinedRelationErr before doing this -- it could be a wide variety of other errors that you're swallowing here.
Great point, thanks
pkg/sql/pg_catalog.go, line 1040 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Can you add that by returning in this case we are falling back to the table scan? Also, it feels unsatisfying to have to scan the whole table when an invalid table oid is passed in, but that seems unavoidable with this oid situation.
Additionally, we are guaranteed that these index OID's don't collide with table oid's?
We're not guaranteed that :/ but this patch doesn't make it worse. We do have to figure out how to deal with this at some point. It does seem unlikely they'll collide given its a 64 bit hash, but yeah, it's really bad.
pkg/sql/virtual_schema.go, line 399 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Is there not an existing helper to see if a datum lives within a span? If not it might be helpful to add and move this out of here.
I don't think so... @RaduBerinde what do you think, should I make this a helper? Where would it live?
pkg/sql/logictest/testdata/logic_test/pg_catalog, line 2197 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Is there a way to enable searching across databases that you can check here?
There's no way to search across databases in virtual tables.
❌ The GitHub CI (Cockroach) build has failed on 1a898c3c. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
1a898c3
to
4386d16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @RaduBerinde, and @rohany)
pkg/sql/opt_catalog.go, line 1327 at r7 (raw file):
ot.indexes[i].init(ot) if i == 0 { // Nothing special happens for the synthesized PK.
[nit] No reason for this to be inside the loop, I'd just do it separately, and then range
over ot.desc.Indexes
pkg/sql/opt_catalog.go, line 1579 at r7 (raw file):
// ID is part of the cat.Index interface. func (oi *optVirtualIndex) ID() cat.StableID {
These methods must work for the primary index as well, at least in principle.
pkg/sql/opt_exec_factory.go, line 160 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Yeah, this can be nil - the primary index doesn't have a descriptor, I don't think.
Ah, right. I still think we should make Name()
return something reasonable for the primary index and use that directly.
pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Done.
Hm.. It's a little more tricky than this - if we have a non-unique index, the PK will always be in Columns - what we'd need to check is that all Span keys only have values for the first (non-PK) column(s). This sounds too onerous. So I guess forget my comment, one would only be able to get in trouble if they're deliberately trying to :)
4386d16
to
e2e570a
Compare
❌ The GitHub CI (Cockroach) build has failed on e2e570af. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
e2e570a
to
5ffc3d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rohany)
pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):
Previously, RaduBerinde wrote…
Hm.. It's a little more tricky than this - if we have a non-unique index, the PK will always be in Columns - what we'd need to check is that all Span keys only have values for the first (non-PK) column(s). This sounds too onerous. So I guess forget my comment, one would only be able to get in trouble if they're deliberately trying to :)
Ok, reverted :)
pkg/sql/opt/xform/coster.go, line 614 at r3 (raw file):
Previously, RaduBerinde wrote…
+1, either put a TODO in there if we'll want to add more for that case, or make it
if !tabIsVirtualTable() || len..
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but lets figure out whether to pull out this datum in constraint span code into helper in a followup pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woo, tftrs! Yes, I'll think about how to better structure that code.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rohany)
Build failed (retrying...) |
Build failed |
This commit adds virtual table "indexes", a concept that allows virtual tables to specify columns that can be accelerated given an equality constraint using custom code. For example, the information_schema.tables table contains one entry per table in the database. Before this commit, retrieving the row for a table 'foo' would require fetching the descriptors for all tables and filtering for the one that had name 'foo'. Now, if there's a constraint on the table name, an "index" is used that converts the name into a descriptor with a single lookup. This commit also adds virtual indexes to a few commonly-used virtual tables. There is much, much more that could be done - the ones that are added here are just examples. Release note (performance improvement): filtered scans over virtual tables have improved performance in common cases.
This commit updates the implementation of SHOW CREATE so that it doesn't have to filter every single table in the database. It does 2 things: 1. Stop outputting zone configs in crdb_internal.create_statements 2. Start outputting partitioning status in the above instead 3. Get zone configs from the SQL query in SHOW CREATE with a subquery from the zone configs table and join together in SQL This strategy makes it possible to give crdb_internal.create_statements a virtual index. Now SHOW CREATE is really fast! Release note (performance improvement): SHOW CREATE is much more efficient
5ffc3d9
to
c44619c
Compare
bors r+ |
Build succeeded |
See individual commits for details. This basically adds the infrastructure to the optimizer and execution engine for pushdown, and adds virtual indexes to a few choice tables.
The second commit rearranges how SHOW CREATE TABLE is done so that it can use a virtual index. Now it's no longer O(n) in the number of tables, hooray!
This is still a WIP - there are some things that need cleaning / more comments, and I think the optimizer part is still probably not complete. Also, I need to figure out how to write optimizer tests for the virtual indexes. I tried to add support to the opt catalog for them but ran into some weird issues. I'm pushing this up so we can start making progress on it as a team - I've been sitting on it for too long.