Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add virtual table pushdown infrastructure, and add to several spots #47316

Merged
merged 3 commits into from
Apr 25, 2020

Conversation

jordanlewis
Copy link
Member

See individual commits for details. This basically adds the infrastructure to the optimizer and execution engine for pushdown, and adds virtual indexes to a few choice tables.

The second commit rearranges how SHOW CREATE TABLE is done so that it can use a virtual index. Now it's no longer O(n) in the number of tables, hooray!

This is still a WIP - there are some things that need cleaning / more comments, and I think the optimizer part is still probably not complete. Also, I need to figure out how to write optimizer tests for the virtual indexes. I tried to add support to the opt catalog for them but ran into some weird issues. I'm pushing this up so we can start making progress on it as a team - I've been sitting on it for too long.

@jordanlewis jordanlewis requested a review from a team as a code owner April 10, 2020 04:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@blathers-crl
Copy link

blathers-crl bot commented Apr 10, 2020

❌ The GitHub CI (Cockroach) build has failed on 24d8b39f.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool.

It is unfortunate that we are starting to duplicate a lot of the opt code related to Scan. Ideally we should get rid of VirtualScan and just use Scan.. but we need to have some kind of primary index for all virtual tables in order to do that. Is there a problem with making up a bogus hidden PK column that isn't visible from queries?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/opt/xform/coster.go, line 318 at r1 (raw file):

	rowCount := memo.Cost(scan.Relational().Stats.RowCount)

	// Add a small cost if the scan is unconstrained, so all else being equal, we

I don't think this is the right way to do this. We should instead modify the stats code for virtual scan to reduce the row count if there's a constraint:
https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L634

Can probably reuse most of the code in buildScan.


pkg/sql/opt/xform/custom_funcs.go, line 124 at r1 (raw file):

// IsCanonicalVirtualScan returns true if the given VirtualScanPrivate is an original
// unaltered primary index VirtualScan operator (i.e. unconstrained and not limited).

VirtualScan can't be limited


pkg/sql/opt/xform/custom_funcs.go, line 392 at r1 (raw file):

// done once per table, GenerateConstrainedVirtualScans should only be called on the
// original unaltered primary index VirtualScan operator (i.e. not constrained or
// limited).

VirtualScan can't be limited

@jordanlewis
Copy link
Member Author

Thanks for the review. Yeah, I agree there's too much duplication. I think I tried to unify when I started this patch but ran into trouble... I can try looking at it again.

@RaduBerinde
Copy link
Member

I can give that a shot too.

@jordanlewis
Copy link
Member Author

That would be great if you could try!

@blathers-crl
Copy link

blathers-crl bot commented Apr 15, 2020

❌ The GitHub CI (Cockroach) build has failed on f21655eb.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Apr 20, 2020

❌ The GitHub CI (Cockroach) build has failed on 70070091.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Apr 22, 2020

❌ The GitHub CI (Cockroach) build has failed on ffae5e16.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Apr 23, 2020

❌ The GitHub CI (Cockroach) build has failed on 3038ae9c.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@jordanlewis
Copy link
Member Author

Ok, this is finally RFAL!

@jordanlewis
Copy link
Member Author

Note that it still just works on the populate struct, so not yet integrated with @rohany's latest changes. I don't want to make this PR any bigger though.

addRow func(...tree.Datum) error) (bool, error) {
// This index is on the TABLE_NAME column.
name := tree.MustBeDString(constraint)
fmt.Println(p.CurrentSearchPath(), p.CurrentSearchPath().Iter())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print

{
populate: func(ctx context.Context, constraint tree.Datum, p *planner, db *DatabaseDescriptor,
addRow func(...tree.Datum) error) (bool, error) {
// This index is on the TABLE_NAME column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there was a way to visually link the index's definition to the index name in the create statement.

statement ok
CREATE TABLE hidden_in_vtable_index_test(a int)

let $testid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

til you could do this

CREATE DATABASE other_db; SET DATABASE = other_db

query O
SELECT oid FROM pg_class WHERE oid=$testid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to enable searching across databases that you can check here?

@@ -611,7 +611,8 @@ func (c *coster) rowScanCost(tabID opt.TableID, idxOrd int, numScannedCols int)
// Adjust cost based on how well the current locality matches the index's
// zone constraints.
var costFactor memo.Cost = cpuCostFactor
if len(c.locality.Tiers) != 0 {
if tab.IsVirtualTable() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the reason for setting up the if this way

}

// makeAllRelationsPGCatalogTable creates a virtual table that searches through
// all table descriptors in the system. It automatically adds an index to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It automatically adds a virtual index implementation to the table id column.

I think you want to specify because the user has to actually write the index definition in the create statement themselves

// all table descriptors in the system. It automatically adds an index to the
// table id column as well. The input schema must have a single INDEX definition
// with a single column, which must be the column that contains the table id.
// indexesAreNotComplete should be set to true if the index constraint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to rename this to like "canKeyByIndex" or something.

// might not be something that can be used to search by. For example, a common
// case is having a table that's keyed by an OID that can either be an index
// or a table OID. In this case, you must pass true for this variable, and if
// the input is an index OID, the index won't be used. Another common case and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gotcha feels like an incomplete thought

for i := 0; i < idxConstraint.Spans.Count(); i++ {
span := idxConstraint.Spans.Get(i)
if span.HasSingleKey(p.EvalContext()) {
matchedFilter = datumToFilter.Compare(p.EvalContext(), span.StartKey().Value(0)) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not an existing helper to see if a datum lives within a span? If not it might be helpful to add and move this out of here.

}
table, err := p.LookupTableByID(ctx, id)
if err != nil {
// No table found, so no rows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add that by returning in this case we are falling back to the table scan? Also, it feels unsatisfying to have to scan the whole table when an invalid table oid is passed in, but that seems unavoidable with this oid situation.

Additionally, we are guaranteed that these index OID's don't collide with table oid's?

var crdbInternalCreateStmtsTable = virtualSchemaTable{
comment: `CREATE and ALTER statements for all tables accessible by current user in current database (KV scan)`,
schema: `
var crdbInternalCreateStmtsTable = makeAllRelationsPGCatalogTable(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be renamed to something like makeAllRelationsVirtualTableWithIndex or something, since its definitely going to see use outside of the pg_catalog

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opt and opt_exec_factory parts :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/opt_exec_factory.go, line 160 at r4 (raw file):

	indexName := ""
	if indexDesc != nil {

[nit] can this ever be nil? I don't think we can have optVirtualIndex with nil desc. Also, could just use index.Name()


pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):

	}
	// Check for explicit use of the dummy column.
	if needed.Contains(0) || indexConstraint != nil || len(reqOrdering) > 0 || reverse {

I think we should still check if any spans in the the indexConstriaint have any values for the PK column and emit the same error, just in case. I'm thinking about something like SELECT a FROM vtable WHERE a=1 AND dummyColumn=1, which could conceivably result in a scan that only returns a (though I don't think it happens right now).


pkg/sql/opt/xform/coster.go, line 614 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

whats the reason for setting up the if this way

+1, either put a TODO in there if we'll want to add more for that case, or make it if !tabIsVirtualTable() || len..

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @rohany from 2 discussions.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/crdb_internal.go, line 1383 at r4 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Maybe this should be renamed to something like makeAllRelationsVirtualTableWithIndex or something, since its definitely going to see use outside of the pg_catalog

Done.


pkg/sql/information_schema.go, line 1365 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I wish there was a way to visually link the index's definition to the index name in the create statement.

Yeah :( I'm not sure how one would fix this.


pkg/sql/information_schema.go, line 1367 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

print

Doh. I'm the worst with this. I need a pre-push regex.


pkg/sql/opt_catalog.go, line 1627 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I don't understand this case and the below case handling of the dummy PK

This is to set up the column positions - the mapping from index ordinal to column id. The dummy PK column just has to be shunted out of the way, so I put it at the end where nobody can get at it.


pkg/sql/opt_exec_factory.go, line 160 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] can this ever be nil? I don't think we can have optVirtualIndex with nil desc. Also, could just use index.Name()

Yeah, this can be nil - the primary index doesn't have a descriptor, I don't think.


pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):

Previously, RaduBerinde wrote…

I think we should still check if any spans in the the indexConstriaint have any values for the PK column and emit the same error, just in case. I'm thinking about something like SELECT a FROM vtable WHERE a=1 AND dummyColumn=1, which could conceivably result in a scan that only returns a (though I don't think it happens right now).

Done.


pkg/sql/pg_catalog.go, line 993 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…
  • It automatically adds a virtual index implementation to the table id column.

I think you want to specify because the user has to actually write the index definition in the create statement themselves

Done.


pkg/sql/pg_catalog.go, line 996 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I think it would be clearer to rename this to like "canKeyByIndex" or something.

Done - I thought it had to be more general, but you're right.


pkg/sql/pg_catalog.go, line 1000 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

This gotcha feels like an incomplete thought

I removed this.


pkg/sql/pg_catalog.go, line 1039 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

you want to be checking if it is an UndefinedRelationErr before doing this -- it could be a wide variety of other errors that you're swallowing here.

Great point, thanks


pkg/sql/pg_catalog.go, line 1040 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Can you add that by returning in this case we are falling back to the table scan? Also, it feels unsatisfying to have to scan the whole table when an invalid table oid is passed in, but that seems unavoidable with this oid situation.

Additionally, we are guaranteed that these index OID's don't collide with table oid's?

We're not guaranteed that :/ but this patch doesn't make it worse. We do have to figure out how to deal with this at some point. It does seem unlikely they'll collide given its a 64 bit hash, but yeah, it's really bad.


pkg/sql/virtual_schema.go, line 399 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Is there not an existing helper to see if a datum lives within a span? If not it might be helpful to add and move this out of here.

I don't think so... @RaduBerinde what do you think, should I make this a helper? Where would it live?


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 2197 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Is there a way to enable searching across databases that you can check here?

There's no way to search across databases in virtual tables.

@blathers-crl
Copy link

blathers-crl bot commented Apr 23, 2020

❌ The GitHub CI (Cockroach) build has failed on 1a898c3c.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/opt_catalog.go, line 1327 at r7 (raw file):

		ot.indexes[i].init(ot)
		if i == 0 {
			// Nothing special happens for the synthesized PK.

[nit] No reason for this to be inside the loop, I'd just do it separately, and then range over ot.desc.Indexes


pkg/sql/opt_catalog.go, line 1579 at r7 (raw file):

// ID is part of the cat.Index interface.
func (oi *optVirtualIndex) ID() cat.StableID {

These methods must work for the primary index as well, at least in principle.


pkg/sql/opt_exec_factory.go, line 160 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Yeah, this can be nil - the primary index doesn't have a descriptor, I don't think.

Ah, right. I still think we should make Name() return something reasonable for the primary index and use that directly.


pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Done.

Hm.. It's a little more tricky than this - if we have a non-unique index, the PK will always be in Columns - what we'd need to check is that all Span keys only have values for the first (non-PK) column(s). This sounds too onerous. So I guess forget my comment, one would only be able to get in trouble if they're deliberately trying to :)

@blathers-crl
Copy link

blathers-crl bot commented Apr 24, 2020

❌ The GitHub CI (Cockroach) build has failed on e2e570af.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rohany)


pkg/sql/opt_exec_factory.go, line 164 at r4 (raw file):

Previously, RaduBerinde wrote…

Hm.. It's a little more tricky than this - if we have a non-unique index, the PK will always be in Columns - what we'd need to check is that all Span keys only have values for the first (non-PK) column(s). This sounds too onerous. So I guess forget my comment, one would only be able to get in trouble if they're deliberately trying to :)

Ok, reverted :)


pkg/sql/opt/xform/coster.go, line 614 at r3 (raw file):

Previously, RaduBerinde wrote…

+1, either put a TODO in there if we'll want to add more for that case, or make it if !tabIsVirtualTable() || len..

Done.

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but lets figure out whether to pull out this datum in constraint span code into helper in a followup pr

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo, tftrs! Yes, I'll think about how to better structure that code.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rohany)

@craig
Copy link
Contributor

craig bot commented Apr 24, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Apr 24, 2020

Build failed

This commit adds virtual table "indexes", a concept that allows virtual
tables to specify columns that can be accelerated given an equality
constraint using custom code.

For example, the information_schema.tables table contains one entry per
table in the database. Before this commit, retrieving the row for a
table 'foo' would require fetching the descriptors for all tables and
filtering for the one that had name 'foo'. Now, if there's a constraint
on the table name, an "index" is used that converts the name into a
descriptor with a single lookup.

This commit also adds virtual indexes to a few commonly-used virtual
tables. There is much, much more that could be done - the ones that are
added here are just examples.

Release note (performance improvement): filtered scans over virtual
tables have improved performance in common cases.
This commit updates the implementation of SHOW CREATE so that it doesn't
have to filter every single table in the database. It does 2 things:

1. Stop outputting zone configs in crdb_internal.create_statements
2. Start outputting partitioning status in the above instead
3. Get zone configs from the SQL query in SHOW CREATE with a subquery
   from the zone configs table and join together in SQL

This strategy makes it possible to give
crdb_internal.create_statements a virtual index.

Now SHOW CREATE is really fast!

Release note (performance improvement): SHOW CREATE is much more
efficient
@jordanlewis
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 25, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants