Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/schemachanger: implement COMMENT ON in new schema changer #78025

Conversation

chengxiong-ruan
Copy link
Contributor

@chengxiong-ruan chengxiong-ruan commented Mar 17, 2022

Fixes #77327

This pr includes 3 commits:
(1) adding a metadata fetcher interface for reading comments from system.comments.
(2) implement COMMENT ON statements within schema changer.
(3) data driven tests.

This pr does not turn on the fully implemented flag for these statements.
But I tested them by hand with the flags on.
I think it'd be better to turn the flags on in a separate pr even the risk is low enough
to make it happen within this pr.

Release justification: not for 22.1
Release note: None

@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch from 78e7150 to 60ff881 Compare March 17, 2022 15:36
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch 9 times, most recently from c63af72 to fc81799 Compare April 1, 2022 16:26
@chengxiong-ruan chengxiong-ruan changed the title (WIP) schemachanger: implement COMMENT ON... sql/schemachanger: implement COMMENT ON in new schema changer Apr 1, 2022
@chengxiong-ruan chengxiong-ruan marked this pull request as ready for review April 1, 2022 16:26
@chengxiong-ruan chengxiong-ruan requested a review from a team April 1, 2022 16:26
@chengxiong-ruan chengxiong-ruan requested a review from a team as a code owner April 1, 2022 16:26
@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch 3 times, most recently from 903dc81 to d996e7e Compare April 5, 2022 00:54
Copy link

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. This is close. I have only one concern, which relates to how you query system.comments, the rest is just little things.

Reviewed 25 of 25 files at r1, 27 of 27 files at r2, 18 of 19 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan)


pkg/sql/descmetadata/metadata_fetcher.go, line 28 at r1 (raw file):

	txn *kv.Txn
	ie  sqlutil.InternalExecutor
}

I don't care much either way but if would have been completely OK in my book to have the existing metadataUpdater implement the DescriptorMetadataFetcher interface. Such as it is, these two structs are going to have the same dependencies anyway.


pkg/sql/descmetadata/metadata_fetcher.go, line 41 at r1 (raw file):

		"SELECT type, sub_id, comment FROM system.comments WHERE object_id=$1",
		objectID,
	)

These point lookups are really going to hurt us at scale. For instance, we should consider a DROP DATABASE CASCADE with a 1000 tables to be perfectly normal. Here this will require 1k+ roundtrips.

So, we need batch lookups instead. On the other hand, we don't want to load the whole comments table in memory either, we want the query results to be capped by a constant upper bound.

It's safe to assume that this table is going to be empty or near-empty most of the time. Perhaps instead of querying the table just for the desired ID, it could do so for the 1000 IDs before and/or after and stuff them in your cache. This is just an idea to get you started, I'm sure you can think of something more sophisticated which will scale well.


pkg/sql/schemachanger/scbuild/builder_state.go, line 686 at r2 (raw file):

	// Use public schema by default.
	if !prefix.ExplicitSchema {
		prefix.SchemaName = "public"

Use the constant it catconstants instead of the "public" literal please.


pkg/sql/schemachanger/scbuild/dependencies.go, line 106 at r2 (raw file):

	// MayResolveIndex looks up an index using a naked index name with database
	// and schema prefix. Resolved index and the owner table name are returned.
	MayResolveIndex(ctx context.Context, indexName tree.Name, prefix tree.ObjectNamePrefix) (catalog.Index, tree.TableName)

Perhaps have this return (catalog.ResolvedObjectPrefix, catalog.TableDescriptor, catalog.Index) instead?


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 24 at r2 (raw file):

	if n.Comment != nil {
		dc.Comment = *n.Comment
	}

I don't think this correctly distinguishes between SET NULL and SET '', though this should be easy to fix.


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 43 at r2 (raw file):

	if len(dc.Comment) > 0 {
		b.Add(dc)
	}

Perhaps it's just me but I somehow feel this code could be made more straightforward. I was able to convince myself it was correct, in the end. Perhaps it would help to distinguish the add and the drop cases more clearly?


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 152 at r2 (raw file):

		IsExistenceOptional: false,
		RequiredPrivilege:   privilege.CREATE,
	}

nit: Since comment resolution params are always the same, consider defining one struct at the top level called commentParams or something.


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/dependencies.go, line 265 at r2 (raw file):

	// ResolveTableWithIndexBestEffort retrieves a table which contains the target index and returns its elements.
	ResolveTableWithIndexBestEffort(indexName tree.Name, prefix tree.ObjectNamePrefix, p ResolveParams) ElementResultSet

Can you put prefix before indexName please? Also why not have this return the elements pertaining to the index instead of the table? It's what we care about ultimately. If you do this, you'll need to rename the method, naturally.


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/process.go, line 62 at r2 (raw file):

	reflect.TypeOf((*tree.CommentOnColumn)(nil)):     {CommentOnColumn, false},
	reflect.TypeOf((*tree.CommentOnIndex)(nil)):      {CommentOnIndex, false},
	reflect.TypeOf((*tree.CommentOnConstraint)(nil)): {CommentOnConstraint, false},

It's time to jump into the deep end of the pool, Chengxiong 😀

Come on, mark these as fully-supported.


pkg/sql/schemachanger/scdecomp/dependencies.go, line 82 at r1 (raw file):

	// GetAllCommentsOnObject fetches all comments related to an object such as a
	// database, a schema or a table.
	GetAllCommentsOnObject(ctx context.Context, objectID int64) (*CommentCache, error)

I'm concerned that returning a CommentCache introduces yet another layer of indirection which in turn introduces more complexity in the "business logic" so to speak, when really this cache is an implementation detail. Would it be possible to have the interface implementation worry about caching instead? This means the CommentCache type wouldn't exist (at least not publicly) and this interface would instead have methods like MayGetDatabaseComment(ctx context.Context, dbID catid.DescID) (comment string, found bool) etc etc.

This pushes the complexity further down, of course, but the schema changer is complex enough for that tradeoff to usually be worth it.

Note: please use catid.DescID types for descriptor IDs.
Note: we have nice cache implementations in pkg/util/cache


pkg/sql/schemachanger/scdeps/build_deps.go, line 328 at r1 (raw file):

// DescriptorMetadataFetcher implements the scbuild.Dependencies interface.
func (d *buildDeps) DescriptorMetadataFetcher() scdecomp.DescriptorMetadataFetcher {
	return d.metadataFetcher

I'm not advocating my way over yours but have you considered instead plumbing an internal executor into the buildDeps and have this method instantiate a DescriptorMetadataFetcher ?


pkg/sql/schemachanger/scdeps/sctestdeps/test_deps.go, line 255 at r2 (raw file):

	}
	return nil, tree.TableName{}
}

The fact that this method implementation largely mirrors that of the non-test implementation indicates that this code should be factored out of the interface into a helper function.


pkg/sql/schemachanger/scdeps/sctestutils/sctestutils.go, line 81 at r1 (raw file):

		planner, /* authAccessor */
		planner, /* astFormatter */
		planner, /* featureChecker */

Thank you for this.


pkg/sql/schemachanger/scop/mutation.go, line 416 at r2 (raw file):

// AddTableComment is used to add a comment to a table.
type AddTableComment struct {

Shouldn't these be named UpsertTableComment instead? This applies to all other comment ops, and the method names in scexec/scmutationexec, etc. According to the postgresql documentation there can be only at most one comment per object.


pkg/sql/schemachanger/screl/attr.go, line 75 at r2 (raw file):

	Comment
	// AttrMax is the largest possible Attr value.
	// Note: add any new enum values before TargetStatus, leave these at the end.

Please observe this comment.


pkg/sql/schemachanger/sctest/cumulative.go, line 223 at r2 (raw file):

		// stages.
		postCommit = 0
		nonRevertible = 0

Nice catch! Shouldn't this be right after n := instead?


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 49 at r4 (raw file):

20,DropView/drop_1_view
23,DropView/drop_2_views
26,DropView/drop_3_views

This is a manifestation of the nasty round-trip performance regression I was talking about in an earlier comment.


pkg/sql/schemachanger/testdata/comment_on, line 180 at r4 (raw file):

delete comment for constraint on #107, constraint id: 2
# end PreCommitPhase
commit transaction #1

I appreciate this test suite but I'm not sure it's particularly interesting. The one in scplan is enough I think.


pkg/sql/schemachanger/scbuild/builder_test.go, line 157 at r2 (raw file):

				t.Fatal("not a supported descriptor metadata statement")
			}
		}

I reviewed this whole switch block and really you should replace all of these cases with just case "setup": and case "build":. As far as I can tell all we ever want out of this is to tdb.Exec some parsed statements. See the data-driven tests in scdecomp for inspiration.


pkg/sql/schemachanger/scplan/plan_test.go, line 106 at r2 (raw file):

					}
				}
				return ""

Similar comment as for the data-driven build tests.

@chengxiong-ruan
Copy link
Contributor Author

pkg/sql/descmetadata/metadata_fetcher.go, line 41 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

These point lookups are really going to hurt us at scale. For instance, we should consider a DROP DATABASE CASCADE with a 1000 tables to be perfectly normal. Here this will require 1k+ roundtrips.

So, we need batch lookups instead. On the other hand, we don't want to load the whole comments table in memory either, we want the query results to be capped by a constant upper bound.

It's safe to assume that this table is going to be empty or near-empty most of the time. Perhaps instead of querying the table just for the desired ID, it could do so for the 1000 IDs before and/or after and stuff them in your cache. This is just an idea to get you started, I'm sure you can think of something more sophisticated which will scale well.

@postamar Good callout. I initially thought this comments table was probably not very popular. But you're right that point lookup is bad in DROP DATABASE CASCADE's case.
I do like the idea reading the first 1000 obj IDs starting from the db/schema/table ID in interest. With that in mind we would need to make the cache more like a cache. Like, for any later cache miss, we read the next batch of 1000 obj IDs starting from the missed ID to refresh the cache(we definitely need to record the id ranges covered by the cache). Though we would only benefit from this if the tables are looped through in ordered when a db or schema is dropped cascaded. Looks like the ordering is guaranteed based on the fact that we're using FastIntSet to store table ids when walking through backrefs of schemas. So I guess that's a proper assumption to make :) ?

@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch 2 times, most recently from f72ba30 to 2774c3e Compare April 13, 2022 18:16
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan and @postamar)


pkg/sql/schemachanger/scbuild/builder_state.go, line 686 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Use the constant it catconstants instead of the "public" literal please.

apologies for my laziness


pkg/sql/schemachanger/scbuild/dependencies.go, line 106 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Perhaps have this return (catalog.ResolvedObjectPrefix, catalog.TableDescriptor, catalog.Index) instead?

good point...it looks more consistent to the others. changed it to what you suggested


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 24 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

I don't think this correctly distinguishes between SET NULL and SET '', though this should be easy to fix.

should be more clear now :)


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 43 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Perhaps it's just me but I somehow feel this code could be made more straightforward. I was able to convince myself it was correct, in the end. Perhaps it would help to distinguish the add and the drop cases more clearly?

I agree with you! I changed it to have more clear branches for drop and add separately. let me know how that feels.


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 152 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: Since comment resolution params are always the same, consider defining one struct at the top level called commentParams or something.

done.


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/dependencies.go, line 265 at r2 (raw file):
👍

why not have this return the elements pertaining to the index instead of the table
I guess no good reason other than I wanted my logic in CommentOnIndex like find the table first, then resolve index. I don't feel strong about this though...


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/process.go, line 62 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

It's time to jump into the deep end of the pool, Chengxiong 😀

Come on, mark these as fully-supported.

sure~


pkg/sql/schemachanger/scdecomp/dependencies.go, line 82 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm concerned that returning a CommentCache introduces yet another layer of indirection which in turn introduces more complexity in the "business logic" so to speak, when really this cache is an implementation detail. Would it be possible to have the interface implementation worry about caching instead? This means the CommentCache type wouldn't exist (at least not publicly) and this interface would instead have methods like MayGetDatabaseComment(ctx context.Context, dbID catid.DescID) (comment string, found bool) etc etc.

This pushes the complexity further down, of course, but the schema changer is complex enough for that tradeoff to usually be worth it.

Note: please use catid.DescID types for descriptor IDs.
Note: we have nice cache implementations in pkg/util/cache

this concern should be addressed now with the new implementation of single layer cache.


pkg/sql/schemachanger/scdeps/build_deps.go, line 328 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm not advocating my way over yours but have you considered instead plumbing an internal executor into the buildDeps and have this method instantiate a DescriptorMetadataFetcher ?

good suggestion, I changed it to what you suggested. it's better now since it make more sense to init a comment cache here.


pkg/sql/schemachanger/scdeps/sctestdeps/test_deps.go, line 255 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

The fact that this method implementation largely mirrors that of the non-test implementation indicates that this code should be factored out of the interface into a helper function.

the logic is factored out into FindTableContainsIndex() helper function.


pkg/sql/schemachanger/scop/mutation.go, line 416 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Shouldn't these be named UpsertTableComment instead? This applies to all other comment ops, and the method names in scexec/scmutationexec, etc. According to the postgresql documentation there can be only at most one comment per object.

yes, indeed :)


pkg/sql/schemachanger/screl/attr.go, line 75 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Please observe this comment.

good catch!


pkg/sql/schemachanger/sctest/cumulative.go, line 223 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Nice catch! Shouldn't this be right after n := instead?

ha yeah, it would have the same effect but probably easier to reason about. moved it there.


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 49 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

This is a manifestation of the nasty round-trip performance regression I was talking about in an earlier comment.

hmm...I'll think about how to address these non-drop-cascade case...


pkg/sql/schemachanger/testdata/comment_on, line 180 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

I appreciate this test suite but I'm not sure it's particularly interesting. The one in scplan is enough I think.

haha..true, removed.


pkg/sql/schemachanger/scbuild/builder_test.go, line 157 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

I reviewed this whole switch block and really you should replace all of these cases with just case "setup": and case "build":. As far as I can tell all we ever want out of this is to tdb.Exec some parsed statements. See the data-driven tests in scdecomp for inspiration.

yeah, lol, I rename it to setup for now, will fix those create-table like things as well in a separate pr. don't want to pollute this pr further :).


pkg/sql/schemachanger/scplan/plan_test.go, line 106 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Similar comment as for the data-driven build tests.

same as the other case.


pkg/sql/descmetadata/metadata_fetcher.go, line 28 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I don't care much either way but if would have been completely OK in my book to have the existing metadataUpdater implement the DescriptorMetadataFetcher interface. Such as it is, these two structs are going to have the same dependencies anyway.

ah yeah, I thought about that and imagined a name for it like metadataAccessor. But I didn't put it in the metadataUpdater because I didn't want to have a thing which give people illusion that it would also update the cache when doing upsert/delete which is not a thing we need, so I think it's better to separate them to avoid confusion. Now that I made it more like a "cache" cache...so probably make more sense to have it as separated.

@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch from 2774c3e to aedd80f Compare April 13, 2022 18:31
@chengxiong-ruan
Copy link
Contributor Author

@postamar I've update the how the comment cache behavior... and explicitly batch load comments for backref schema tables in the builderState.BackReferences(), this should address the drop database/schema cascade pain. But it still read comments for individual tables if you just drop multiple table in one transaction. Maybe that's fine?

@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch 4 times, most recently from baaad30 to 56ba44c Compare April 15, 2022 01:39
…walking

This commit adds a metadata fetcher interface to read comments
from db during descriptor elements walking. An implementation
of the interface is also added and plumbed into the descriptor
decomposition.

Release note: None
@chengxiong-ruan chengxiong-ruan force-pushed the comment-on-in-new-schema-changer branch from 56ba44c to 74de37c Compare April 15, 2022 01:44
This commit implements COMMENT ON with the new schema changer
framework along with a few minor changes required for the
functionality:
(1) Adds `Comment` as an element attribute so that existing comment
can be distinguished from new comment for an upsert.
(2) Add methods to resolve constraint by name.
(3) Add methods to resolve index with best effort given a naked name.

Release note: None
This commit is all about adding/fixing data driven tests.

Release note: None
@chengxiong-ruan
Copy link
Contributor Author

closing this pr in favor of #80459

craig bot pushed a commit that referenced this pull request May 5, 2022
80459: sql/schemachanger: Implement comment on in declarative sc r=chengxiong-ruan a=chengxiong-ruan

Split out from #78025

This commit implements COMMENT ON with the new schema changer
framework.

Release note: None

81060: publish-{provisional,}-artifacts: create arm64 builds for non-release r=rail a=rickystewart

These new binaries will only be built for the non-release case.

Release note: None

Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@chengxiong-ruan chengxiong-ruan deleted the comment-on-in-new-schema-changer branch May 6, 2022 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

schemachanger: implement COMMENT ON ... statements
3 participants