sql/schemachanger: implement COMMENT ON in new schema changer #78025

chengxiong-ruan · 2022-03-17T15:34:24Z

This PR includes 3 commits:
(1) add a metadata fetcher interface for reading comments from system.comments.
(2) implement COMMENT ON statements within the schema changer.
(3) add data-driven tests.

This PR does not turn on the fully implemented flag for these statements,
but I tested them by hand with the flags on.
I think it'd be better to turn the flags on in a separate PR, even though the risk is low enough
to do it within this PR.

Release justification: not for 22.1
Release note: None

postamar

Nicely done. This is close. I have only one concern, which relates to how you query system.comments; the rest are just little things.

Reviewed 25 of 25 files at r1, 27 of 27 files at r2, 18 of 19 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan)

pkg/sql/descmetadata/metadata_fetcher.go, line 28 at r1 (raw file):

	txn *kv.Txn
	ie  sqlutil.InternalExecutor
}

I don't care much either way, but it would have been completely OK in my book to have the existing metadataUpdater implement the DescriptorMetadataFetcher interface. As it is, these two structs are going to have the same dependencies anyway.
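
A minimal sketch of the idea, using hypothetical names throughout: `commentUpdater` and `commentFetcher` are simplified stand-ins for the updater and `DescriptorMetadataFetcher` interfaces, and an in-memory map replaces the `kv.Txn`/`sqlutil.InternalExecutor` pair so the example runs on its own. It only shows that one struct holding the shared dependencies can satisfy both interfaces.

```go
package main

// commentKey identifies one comment (hypothetical stand-in for the
// object_id/sub_id/type columns of system.comments).
type commentKey struct {
	objectID int64
	subID    int64
	typ      int
}

// commentUpdater and commentFetcher are hypothetical, trimmed-down versions
// of the write-side and read-side interfaces.
type commentUpdater interface {
	UpsertComment(k commentKey, comment string)
	DeleteComment(k commentKey)
}

type commentFetcher interface {
	GetComment(k commentKey) (string, bool)
}

// metadataStore holds the dependencies both roles need. In the real code
// these would be the txn and internal executor; a map stands in here.
type metadataStore struct {
	rows map[commentKey]string
}

func newMetadataStore() *metadataStore {
	return &metadataStore{rows: make(map[commentKey]string)}
}

func (m *metadataStore) UpsertComment(k commentKey, comment string) {
	m.rows[k] = comment
}

func (m *metadataStore) DeleteComment(k commentKey) {
	delete(m.rows, k)
}

func (m *metadataStore) GetComment(k commentKey) (string, bool) {
	c, ok := m.rows[k]
	return c, ok
}

// Compile-time checks that the single struct implements both interfaces.
var (
	_ commentUpdater = (*metadataStore)(nil)
	_ commentFetcher = (*metadataStore)(nil)
)
```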

pkg/sql/descmetadata/metadata_fetcher.go, line 41 at r1 (raw file):

		"SELECT type, sub_id, comment FROM system.comments WHERE object_id=$1",
		objectID,
	)

These point lookups are really going to hurt us at scale. For instance, we should consider a DROP DATABASE CASCADE with 1000 tables to be perfectly normal. Here that will require 1k+ round trips.

So, we need batch lookups instead. On the other hand, we don't want to load the whole comments table into memory either; we want the query results to be capped by a constant upper bound.

It's safe to assume that this table is going to be empty or near-empty most of the time. Perhaps instead of querying the table just for the desired ID, it could do so for the 1000 IDs before and/or after and stuff them in your cache. This is just an idea to get you started, I'm sure you can think of something more sophisticated which will scale well.
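
One way this windowing idea could look, sketched under stated assumptions: `windowedCommentCache`, `rangeQuerier`, and the window size are hypothetical, and `rangeQuerier` stands in for one bounded query such as `SELECT object_id, type, sub_id, comment FROM system.comments WHERE object_id BETWEEN $1 AND $2`. On a miss it loads every comment in a fixed window of IDs around the requested one in a single round trip, so a cascade over nearby IDs costs roughly one query per window instead of one per object. Note that this bounds the ID range of each query; the row count still depends on how many comments each object carries.

```go
package main

// row is one system.comments entry, reduced to the columns used here.
type row struct {
	objectID int64
	subID    int64
	typ      int
	comment  string
}

// rangeQuerier stands in for one bounded query over object_id in [lo, hi].
type rangeQuerier func(lo, hi int64) []row

// windowedCommentCache fetches comments for a window of object IDs in one
// round trip the first time it sees an ID outside every loaded window.
type windowedCommentCache struct {
	query      rangeQuerier
	halfWindow int64
	loaded     map[int64]bool  // object IDs covered by some earlier query
	comments   map[int64][]row // cached comments grouped by object ID
	queries    int             // number of round trips issued so far
}

func newWindowedCommentCache(q rangeQuerier, halfWindow int64) *windowedCommentCache {
	return &windowedCommentCache{
		query:      q,
		halfWindow: halfWindow,
		loaded:     make(map[int64]bool),
		comments:   make(map[int64][]row),
	}
}

// commentsFor returns every comment on objectID, querying at most once.
func (c *windowedCommentCache) commentsFor(objectID int64) []row {
	if c.loaded[objectID] {
		return c.comments[objectID]
	}
	lo, hi := objectID-c.halfWindow, objectID+c.halfWindow
	if lo < 0 {
		lo = 0
	}
	c.queries++
	for _, r := range c.query(lo, hi) {
		// Skip objects already cached by an overlapping earlier window.
		if !c.loaded[r.objectID] {
			c.comments[r.objectID] = append(c.comments[r.objectID], r)
		}
	}
	// Mark the whole window, including IDs that have no comments at all.
	for id := lo; id <= hi; id++ {
		c.loaded[id] = true
	}
	return c.comments[objectID]
}
```

With a window of 1000 IDs on either side, dropping a database whose tables have nearby descriptor IDs would, under these assumptions, take a handful of round trips; the cost is that a query may return comments on objects that are never looked up.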

pkg/sql/schemachanger/scbuild/builder_state.go, line 686 at r2 (raw file):

	// Use public schema by default.
	if !prefix.ExplicitSchema {
		prefix.SchemaName = "public"

Use the constant in catconstants instead of the "public" literal, please.

pkg/sql/schemachanger/scbuild/dependencies.go, line 106 at r2 (raw file):

	// MayResolveIndex looks up an index using a naked index name with database
	// and schema prefix. Resolved index and the owner table name are returned.
	MayResolveIndex(ctx context.Context, indexName tree.Name, prefix tree.ObjectNamePrefix) (catalog.Index, tree.TableName)

Perhaps have this return (catalog.ResolvedObjectPrefix, catalog.TableDescriptor, catalog.Index) instead?
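
For illustration only, a sketch of the suggested signature with hypothetical stand-in types (`resolvedPrefix`, `tableDesc`, `index`, `memResolver`), since the real `catalog.ResolvedObjectPrefix`, `catalog.TableDescriptor`, and `catalog.Index` cannot be reproduced here. Returning the descriptor and the resolved prefix lets callers use them directly rather than resolving the table a second time from a `tree.TableName`.

```go
package main

// Hypothetical stand-ins for the catalog types named in the comment.
type resolvedPrefix struct {
	database, schema string
}

type index struct {
	id   int
	name string
}

type tableDesc struct {
	id      int
	name    string
	indexes []index
}

// indexResolver is a hypothetical version of the dependency with the
// suggested (prefix, table, index) return values.
type indexResolver interface {
	MayResolveIndex(prefix resolvedPrefix, indexName string) (resolvedPrefix, *tableDesc, *index)
}

// memResolver resolves indexes from an in-memory list of tables per schema.
type memResolver struct {
	tables map[resolvedPrefix][]*tableDesc
}

// MayResolveIndex returns a nil table and index when nothing matches.
func (r memResolver) MayResolveIndex(prefix resolvedPrefix, indexName string) (resolvedPrefix, *tableDesc, *index) {
	for _, t := range r.tables[prefix] {
		for i := range t.indexes {
			if t.indexes[i].name == indexName {
				return prefix, t, &t.indexes[i]
			}
		}
	}
	return prefix, nil, nil
}

var _ indexResolver = memResolver{}
```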

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 24 at r2 (raw file):

	if n.Comment != nil {
		dc.Comment = *n.Comment
	}

I don't think this correctly distinguishes between SET NULL and SET '', though this should be easy to fix.
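
A minimal sketch of keeping the cases apart, assuming (as in the snippet above) that the parsed comment is a `*string` where `nil` means `NULL`. `commentAction` and `classifyComment` are hypothetical names. Whether `''` should ultimately store an empty comment or behave like `NULL` is a separate decision; the point is that the code makes it explicitly instead of collapsing both into an empty `dc.Comment`.

```go
package main

// commentAction is what a COMMENT ON statement should do to its target.
type commentAction int

const (
	dropComment     commentAction = iota // COMMENT ON ... IS NULL
	setEmptyComment                      // COMMENT ON ... IS ''
	setComment                           // COMMENT ON ... IS 'some text'
)

// classifyComment inspects the parsed comment without conflating a nil
// pointer (NULL) with a pointer to an empty string ('').
func classifyComment(c *string) (commentAction, string) {
	switch {
	case c == nil:
		return dropComment, ""
	case *c == "":
		return setEmptyComment, ""
	default:
		return setComment, *c
	}
}
```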

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 43 at r2 (raw file):

	if len(dc.Comment) > 0 {
		b.Add(dc)
	}

Perhaps it's just me but I somehow feel this code could be made more straightforward. I was able to convince myself it was correct, in the end. Perhaps it would help to distinguish the add and the drop cases more clearly?
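
One way to make the two paths explicit, sketched with a hypothetical `elementBuilder` interface standing in for the builder's add/drop operations and a hypothetical `tableComment` element; the real element types are not reproduced. It assumes that replacing a comment means dropping the old element and adding a new one, and it treats any non-NULL value, including `''`, as a new comment.

```go
package main

// tableComment is a hypothetical comment element.
type tableComment struct {
	tableID int64
	comment string
}

// elementBuilder is a hypothetical stand-in for the schema changer builder.
type elementBuilder interface {
	Add(e tableComment)
	Drop(e tableComment)
}

// commentOnTable keeps the drop (IS NULL) and add (IS '...') cases in
// separate branches instead of inferring intent from the comment length.
func commentOnTable(b elementBuilder, tableID int64, existing *tableComment, newComment *string) {
	if newComment == nil {
		// IS NULL: drop the existing comment, if there is one.
		if existing != nil {
			b.Drop(*existing)
		}
		return
	}
	// IS '...': replace any existing comment with the new one.
	if existing != nil {
		b.Drop(*existing)
	}
	b.Add(tableComment{tableID: tableID, comment: *newComment})
}

// recordingBuilder records calls so the control flow can be checked.
type recordingBuilder struct {
	added, dropped []tableComment
}

func (r *recordingBuilder) Add(e tableComment)  { r.added = append(r.added, e) }
func (r *recordingBuilder) Drop(e tableComment) { r.dropped = append(r.dropped, e) }
```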

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/comment_on.go, line 152 at r2 (raw file):

		IsExistenceOptional: false,
		RequiredPrivilege:   privilege.CREATE,
	}

nit: Since comment resolution params are always the same, consider defining one struct at the top level called commentParams or something.

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/dependencies.go, line 265 at r2 (raw file):

	// ResolveTableWithIndexBestEffort retrieves a table which contains the target index and returns its elements.
	ResolveTableWithIndexBestEffort(indexName tree.Name, prefix tree.ObjectNamePrefix, p ResolveParams) ElementResultSet

Can you put prefix before indexName please? Also why not have this return the elements pertaining to the index instead of the table? It's what we care about ultimately. If you do this, you'll need to rename the method, naturally.

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/process.go, line 62 at r2 (raw file):

	reflect.TypeOf((*tree.CommentOnColumn)(nil)):     {CommentOnColumn, false},
	reflect.TypeOf((*tree.CommentOnIndex)(nil)):      {CommentOnIndex, false},
	reflect.TypeOf((*tree.CommentOnConstraint)(nil)): {CommentOnConstraint, false},

It's time to jump into the deep end of the pool, Chengxiong 😀

Come on, mark these as fully-supported.

pkg/sql/schemachanger/scdecomp/dependencies.go, line 82 at r1 (raw file):

	// GetAllCommentsOnObject fetches all comments related to an object such as a
	// database, a schema or a table.
	GetAllCommentsOnObject(ctx context.Context, objectID int64) (*CommentCache, error)

I'm concerned that returning a CommentCache introduces yet another layer of indirection which in turn introduces more complexity in the "business logic" so to speak, when really this cache is an implementation detail. Would it be possible to have the interface implementation worry about caching instead? This means the CommentCache type wouldn't exist (at least not publicly) and this interface would instead have methods like MayGetDatabaseComment(ctx context.Context, dbID catid.DescID) (comment string, found bool) etc etc.

This pushes the complexity further down, of course, but the schema changer is complex enough for that tradeoff to usually be worth it.

Note: please use catid.DescID types for descriptor IDs.
Note: we have nice cache implementations in pkg/util/cache

pkg/sql/schemachanger/scdeps/build_deps.go, line 328 at r1 (raw file):

// DescriptorMetadataFetcher implements the scbuild.Dependencies interface.
func (d *buildDeps) DescriptorMetadataFetcher() scdecomp.DescriptorMetadataFetcher {
	return d.metadataFetcher

I'm not advocating my way over yours, but have you considered instead plumbing an internal executor into the buildDeps and having this method instantiate a DescriptorMetadataFetcher?
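
A minimal sketch of that alternative, with hypothetical stand-ins: `internalExecutor` for `sqlutil.InternalExecutor`, `txn` for `*kv.Txn`, a trimmed-down `buildDeps`, and a `fakeExecutor` for testing. The dependency struct stores only the raw executor and transaction, and each call builds a fetcher bound to them.

```go
package main

// txn stands in for *kv.Txn.
type txn struct{ id int }

// internalExecutor stands in for sqlutil.InternalExecutor.
type internalExecutor interface {
	QueryComments(t *txn, objectID int64) []string
}

type metadataFetcher struct {
	txn *txn
	ie  internalExecutor
}

func (f *metadataFetcher) CommentsOn(objectID int64) []string {
	return f.ie.QueryComments(f.txn, objectID)
}

// buildDeps keeps the raw dependencies rather than a prebuilt fetcher.
type buildDeps struct {
	txn *txn
	ie  internalExecutor
}

// DescriptorMetadataFetcher instantiates a fetcher bound to this txn.
func (d *buildDeps) DescriptorMetadataFetcher() *metadataFetcher {
	return &metadataFetcher{txn: d.txn, ie: d.ie}
}

// fakeExecutor answers comment queries from a map.
type fakeExecutor map[int64][]string

func (e fakeExecutor) QueryComments(_ *txn, objectID int64) []string {
	return e[objectID]
}
```

One consequence of this design: any cache inside the fetcher is discarded between calls unless `buildDeps` keeps the instance around.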

pkg/sql/schemachanger/scdeps/sctestdeps/test_deps.go, line 255 at r2 (raw file):

	}
	return nil, tree.TableName{}
}

The fact that this method implementation largely mirrors that of the non-test implementation indicates that this code should be factored out of the interface into a helper function.
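
A sketch of the refactoring, assuming a hypothetical `tableLister` callback that each implementation (production and test) supplies. The duplicated index-search loop then lives once in a hypothetical helper, `findTableWithIndex`, and only the table enumeration differs between the two dependency implementations.

```go
package main

type tableName struct{ schema, table string }

type tableInfo struct {
	name    tableName
	indexes []string
}

// tableLister is the only part that differs between the production and
// test dependencies: how to enumerate the tables in a schema.
type tableLister func(schema string) []tableInfo

// findTableWithIndex is the shared helper both implementations would call.
func findTableWithIndex(list tableLister, schema, indexName string) (tableName, bool) {
	for _, t := range list(schema) {
		for _, idx := range t.indexes {
			if idx == indexName {
				return t.name, true
			}
		}
	}
	return tableName{}, false
}
```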

pkg/sql/schemachanger/scdeps/sctestutils/sctestutils.go, line 81 at r1 (raw file):

		planner, /* authAccessor */
		planner, /* astFormatter */
		planner, /* featureChecker */

Thank you for this.

pkg/sql/schemachanger/scop/mutation.go, line 416 at r2 (raw file):

// AddTableComment is used to add a comment to a table.
type AddTableComment struct {

Shouldn't these be named UpsertTableComment instead? This applies to all the other comment ops, to the method names in scexec/scmutationexec, etc. According to the PostgreSQL documentation, there can be at most one comment per object.
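
To illustrate why "upsert" describes the operation, a minimal sketch assuming comments are keyed by (type, object ID, sub-ID), the columns the fetcher query above reads from system.comments; `commentTable`, `Upsert`, and `Delete` are hypothetical names. Writing twice to the same key replaces the first comment rather than adding a second one.

```go
package main

type commentKey struct {
	typ      int
	objectID int64
	subID    int64
}

// commentTable models a comments table with at most one row per key.
type commentTable map[commentKey]string

// Upsert inserts the comment, or replaces the existing one for the key.
func (t commentTable) Upsert(k commentKey, comment string) { t[k] = comment }

// Delete removes the comment for the key, if any.
func (t commentTable) Delete(k commentKey) { delete(t, k) }
```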

pkg/sql/schemachanger/screl/attr.go, line 75 at r2 (raw file):

	Comment
	// AttrMax is the largest possible Attr value.
	// Note: add any new enum values before TargetStatus, leave these at the end.

Please observe this comment.
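
A sketch of why the ordering matters, using a hypothetical `attr` enum rather than the real screl attribute list. Values come from `iota`, and the sentinel `attrMax` is derived from its own position, so anything that loops up to `attrMax` only sees entries declared before it; an attribute appended after the sentinel would be silently skipped. The real code may also rely on the status attributes being last, which this sketch does not model.

```go
package main

type attr int

const (
	_ attr = iota
	descID
	indexID
	comment // new attributes go here, before the trailing status entries

	// targetStatus and currentStatus stay at the end, as the note asks.
	targetStatus
	currentStatus

	// attrMax is the largest possible attr value.
	attrMax attr = iota - 1
)

// allAttrs lists every attr by iterating up to the sentinel.
func allAttrs() []attr {
	var out []attr
	for a := attr(1); a <= attrMax; a++ {
		out = append(out, a)
	}
	return out
}
```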

pkg/sql/schemachanger/sctest/cumulative.go, line 223 at r2 (raw file):

		// stages.
		postCommit = 0
		nonRevertible = 0

Nice catch! Shouldn't this be right after n := instead?

pkg/bench/rttanalysis/testdata/benchmark_expectations, line 49 at r4 (raw file):

20,DropView/drop_1_view
23,DropView/drop_2_views
26,DropView/drop_3_views

This is a manifestation of the nasty round-trip performance regression I was talking about in an earlier comment.

pkg/sql/schemachanger/testdata/comment_on, line 180 at r4 (raw file):

delete comment for constraint on #107, constraint id: 2
# end PreCommitPhase
commit transaction #1

I appreciate this test suite but I'm not sure it's particularly interesting. The one in scplan is enough I think.

pkg/sql/schemachanger/scbuild/builder_test.go, line 157 at r2 (raw file):

				t.Fatal("not a supported descriptor metadata statement")
			}
		}

I reviewed this whole switch block and really you should replace all of these cases with just case "setup": and case "build":. As far as I can tell all we ever want out of this is to tdb.Exec some parsed statements. See the data-driven tests in scdecomp for inspiration.
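
A sketch of the two-command dispatch, using hypothetical stand-ins so it runs without the datadriven package: `testData` mirrors the `Cmd`/`Input` fields of the datadriven package's test data, `execer` replaces `tdb`, and `build` replaces the builder call. It does not reproduce the actual scdecomp tests, and `splitStatements` is a naive splitter where real tests would use the SQL parser.

```go
package main

import "strings"

// testData mirrors the Cmd/Input fields of a data-driven test directive.
type testData struct {
	Cmd   string
	Input string
}

// execer stands in for the test SQL runner (tdb).
type execer interface {
	Exec(stmt string)
}

// runCommand handles the only two commands the test needs.
func runCommand(db execer, build func(stmt string) string, d testData) string {
	switch d.Cmd {
	case "setup":
		for _, stmt := range splitStatements(d.Input) {
			db.Exec(stmt)
		}
		return ""
	case "build":
		var out []string
		for _, stmt := range splitStatements(d.Input) {
			out = append(out, build(stmt))
		}
		return strings.Join(out, "\n")
	default:
		return "unknown command: " + d.Cmd
	}
}

// splitStatements naively splits on ';' (it would break on semicolons
// inside string literals).
func splitStatements(input string) []string {
	var stmts []string
	for _, s := range strings.Split(input, ";") {
		if s = strings.TrimSpace(s); s != "" {
			stmts = append(stmts, s)
		}
	}
	return stmts
}

// recordingDB records executed statements.
type recordingDB struct{ executed []string }

func (r *recordingDB) Exec(stmt string) { r.executed = append(r.executed, stmt) }
```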

pkg/sql/schemachanger/scplan/plan_test.go, line 106 at r2 (raw file):

					}
				}
				return ""

Similar comment as for the data-driven build tests.

chengxiong-ruan · 2022-04-11T16:27:18Z

pkg/sql/descmetadata/metadata_fetcher.go, line 41 at r1 (raw file):