Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/schemachanger: enable declarative schema changer to read comments from db #80380

Merged

Conversation

chengxiong-ruan
Copy link
Contributor

This commit adds a metadata fetcher interface to read comments
from db during descriptor elements walking. An implementation
of the interface is also added and plumbed into the descriptor
decomposition.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@chengxiong-ruan chengxiong-ruan marked this pull request as ready for review April 22, 2022 14:28
@chengxiong-ruan chengxiong-ruan requested review from a team and postamar April 22, 2022 14:28
@chengxiong-ruan
Copy link
Contributor Author

@postamar I decided to break this out from #78025 which I think it's hard enough to review even it was in a separate commit, I found it's really hard to make sure all test pass at each commit because the commits all kinda touching the same tests....

@chengxiong-ruan chengxiong-ruan force-pushed the read-comments-in-new-schema-changer branch from 3fc96d9 to 3016208 Compare April 22, 2022 14:55
Copy link

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for breaking this out. I think you did the right thing, there's a lot here, and it will make the COMMENT ON pull request a lot more straightforward to review.

I have a few objections, nothing too deep so don't wait for me to get back from vacation to merge this after addressing them.

Reviewed 33 of 40 files at r1, 9 of 9 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan)


pkg/sql/descmetadata/comment_cache.go, line 35 at r2 (raw file):

	ie            sqlutil.InternalExecutor
	comments      map[scdecomp.CommentKey]string
	objIDsChecked map[catid.DescID]struct{}

nit: consider using a catalog.DescriptorIDSet instead?


pkg/sql/descmetadata/comment_cache.go, line 110 at r2 (raw file):

		"mf-get-table-comments",
		mf.txn,
		sessiondata.InternalExecutorOverride{User: security.RootUserName()},

Just curious, why do we need to query this table as root? Is this how it's done in the legacy schema changer? (I haven't checked)


pkg/sql/schemachanger/scbuild/builder_state.go, line 432 at r2 (raw file):

			}
		case catalog.Schema:
			if err := b.commentCache.LoadCommentsForObjects(b.ctx, catalog.Table, c.backrefs.Ordered()); err != nil {

Schemas also contain types, not just tables. Why does LoadCommentsForObjects require a descriptor type anyway? I don't understand the purpose of this filtering. Let's just warm the cache for all the back-referenced things, no?


pkg/sql/schemachanger/scdecomp/decomp.go, line 63 at r2 (raw file):

	desc catalog.Descriptor,
	lookupFn func(id catid.DescID) catalog.Descriptor,
	ev ElementVisitor,

Consider moving the ElementVisitor definition to this new dependencies.go file, it seems like it would belong there.


pkg/sql/schemachanger/scdecomp/dependencies.go, line 26 at r2 (raw file):

type CommentKey struct {
	ObjectID    catid.DescID
	SubID       descpb.ID

Shouldn't this also be a catid.DescID?


pkg/sql/schemachanger/scdecomp/dependencies.go, line 28 at r2 (raw file):

	SubID       descpb.ID
	CommentType keys.CommentType
}

Why is this type defined here in scdecomp at all?


pkg/sql/schemachanger/scdecomp/dependencies.go, line 32 at r2 (raw file):

// DescriptorCommentCache represent an interface to fetch metadata like
// comment for descriptors.
type DescriptorCommentCache interface {

The descriptor decomposition logic does not care that this is a cache, as far as it's concerned that's purely an implementation detail. I'm aware that's not the case for scbuild though.

What do you think of having an interface here that just does Gets and rename it CommentGetter or something. Then in the builder define another named CommentCache with the LoadCommentsForObjects method and which embeds CommentGetter?


pkg/sql/schemachanger/scdecomp/dependencies.go, line 35 at r2 (raw file):

	// Get returns comment for a (objID, subID, commentType) tuple if exists. "ok"
	// returned indicates if the comment actually exists or not.
	Get(ctx context.Context, objID catid.DescID, subID descpb.ID, commentType keys.CommentType) (comment string, ok bool, err error)

How about replacing this one method with several, one for each comment type?GetDatabaseComment(ctx, dbID) etc.

No need to have the system.comment table schema leak everywhere.


pkg/sql/schemachanger/scdeps/sctestdeps/database_state.go, line 200 at r2 (raw file):

	}
	return comments
}

This is fine but have you considered instead having a comment cache constructor in descmetadata which warms the cache by doing a SELECT * FROM system.comments LIMIT 1000 or something like that?

This way you could re-use it in the test deps (no way we'd ever have more than 1k comments) plus you would amortize a lot of round trips in prod for the common case where there are at most very few comments.


pkg/sql/schemachanger/scplan/plan_test.go, line 60 at r2 (raw file):

		run := func(t *testing.T, d *datadriven.TestData) string {
			// TODO (Chengxiong): make this switch block to have only "build" and
			// "ops/deps" sections.

I suggest you either follow up with a PR shortly or file an issue to track this, otherwise we'll forget about it.


pkg/sql/schemachanger/scplan/plan_test.go, line 106 at r2 (raw file):

					default:
						t.Fatal("not a supported descriptor metadata statement")
					}

Why have a switch block at all? Instead perhaps just tdb.Exec(t, stmt.SQL)?

Same comment applies elsewhere in this PR.

@chengxiong-ruan chengxiong-ruan force-pushed the read-comments-in-new-schema-changer branch 2 times, most recently from 6f3818c to fec7edb Compare April 22, 2022 21:53
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/descmetadata/comment_cache.go, line 35 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: consider using a catalog.DescriptorIDSet instead?

nice, How could I forget that!


pkg/sql/descmetadata/comment_cache.go, line 110 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Just curious, why do we need to query this table as root? Is this how it's done in the legacy schema changer? (I haven't checked)

Yeah, metadata updater and show create table also use root user. I remember I tried something else and got permission denied.


pkg/sql/schemachanger/scbuild/builder_state.go, line 432 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Schemas also contain types, not just tables. Why does LoadCommentsForObjects require a descriptor type anyway? I don't understand the purpose of this filtering. Let's just warm the cache for all the back-referenced things, no?

Good question...my reasoning was that

  1. there's not comment on types, so I'm just being lazy here to treat everything as table. I agree it's confusing and I can add comment for that.
  2. the comment table is indexed starting with commentType.... so I think the better practice is to respect the type first. But this table is probably very small as you said. so type filtering here is not necessary, especially after we add the select limit 1000 thing you suggested :)

pkg/sql/schemachanger/scdecomp/decomp.go, line 63 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Consider moving the ElementVisitor definition to this new dependencies.go file, it seems like it would belong there.

good point, done.


pkg/sql/schemachanger/scdecomp/dependencies.go, line 26 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Shouldn't this also be a catid.DescID?

oops, we actually shouldn't use either since SubID is not for a descriptor. Using a uint32 instead...


pkg/sql/schemachanger/scdecomp/dependencies.go, line 28 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Why is this type defined here in scdecomp at all?

moved to the implementation in descmetadata.


pkg/sql/schemachanger/scdecomp/dependencies.go, line 32 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

The descriptor decomposition logic does not care that this is a cache, as far as it's concerned that's purely an implementation detail. I'm aware that's not the case for scbuild though.

What do you think of having an interface here that just does Gets and rename it CommentGetter or something. Then in the builder define another named CommentCache with the LoadCommentsForObjects method and which embeds CommentGetter?

yeah, that's brilliant~


pkg/sql/schemachanger/scdecomp/dependencies.go, line 35 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

How about replacing this one method with several, one for each comment type?GetDatabaseComment(ctx, dbID) etc.

No need to have the system.comment table schema leak everywhere.

Good idea to have these higher level methods. refactored as suggested.


pkg/sql/schemachanger/scdeps/sctestdeps/database_state.go, line 200 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

This is fine but have you considered instead having a comment cache constructor in descmetadata which warms the cache by doing a SELECT * FROM system.comments LIMIT 1000 or something like that?

This way you could re-use it in the test deps (no way we'd ever have more than 1k comments) plus you would amortize a lot of round trips in prod for the common case where there are at most very few comments.

good idea actually. but here in the test we use the test db runner instead of internal executor, so probably the only thin in common is the query.
But I do like the idea of warming the cache with a 1000 when it's constructed. added that.


pkg/sql/schemachanger/scplan/plan_test.go, line 60 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

I suggest you either follow up with a PR shortly or file an issue to track this, otherwise we'll forget about it.

Yeah... I'm an old school and wrote on my notebook with an actual pen.


pkg/sql/schemachanger/scplan/plan_test.go, line 106 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Why have a switch block at all? Instead perhaps just tdb.Exec(t, stmt.SQL)?

Same comment applies elsewhere in this PR.

that's true...removed.

@chengxiong-ruan chengxiong-ruan force-pushed the read-comments-in-new-schema-changer branch 9 times, most recently from cb52379 to 9b03137 Compare April 24, 2022 14:54
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: only very minor comments nothing blocking

Reviewed 15 of 17 files at r3, 7 of 7 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan and @postamar)


pkg/sql/descmetadata/comment_cache.go line 119 at r4 (raw file):

	}

	var uncheckedObjIDs []catid.DescID

Probably won't be a problem, but you can do uncheckedObjIDs := objIDs[:0], which will keep the same storage length so that you won't have growth if there are more than a 1000 comments.


pkg/sql/descmetadata/comment_cache.go line 199 at r4 (raw file):

	// to avoid boundary cut by `LIMIT 1000`. We might lose 1 object here, but not
	// a big deal in most cases.
	mf.maxWarmedObjID = maxObjID - 1

objIDSChecked isn't updated above? Also, I don't get why we have to subtract 1 here? The check can always do >= maxWarmedObjID?

… from db

This commit adds a metadata fetcher interface to read comments
from db during descriptor elements walking. An implementation
of the interface is also added and plumbed into the descriptor
decomposition.

Release note: None
@chengxiong-ruan chengxiong-ruan force-pushed the read-comments-in-new-schema-changer branch from 9b03137 to 92a78d0 Compare April 25, 2022 15:30
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @fqazi and @postamar)


pkg/sql/descmetadata/comment_cache.go line 119 at r4 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Probably won't be a problem, but you can do uncheckedObjIDs := objIDs[:0], which will keep the same storage length so that you won't have growth if there are more than a 1000 comments.

ah nice trick, changed it to make([]descid.DescID, 0, len(objIDs)) since looks like uncheckedObjIDs := objIDs[:0] would make them share the underlying storage.


pkg/sql/descmetadata/comment_cache.go line 199 at r4 (raw file):

objIDSChecked isn't updated above?

oh, we don't need to update objIDsChecked here because of this logic:

func (mf *metadataCache) shouldLoadFromDB(objID catid.DescID) bool {
	return objID > mf.maxWarmedObjID && !mf.objIDsChecked.Contains(objID)
}

I just didn't want to put a 1000 ids by hand in it lol.

Also, I don't get why we have to subtract 1 here? The check can always do >= maxWarmedObjID?

Yeah, we may do >= maxWarmedObjID as well. My brain model just worked that way somehow haha. The reason to subtract 1 here is just my laziness to handle edge cases where a table object_id has say 10 comments on different sub_ids, but because this table is that last obj we read from db, and we only read 5 comments of it, so we can't say this object_id is fully checked. Yes, it's just me lazy ha.

@chengxiong-ruan
Copy link
Contributor Author

TFTR! @fqazi @postamar
bors r+

@craig
Copy link
Contributor

craig bot commented Apr 25, 2022

Build succeeded:

@craig craig bot merged commit 71928b5 into cockroachdb:master Apr 25, 2022
@chengxiong-ruan chengxiong-ruan deleted the read-comments-in-new-schema-changer branch May 6, 2022 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants