
server: hook up tenant capabilities subsystem on startup #96390

Merged
merged 1 commit into from
Feb 9, 2023

Conversation


@arulajmani arulajmani commented Feb 1, 2023

This patch wires up the tenant capabilities subsystem during server
startup. This includes both starting the subsystem and adding a handle
to the Authorizer so that it can be used by gRPC interceptors to perform
authorization checks.

While working through this patch, I realized we can't instantiate an
Authorizer with a handle to the tenant capability state
(tenantcapabilities.Reader). This is because the aforementioned gRPC
setup happens early in the Server startup process, and at that point we
do not have access to the dependencies required to set up the
capabilities Watcher (which is what provides the Reader interface to
the Authorizer). To break this dependency cycle, we lazily bind the
Reader to the Authorizer.

This patch also adds a datadriven framework to test tenant capabilities
end to end. The nice thing about it is that it hides the asynchronous
nature of capability checks from test writers. The hope is that we'll be
able to extend this as we add more capabilities.

Informs #94643

Release note: None

@cockroach-teamcity

This change is Reviewable

@arulajmani arulajmani force-pushed the tencap-wire-watcher branch 2 times, most recently from 7a761a8 to 31014b6 on February 2, 2023 02:30
@arulajmani arulajmani changed the title [WIP, DNM] server: hookup tenant capabilities subsystem on startup server: hookup tenant capabilities subsystem on startup Feb 2, 2023
@arulajmani arulajmani marked this pull request as ready for review February 2, 2023 02:31
@arulajmani arulajmani requested review from a team as code owners February 2, 2023 02:31
@arulajmani arulajmani requested a review from a team February 2, 2023 02:31
@arulajmani arulajmani requested a review from a team as a code owner February 2, 2023 02:31
@arulajmani arulajmani changed the title server: hookup tenant capabilities subsystem on startup server: hook up tenant capabilities subsystem on startup Feb 2, 2023
@arulajmani arulajmani force-pushed the tencap-wire-watcher branch 2 times, most recently from 1edb01e to 08e901a on February 2, 2023 04:07

@dhartunian dhartunian left a comment


Reviewed 7 of 38 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @ecwall, and @knz)


pkg/server/server.go line 1925 at r1 (raw file):

		return errors.Wrap(err, "failed to initialize the tenant settings watcher")
	}
	if err := s.node.tenantCapabilitiesWatcher.Start(ctx); err != nil {

I don't understand what the benefit is of having the capabilities watcher attached to node.


@arulajmani arulajmani left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @ecwall, and @knz)


pkg/server/server.go line 1925 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I don't understand what the benefit is of having the capabilities watcher attached to node.

I was following the pattern of creating a Watcher during NewServer and starting it here. node felt like the right place to put the Watcher, given it runs on KV nodes. Is there a better alternative than node?


@knz knz left a comment


this is pretty good.

Reviewed 38 of 38 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @dhartunian, and @ecwall)


pkg/multitenant/tenantcapabilities/capabilities.go line 47 at r1 (raw file):

	// is not allowed to execute the supplied batch request given the capabilities
	// it possesses.
	HasCapabilityForBatch(context.Context, roachpb.TenantID, *roachpb.BatchRequest) error

IMHO we should separate "ok/fail" (a boolean) from an error here. They tell different stories.
We've separated them too in e.g. (*server.authenticationServer).verifySession().


pkg/multitenant/tenantcapabilities/capabilities.go line 52 at r1 (raw file):

	// Authorizer post-creation. The Authorizer uses the Reader to consult the
	// global tenant capability state to authorize incoming requests. This
	// function cannot be used to update the Reader; it may be called at-most

I don't fully understand the restriction about "at most once". What bad things happen if you call it multiple times?


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 46 at r1 (raw file):

	}
	if a.capabilitiesReader == nil {
		log.Fatal(ctx, "trying to perform capability check when no Reader exists")

Return errors.AssertionFailedf, and if you feel fancy, logcrash.ReportOrPanic. log.Fatal is out of style.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 74 at r1 (raw file):

func (a *Authorizer) BindReader(ctx context.Context, reader tenantcapabilities.Reader) {
	if a.capabilitiesReader != nil {
		log.Fatal(ctx, "cannot bind a tenant capabilities reader more than once")

see my previous comment


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go line 37 at r1 (raw file):

// BindReader implements the tenantcapabilities.Authorizer interface.
func (n *NoopAuthorizer) BindReader(context.Context, tenantcapabilities.Reader) {
	panic("disallowed")

I don't understand this. Why not?


pkg/rpc/auth_tenant.go line 174 at r1 (raw file):

	// contained in here, there should be a corresponding capability. Once that's
	// done, we can get rid of this loop entirely and perform all checks inside
	// the capabilities Authorizer above.

I don't understand this comment. To me, this entire check should be done inside HasCapabilityForBatch():

  • request types that have a capability for them get a check.
  • others are allowed/denied depending on the chosen default behavior.

pkg/rpc/context.go line 503 at r1 (raw file):

	// Authorizer provides a handle into the tenantcapabilities subsystem. It
	// allows KV nodes to perform capability checks for incoming tenant requests.

Let's call it TenantRPCAuthorizer. Authorizer is a bit too generic.


pkg/server/server.go line 1925 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I was following the pattern of creating a Watcher during NewServer and starting it here. node felt like the right place to put the Watcher, given it runs on KV nodes. Is there a better alternative than node?

Server (variable s in this scope)


pkg/ccl/multitenantccl/tenantcapabilitiesccl/testdata/can_admin_split line 25 at r1 (raw file):

SELECT pretty FROM [ALTER TABLE t SPLIT AT VALUES (0)]
----
pq: at or near "alter": syntax error

what's going on here - what's this syntax error about?


pkg/ccl/multitenantccl/tenantcapabilitiesccl/capabilities_test.go line 94 at r1 (raw file):

		// Create a tenant; we also want to allow test writers to issue
		// ALTER TABLE ... SPLIT statements, so configure the settings as such.

I think it would feel more natural to me if we saw the ALTER TENANT ALL SET CLUSTER SETTING ... = true inside the testdata with query-sql-system instead of as an override here.
Ditto for the others we'll be checking afterwards.


pkg/rpc/auth_test.go line 201 at r1 (raw file):

	defer leaktest.AfterTest(t)()
	tenID := roachpb.MustMakeTenantID(10)
	makeDisallowedAdminReq := func(key string) roachpb.Request {

I'm not going to object to you extracting makeSpan into a top-level function, but then call it makeSpanShared and alias it here with

makeSpan := func(key) { return makeSpanShared(t, key) }

to remove the rest of the diff below. As it stands, the diff makes the test harder to read and maintain.


@knz knz left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @dhartunian, and @ecwall)


pkg/ccl/multitenantccl/tenantcapabilitiesccl/capabilities_test.go line 94 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I think it would feel more natural to me if we saw the ALTER TENANT ALL SET CLUSTER SETTING ... = true inside the testdata with query-sql-system instead of as an override here.
Ditto for the others we'll be checking afterwards.

Discussed separately: please link to #96512 in a comment.
Once that issue is resolved, we can add a new datadriven directive that delays the initialization of the secondary tenant until some system-tenant SQL has finished executing.

@arulajmani arulajmani requested review from a team as code owners February 7, 2023 03:03
@arulajmani arulajmani requested review from msbutler and removed request for a team February 7, 2023 03:03
@cucaroach cucaroach removed their request for review February 8, 2023 16:45
@arulajmani arulajmani force-pushed the tencap-wire-watcher branch 2 times, most recently from 8d8251f to 948bfa7 on February 8, 2023 18:35

@arulajmani arulajmani left a comment


TFTR!

bors r=knz

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @ecwall, @knz, and @msbutler)


pkg/rpc/auth_tenant.go line 174 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Please spell out your thinking/vision about the organization of this code in the commit message.

Good call, added some words.


craig bot commented Feb 8, 2023

Canceled.

@arulajmani

bors r+


craig bot commented Feb 8, 2023

Build failed (retrying...):


craig bot commented Feb 8, 2023

Build failed (retrying...):


craig bot commented Feb 8, 2023

Build failed (retrying...):


craig bot commented Feb 8, 2023

Build failed (retrying...):


craig bot commented Feb 8, 2023

Build failed (retrying...):

@irfansharif

We might be checking in something flaky here (build), or already did in a previous PR:

[build failure screenshot]

@irfansharif

bors r-


craig bot commented Feb 8, 2023

Canceled.

This patch wires up the tenant capabilities subsystem during server
startup. This includes both starting the subsystem and adding a handle
to the Authorizer so that it can be used by gRPC interceptors to perform
authorization checks.

While working through this patch, I realized we can't instantiate an
Authorizer with a handle to the tenant capability state
(tenantcapabilities.Reader). This is because the aforementioned gRPC
setup happens early in the Server startup process, and at that point we
do not have access to the dependencies required to set up the
capabilities Watcher (which is what provides the Reader interface to
the Authorizer). To break this dependency cycle, we lazily bind the
Reader to the Authorizer.

With the Authorizer wired up, we can now start using it to perform
capability checks for incoming tenant requests. Currently, this is
limited to batch requests. Note that the Authorizer is only responsible
for performing capability checks -- other authorization checks, such as
bounds checks, continue to happen outside of Authorizer.

This patch also adds a datadriven framework to test tenant capabilities
end to end. The nice thing about it is that it hides the asynchronous
nature of capability checks from test writers. The hope is that we'll be
able to extend this as we add more capabilities.

Informs cockroachdb#94643

Release note: None
@arulajmani

Wasn't a flake -- this skewed with #96659. Thanks for catching.

bors r+


craig bot commented Feb 9, 2023

Build succeeded:
