Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tenantcapabilities: gate tenant access to node metadata and tsdb #96319

Conversation

dhartunian
Copy link
Collaborator

@dhartunian dhartunian commented Jan 31, 2023

Previously, tenants were given access via the kv connector, to node level
metrics and metadata. This ability should be gated behind a capability in order
to give operators control over what cluster-level information their application
tenants would have access to.

This commit adds authorization checks using tenant capabilities for the node
metadata query RPC and the TSDB query RPC.

The connection between the specific capability in the RPC it enables, is
encoded in the auth_tenant.go file within the tenantAuthorizer. The
capability Authorizer type simply provides per-capability check utility
methods.

The NodesUI endpoint contains an additional SQL permission gate, which is
honored by checking at the tenant-level, and then delegating (via capability
gate) to a system tenant level NodesTenant endpoint that does no additional
SQL gating. Delegating to a system tenant NodesUI implementation would fail
since the tenant does not have system-level SQL permissions. The liveness and
TSDB endpoints do no additional checking at time of writing, hence no changes
are made there.

Resolves #96975

Epic: CRDB-12100

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from becacfd to 18d10f6 Compare February 6, 2023 22:19
@dhartunian dhartunian changed the title [WIP] tenantcapabilities: gate tenant access to node metadata tenantcapabilities: gate tenant access to node metadata and tsdb Feb 6, 2023
@dhartunian dhartunian marked this pull request as ready for review February 6, 2023 22:19
@dhartunian dhartunian requested review from a team as code owners February 6, 2023 22:19
@dhartunian dhartunian requested a review from a team February 6, 2023 22:19
@dhartunian dhartunian requested a review from a team as a code owner February 6, 2023 22:19
@dhartunian dhartunian requested a review from a team February 6, 2023 22:19
@dhartunian dhartunian requested a review from a team as a code owner February 6, 2023 22:19
@dhartunian
Copy link
Collaborator Author

First commit is by @arulajmani and being reviewed in #96390.
Please only review commit #2.

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! It's great to see this coming together.

Only a small nit and clarifying question, but otherwise :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian)


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 126 at r2 (raw file):

func parseTenantCapability(t *testing.T, input string) tenantcapabilitiespb.TenantCapabilities {
	var cap = tenantcapabilitiespb.TenantCapabilities{}

nit: do we need the var x = ... syntax here?

Code quote:

var cap =

pkg/server/status.go line 1581 at r2 (raw file):

	if err != nil {
		if !grpcutil.IsAuthError(err) {
			return nil, err

I think I understand this piece, but just to check my own understanding...

I know the tenant connector validates the tenant capability in its own authorization step. Is that tenant capability a subset of the ViewClusterMetadataPermission we're checking for here, meaning that an auth error of the broader permission doesn't necessarily mean we don't have the specific tenant capability?

Code quote:

		if !grpcutil.IsAuthError(err) {
			return nil, err
		}

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 24 of 24 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 109 at r2 (raw file):

		)
	}
	if !cp.CanViewTsdbMetrics {

Q: is there a way to filter out timeseries queries to only allow to timeseries from the storage layer, but forbid access to timeseries for other tenants.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 110 at r2 (raw file):

	}
	if !cp.CanViewTsdbMetrics {
		return errors.Newf("tenant %s does not have capability to query timseries data", tenID)

nit: timseries -> timeseries


pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.proto line 37 at r2 (raw file):

  bool can_view_node_info = 2;

  // CanViewTSDBMetrics, if set to true,

can you add more words? And mention something like "does not give access to timeseries from other tenants" in there.


pkg/rpc/context.go line 427 at r2 (raw file):

	loopbackDialFn func(context.Context) (net.Conn, error)

	TenantCapabilitiesAuthorizer tenantcapabilities.Authorizer

Please add an explanatory comment.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 74 at r2 (raw file):

				}
				return err.Error()
			case "has-capability-for-tsdb-metrics":

nit: add an empty line above

@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from 18d10f6 to 64bea8c Compare February 7, 2023 22:44
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @knz)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 109 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Q: is there a way to filter out timeseries queries to only allow to timeseries from the storage layer, but forbid access to timeseries for other tenants.

Currently no, but the storage of per-tenant metrics doesn't exist yet either. @aadityasondhi is working on the code to tag metrics per-tenant that will allow us to serve tenant-scoped metrics.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 110 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: timseries -> timeseries

Done.


pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.proto line 37 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

can you add more words? And mention something like "does not give access to timeseries from other tenants" in there.

Adding message, but it will say the opposite if what you're requesting for now as discussed in earlier thread.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 126 at r2 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: do we need the var x = ... syntax here?

Done.


pkg/rpc/context.go line 427 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Please add an explanatory comment.

Thx for flagging. Removed. Unnecessary field given the work that Arul has done in prior commit.


pkg/server/status.go line 1581 at r2 (raw file):

Previously, abarganier (Alex Barganier) wrote…

I think I understand this piece, but just to check my own understanding...

I know the tenant connector validates the tenant capability in its own authorization step. Is that tenant capability a subset of the ViewClusterMetadataPermission we're checking for here, meaning that an auth error of the broader permission doesn't necessarily mean we don't have the specific tenant capability?

Short answer: SQL Perms and Tenant capabilities are completely disjoint and unrelated and must be checked independently.

The request has to pass through the tenant connector for app tenants, but does not for system tenants. In both cases we need to do a SQL permission check given the request gRPC context that comes from the HTTP cookie. This validates that the SQL user that's currently logged in has the permission to view Node data.

Once that's done on the system tenant (see implementation of (s *systemStatusServer) NodesUI(... below right under NodesTenant impl.) we just serve the data and we're done.

On the app tenant, it's a bit different since we have to go through the connector to serve the data from the underlying kv node (aka system tenant) so the connector checks the tenant capability before retrieving the data directly from KV (without an additional SQL permission check!) via NodesTenant.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 74 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: add an empty line above

Done.

@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch 3 times, most recently from 81d522d to d2e85c6 Compare February 13, 2023 15:50
@dhartunian dhartunian requested a review from a team as a code owner February 13, 2023 15:50
@dhartunian dhartunian requested review from dt and removed request for a team February 13, 2023 15:50
@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from d2e85c6 to 5fb725c Compare February 14, 2023 23:32
@dhartunian
Copy link
Collaborator Author

@arulajmani can you take a quick look to see if I screwed up the intent of your tests?

@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from 5fb725c to 51fc43b Compare February 15, 2023 16:22
@dhartunian dhartunian requested a review from a team February 15, 2023 16:22
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, my comments are pretty minor.

Reviewed 11 of 57 files at r3, 1 of 14 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, @dt, and @knz)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go line 41 at r4 (raw file):

}

func (n *NoopAuthorizer) HasNodeStatusCapability(

Here, and below, missing comment.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 97 at r4 (raw file):

// tenantcapabilities.Update, allowing data-driven tests to assert on the
// output.
func PrintTenantCapabilityUpdate(update tenantcapabilities.Update) string {

How do you feel about keeping some form of this function around? Disambiguating between an update and a delete is nice when reading the test files.

If this printing feels extra and cumbersome to work with as new capabilities are added, maybe we could get rid of just PrintTenantCapability instead?


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 34 at r4 (raw file):

// Example:
//
//	update-state

nit: should we update this as well with the changes we're making?


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 65 at r4 (raw file):

				update := tenantcapabilitiestestutils.ParseTenantCapabilityDelete(t, d)
				mockReader.updateState([]*tenantcapabilities.Update{update})
			case "has-capability":

nit: instead of this "cap" argument, could we instead have add has-capability-for-batch, has-capability-for-node-status... etc.? That way, the test file maps nicely to the interface.

Separately, if we do this, we might not need the cmds argument?


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 70 at r4 (raw file):

				d.ScanArgs(t, "cap", &cap)
				switch cap {
				case "can_admin_split":

(Superseded by the comment above), should this be "for_batch" instead? I don't think this matters now, but in the future we're going to introduce capabilities for other batch requests (eg. AdminRelocateRange etc.).

@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from 51fc43b to e360cdd Compare February 15, 2023 22:06
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arulajmani Thx for the helpful comments. Tests are updated.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @arulajmani, @dt, and @knz)


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go line 41 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Here, and below, missing comment.

Done.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 97 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

How do you feel about keeping some form of this function around? Disambiguating between an update and a delete is nice when reading the test files.

If this printing feels extra and cumbersome to work with as new capabilities are added, maybe we could get rid of just PrintTenantCapability instead?

Done. Implemented Stringer on Update and Entry structs that does the prefix thing.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 34 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: should we update this as well with the changes we're making?

Done.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 65 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: instead of this "cap" argument, could we instead have add has-capability-for-batch, has-capability-for-node-status... etc.? That way, the test file maps nicely to the interface.

Separately, if we do this, we might not need the cmds argument?

agreed. done.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 70 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

(Superseded by the comment above), should this be "for_batch" instead? I don't think this matters now, but in the future we're going to introduce capabilities for other batch requests (eg. AdminRelocateRange etc.).

Ah I see the authorizer doesn't expose the granular capability just an interface that operates on the Batch as a whole. Renamed the cmd.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything around the capabilities package and tests :lgtm:

Reviewed 1 of 24 files at r2, 1 of 57 files at r3, 1 of 14 files at r4, 7 of 10 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, @dt, and @knz)


pkg/multitenant/tenantcapabilities/capabilities.go line 87 at r5 (raw file):

func (u Update) String() string {
	if u.Deleted {
		return fmt.Sprintf("delete: %+v", u.Entry)

nit: for deleted updates, can we just print the tenant ID instead of the entire entry? I should've written a comment about this, but the capability isn't meaningful when an update corresponds to a delete.


pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 97 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Done. Implemented Stringer on Update and Entry structs that does the prefix thing.

I like it! 💯


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 46 at r5 (raw file):

//
//	has-capability-for-batch ten=10 cmds=(split)
//	----

nit: should we add the has-tsdb-query-capability and has-node-status-capability here as well?

Previously, tenants were given access via the kv connector, to node level
metrics and metadata. This ability should be gated behind a capability in order
to give operators control over what cluster-level information their application
tenants would have access to.

This commit adds authorization checks using tenant capabilities for the node
metadata query RPC and the TSDB query RPC.

The connection between the specific capability in the RPC it enables, is
encoded in the auth_tenant.go file within the `tenantAuthorizer`. The
capability `Authorizer` type simply provides per-capability check utility
methods.

The `NodesUI` endpoint contains an additional SQL permission gate, which is
honored by checking at the tenant-level, and then delegating (via capability
gate) to a system tenant level `NodesTenant` endpoint that does no additional
SQL gating. Delegating to a system tenant `NodesUI` implementation would fail
since the tenant does not have system-level SQL permissions. The liveness and
TSDB endpoints do no additional checking at time of writing, hence no changes
are made there.

This commit additionally modifies the format of the datadriven tests in the
`tenantcapabilitiesauthorizer` and `tenantcapabilitieswatcher` packages to
conform to the standard datadriven command style instead of implementing custom
parsers.

Resolves cockroachdb#96975

Epic: CRDB-12100

Release note: None
@dhartunian dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from e360cdd to a8068bb Compare February 16, 2023 15:41
@dhartunian
Copy link
Collaborator Author

TFTRs. Last nits done.

bors r=arulajmani,knz,abarganier

@craig
Copy link
Contributor

craig bot commented Feb 16, 2023

Build succeeded:

@craig craig bot merged commit 09843a7 into cockroachdb:master Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server: add tenant capability gating to Node and TSDB queries for DB Console
5 participants