tenantcapabilities: gate tenant access to node metadata and tsdb #96319

dhartunian · 2023-01-31T23:35:51Z

Previously, tenants were given access, via the kv connector, to node-level
metrics and metadata. This ability should be gated behind a capability in order
to give operators control over what cluster-level information their application
tenants have access to.

This commit adds authorization checks using tenant capabilities for the node
metadata query RPC and the TSDB query RPC.

The connection between a specific capability and the RPC it enables is
encoded in the auth_tenant.go file within the tenantAuthorizer. The
capability Authorizer type simply provides per-capability check utility
methods.
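
As a rough illustration of that split, here is a minimal, self-contained sketch. It is not the code from this PR: the `TenantCapabilities` struct, the `uint64` tenant IDs, the `authorize` helper, the method names "Nodes" and "Query", and the node-metadata error text are all assumptions made for the example. Only the idea (per-capability check methods on one type, RPC-to-capability mapping in another place) comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// TenantCapabilities holds the two capabilities discussed in this PR.
// The Go field names are assumed for illustration.
type TenantCapabilities struct {
	CanViewNodeInfo    bool
	CanViewTSDBMetrics bool
}

// Authorizer is a hypothetical stand-in for the capability Authorizer: it only
// answers per-capability questions and knows nothing about RPCs.
type Authorizer struct {
	caps map[uint64]TenantCapabilities // keyed by tenant ID
}

// HasNodeStatusCapability reports whether the tenant may query node metadata.
func (a *Authorizer) HasNodeStatusCapability(tenID uint64) error {
	if !a.caps[tenID].CanViewNodeInfo {
		// Illustrative message, not taken from the PR.
		return fmt.Errorf("tenant %d does not have capability to query node metadata", tenID)
	}
	return nil
}

// HasTSDBQueryCapability reports whether the tenant may query timeseries data.
func (a *Authorizer) HasTSDBQueryCapability(tenID uint64) error {
	if !a.caps[tenID].CanViewTSDBMetrics {
		return fmt.Errorf("tenant %d does not have capability to query timeseries data", tenID)
	}
	return nil
}

// authorize plays the role of the tenantAuthorizer: it is the single place
// that maps an RPC (hypothetical method names here) to the capability check
// that gates it.
func authorize(a *Authorizer, tenID uint64, method string) error {
	switch method {
	case "Nodes":
		return a.HasNodeStatusCapability(tenID)
	case "Query":
		return a.HasTSDBQueryCapability(tenID)
	default:
		return errors.New("unknown method " + method)
	}
}

func main() {
	a := &Authorizer{caps: map[uint64]TenantCapabilities{
		10: {CanViewNodeInfo: true},
	}}
	fmt.Println(authorize(a, 10, "Nodes")) // allowed, prints <nil>
	fmt.Println(authorize(a, 10, "Query")) // denied, prints the timeseries error
}
```

Keeping the mapping in one switch means a new gated RPC only needs a new case plus a new check method, which is roughly the shape the description implies.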

The NodesUI endpoint contains an additional SQL permission gate, which is
honored by checking at the tenant-level, and then delegating (via capability
gate) to a system tenant level NodesTenant endpoint that does no additional
SQL gating. Delegating to a system tenant NodesUI implementation would fail
since the tenant does not have system-level SQL permissions. The liveness and
TSDB endpoints do no additional checking at time of writing, hence no changes
are made there.

Resolves #96975

Epic: CRDB-12100

Release note: None

dhartunian · 2023-02-06T22:20:11Z

First commit is by @arulajmani and being reviewed in #96390.
Please only review commit #2.

abarganier

Nice work! It's great to see this coming together.

Only a small nit and clarifying question, but otherwise

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian)

pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 126 at r2 (raw file):

func parseTenantCapability(t *testing.T, input string) tenantcapabilitiespb.TenantCapabilities {
	var cap = tenantcapabilitiespb.TenantCapabilities{}

nit: do we need the var x = ... syntax here?

Code quote:

var cap =

pkg/server/status.go line 1581 at r2 (raw file):

	if err != nil {
		if !grpcutil.IsAuthError(err) {
			return nil, err

I think I understand this piece, but just to check my own understanding...

I know the tenant connector validates the tenant capability in its own authorization step. Is that tenant capability a subset of the ViewClusterMetadataPermission we're checking for here, meaning that an auth error of the broader permission doesn't necessarily mean we don't have the specific tenant capability?

Code quote:

		if !grpcutil.IsAuthError(err) {
			return nil, err
		}

knz

Reviewed 24 of 24 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 109 at r2 (raw file):

		)
	}
	if !cp.CanViewTsdbMetrics {

Q: is there a way to filter timeseries queries so that they only allow access to timeseries from the storage layer, but forbid access to timeseries for other tenants?

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 110 at r2 (raw file):

	}
	if !cp.CanViewTsdbMetrics {
		return errors.Newf("tenant %s does not have capability to query timseries data", tenID)

nit: timseries -> timeseries

pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.proto line 37 at r2 (raw file):

  bool can_view_node_info = 2;

  // CanViewTSDBMetrics, if set to true,

can you add more words? And mention something like "does not give access to timeseries from other tenants" in there.

pkg/rpc/context.go line 427 at r2 (raw file):

	loopbackDialFn func(context.Context) (net.Conn, error)

	TenantCapabilitiesAuthorizer tenantcapabilities.Authorizer

Please add an explanatory comment.

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 74 at r2 (raw file):

				}
				return err.Error()
			case "has-capability-for-tsdb-metrics":

nit: add an empty line above

dhartunian

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @knz)

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 109 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Q: is there a way to filter timeseries queries so that they only allow access to timeseries from the storage layer, but forbid access to timeseries for other tenants?

Currently no, but the storage of per-tenant metrics doesn't exist yet either. @aadityasondhi is working on the code to tag metrics per-tenant that will allow us to serve tenant-scoped metrics.

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 110 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: timseries -> timeseries

Done.

pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.proto line 37 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

can you add more words? And mention something like "does not give access to timeseries from other tenants" in there.

Adding message, but it will say the opposite of what you're requesting for now, as discussed in the earlier thread.

pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 126 at r2 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: do we need the var x = ... syntax here?

Done.

pkg/rpc/context.go line 427 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Please add an explanatory comment.

Thx for flagging. Removed. Unnecessary field given the work that Arul has done in prior commit.

pkg/server/status.go line 1581 at r2 (raw file):

Previously, abarganier (Alex Barganier) wrote…

I think I understand this piece, but just to check my own understanding...

I know the tenant connector validates the tenant capability in its own authorization step. Is that tenant capability a subset of the ViewClusterMetadataPermission we're checking for here, meaning that an auth error of the broader permission doesn't necessarily mean we don't have the specific tenant capability?

Short answer: SQL Perms and Tenant capabilities are completely disjoint and unrelated and must be checked independently.

The request has to pass through the tenant connector for app tenants, but does not for system tenants. In both cases we need to do a SQL permission check given the request gRPC context that comes from the HTTP cookie. This validates that the SQL user that's currently logged in has the permission to view Node data.

Once that's done on the system tenant (see implementation of (s *systemStatusServer) NodesUI(... below right under NodesTenant impl.) we just serve the data and we're done.

On the app tenant, it's a bit different: we have to go through the connector to serve the data from the underlying kv node (aka system tenant), so the connector checks the tenant capability before retrieving the data directly from KV (without an additional SQL permission check!) via NodesTenant.
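
To make the two paths concrete, here is a small sketch of the flow described above. It is a simplification under stated assumptions, not the real status server: the `request` struct, the function names `systemNodesUI`, `appTenantNodesUI`, `connectorNodes`, and `nodesTenant`, the error values, and the node names are all hypothetical. The only point it tries to capture is that the SQL permission check and the tenant capability check are independent gates.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNoSQLPermission = errors.New("SQL user lacks permission to view node data")
	errNoCapability    = errors.New("tenant lacks the node info capability")
)

// request carries the two independent facts the checks look at.
// Both fields are hypothetical simplifications.
type request struct {
	sqlUserCanViewNodes   bool // SQL permission of the logged-in user
	tenantCanViewNodeInfo bool // tenant capability, checked by the connector
}

// nodesTenant stands in for the system-tenant-level endpoint that serves the
// data with no additional SQL gating.
func nodesTenant() []string { return []string{"n1", "n2"} }

// connectorNodes stands in for the tenant connector: capability check first,
// then data straight from KV.
func connectorNodes(r request) ([]string, error) {
	if !r.tenantCanViewNodeInfo {
		return nil, errNoCapability
	}
	return nodesTenant(), nil
}

// systemNodesUI: SQL permission check, then serve. No connector involved.
func systemNodesUI(r request) ([]string, error) {
	if !r.sqlUserCanViewNodes {
		return nil, errNoSQLPermission
	}
	return nodesTenant(), nil
}

// appTenantNodesUI: SQL permission check at the tenant level, then delegate
// through the connector, which applies the capability gate.
func appTenantNodesUI(r request) ([]string, error) {
	if !r.sqlUserCanViewNodes {
		return nil, errNoSQLPermission
	}
	return connectorNodes(r)
}

func main() {
	// The SQL user is allowed, but the tenant has no capability.
	_, err := appTenantNodesUI(request{sqlUserCanViewNodes: true})
	fmt.Println(err)

	// Both gates pass.
	nodes, err := appTenantNodesUI(request{sqlUserCanViewNodes: true, tenantCanViewNodeInfo: true})
	fmt.Println(nodes, err)
}
```

In this sketch, passing one gate says nothing about the other, which matches the "completely disjoint" short answer.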

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 74 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: add an empty line above

Done.

dhartunian · 2023-02-15T15:48:38Z

@arulajmani can you take a quick look to see if I screwed up the intent of your tests?

arulajmani

Looks good overall, my comments are pretty minor.

Reviewed 11 of 57 files at r3, 1 of 14 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, @dt, and @knz)

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go line 41 at r4 (raw file):

}

func (n *NoopAuthorizer) HasNodeStatusCapability(

Here, and below, missing comment.

pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 97 at r4 (raw file):

// tenantcapabilities.Update, allowing data-driven tests to assert on the
// output.
func PrintTenantCapabilityUpdate(update tenantcapabilities.Update) string {

How do you feel about keeping some form of this function around? Disambiguating between an update and a delete is nice when reading the test files.

If this printing feels extra and cumbersome to work with as new capabilities are added, maybe we could get rid of just PrintTenantCapability instead?

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 34 at r4 (raw file):

// Example:
//
//	update-state

nit: should we update this as well with the changes we're making?

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 65 at r4 (raw file):

				update := tenantcapabilitiestestutils.ParseTenantCapabilityDelete(t, d)
				mockReader.updateState([]*tenantcapabilities.Update{update})
			case "has-capability":

nit: instead of this "cap" argument, could we instead add has-capability-for-batch, has-capability-for-node-status... etc.? That way, the test file maps nicely to the interface.

Separately, if we do this, we might not need the cmds argument?

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 70 at r4 (raw file):

				d.ScanArgs(t, "cap", &cap)
				switch cap {
				case "can_admin_split":

(Superseded by the comment above), should this be "for_batch" instead? I don't think this matters now, but in the future we're going to introduce capabilities for other batch requests (eg. AdminRelocateRange etc.).

dhartunian

@arulajmani Thx for the helpful comments. Tests are updated.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @arulajmani, @dt, and @knz)

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go line 41 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Here, and below, missing comment.

Done.

pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 97 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

How do you feel about keeping some form of this function around? Disambiguating between an update and a delete is nice when reading the test files.

If this printing feels extra and cumbersome to work with as new capabilities are added, maybe we could get rid of just PrintTenantCapability instead?

Done. Implemented Stringer on Update and Entry structs that does the prefix thing.

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 34 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: should we update this as well with the changes we're making?

Done.

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 65 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: instead of this "cap" argument, could we instead add has-capability-for-batch, has-capability-for-node-status... etc.? That way, the test file maps nicely to the interface.

Separately, if we do this, we might not need the cmds argument?

agreed. done.

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 70 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

(Superseded by the comment above), should this be "for_batch" instead? I don't think this matters now, but in the future we're going to introduce capabilities for other batch requests (eg. AdminRelocateRange etc.).

Ah, I see: the authorizer doesn't expose the granular capability, just an interface that operates on the Batch as a whole. Renamed the cmd.
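
For context, a batch-level check of that kind could look roughly like the sketch below. This is an assumption-laden illustration, not the authorizer from the PR: the `Authorizer` and `TenantCapabilities` types, the string command names, and the error text are hypothetical. It only shows a single entry point that inspects every request in the batch, so future capabilities (e.g. for AdminRelocateRange) would slot in as extra cases without changing the interface.

```go
package main

import "fmt"

// TenantCapabilities is a hypothetical, reduced capability set.
type TenantCapabilities struct {
	CanAdminSplit bool
}

// Authorizer is a hypothetical stand-in keyed by tenant ID.
type Authorizer struct {
	caps map[uint64]TenantCapabilities
}

// HasCapabilityForBatch checks the batch as a whole: it walks every command
// and fails on the first one the tenant is not allowed to issue. Commands
// with no case in the switch need no capability in this sketch.
func (a *Authorizer) HasCapabilityForBatch(tenID uint64, cmds []string) error {
	caps := a.caps[tenID]
	for _, cmd := range cmds {
		switch cmd {
		case "split":
			if !caps.CanAdminSplit {
				return fmt.Errorf("tenant %d does not have capability for %q", tenID, cmd)
			}
		}
	}
	return nil
}

func main() {
	a := &Authorizer{caps: map[uint64]TenantCapabilities{10: {CanAdminSplit: true}}}
	fmt.Println(a.HasCapabilityForBatch(10, []string{"split"}))
	fmt.Println(a.HasCapabilityForBatch(11, []string{"scan", "split"}))
}
```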

arulajmani

Everything around the capabilities package and tests

Reviewed 1 of 24 files at r2, 1 of 57 files at r3, 1 of 14 files at r4, 7 of 10 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, @dt, and @knz)

pkg/multitenant/tenantcapabilities/capabilities.go line 87 at r5 (raw file):

func (u Update) String() string {
	if u.Deleted {
		return fmt.Sprintf("delete: %+v", u.Entry)

nit: for deleted updates, can we just print the tenant ID instead of the entire entry? I should've written a comment about this, but the capability isn't meaningful when an update corresponds to a delete.
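
One way this suggestion could be shaped is sketched below. The `Entry` and `Update` definitions are simplified, hypothetical stand-ins (the real types live in the tenantcapabilities package and carry more fields), and the exact output format is an assumption. The only behavior taken from the comment is that a delete prints just the tenant ID.

```go
package main

import "fmt"

// Entry is a reduced, hypothetical version of a tenant capability entry.
type Entry struct {
	TenantID      uint64
	CanAdminSplit bool
}

// String renders the tenant ID together with its capabilities.
func (e Entry) String() string {
	return fmt.Sprintf("ten=%d can_admin_split=%t", e.TenantID, e.CanAdminSplit)
}

// Update describes either an upsert of an Entry or a delete of a tenant.
type Update struct {
	Entry   Entry
	Deleted bool
}

// String prefixes the output with the kind of update. For deletes the
// capabilities are not meaningful, so only the tenant ID is printed.
func (u Update) String() string {
	if u.Deleted {
		return fmt.Sprintf("delete: ten=%d", u.Entry.TenantID)
	}
	return "update: " + u.Entry.String()
}

func main() {
	e := Entry{TenantID: 10, CanAdminSplit: true}
	fmt.Println(Update{Entry: e})                // update: ten=10 can_admin_split=true
	fmt.Println(Update{Entry: e, Deleted: true}) // delete: ten=10
}
```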

pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go line 97 at r4 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Done. Implemented Stringer on Update and Entry structs that does the prefix thing.

I like it! 💯

pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go line 46 at r5 (raw file):

//
//	has-capability-for-batch ten=10 cmds=(split)
//	----

nit: should we add the has-tsdb-query-capability and has-node-status-capability here as well?

Previously, tenants were given access, via the kv connector, to node-level metrics and metadata. This ability should be gated behind a capability in order to give operators control over what cluster-level information their application tenants have access to.

This commit adds authorization checks using tenant capabilities for the node metadata query RPC and the TSDB query RPC.

The connection between a specific capability and the RPC it enables is encoded in the auth_tenant.go file within the `tenantAuthorizer`. The capability `Authorizer` type simply provides per-capability check utility methods.

The `NodesUI` endpoint contains an additional SQL permission gate, which is honored by checking at the tenant-level, and then delegating (via capability gate) to a system tenant level `NodesTenant` endpoint that does no additional SQL gating. Delegating to a system tenant `NodesUI` implementation would fail since the tenant does not have system-level SQL permissions. The liveness and TSDB endpoints do no additional checking at time of writing, hence no changes are made there.

This commit additionally modifies the format of the datadriven tests in the `tenantcapabilitiesauthorizer` and `tenantcapabilitieswatcher` packages to conform to the standard datadriven command style instead of implementing custom parsers.

Resolves cockroachdb#96975

Epic: CRDB-12100

Release note: None

dhartunian · 2023-02-16T19:35:14Z

TFTRs. Last nits done.

bors r=arulajmani,knz,abarganier

craig · 2023-02-16T21:31:57Z

Build succeeded:

Bazel Essential CI (Cockroach)

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from becacfd to 18d10f6 Compare February 6, 2023 22:19

dhartunian changed the title ~~[WIP] tenantcapabilities: gate tenant access to node metadata~~ tenantcapabilities: gate tenant access to node metadata and tsdb Feb 6, 2023

dhartunian marked this pull request as ready for review February 6, 2023 22:19

dhartunian requested review from a team as code owners February 6, 2023 22:19

dhartunian requested a review from a team February 6, 2023 22:19

dhartunian requested a review from a team as a code owner February 6, 2023 22:19

dhartunian requested a review from a team February 6, 2023 22:19

dhartunian requested a review from a team as a code owner February 6, 2023 22:19

abarganier approved these changes Feb 7, 2023

knz reviewed Feb 7, 2023

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from 18d10f6 to 64bea8c Compare February 7, 2023 22:44

dhartunian commented Feb 7, 2023

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch 3 times, most recently from 81d522d to d2e85c6 Compare February 13, 2023 15:50

dhartunian requested a review from a team as a code owner February 13, 2023 15:50

dhartunian requested review from dt and removed request for a team February 13, 2023 15:50

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from d2e85c6 to 5fb725c Compare February 14, 2023 23:32

dhartunian requested a review from arulajmani February 15, 2023 15:48

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from 5fb725c to 51fc43b Compare February 15, 2023 16:22

dhartunian requested a review from a team February 15, 2023 16:22

arulajmani reviewed Feb 15, 2023

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from 51fc43b to e360cdd Compare February 15, 2023 22:06

dhartunian commented Feb 15, 2023

arulajmani approved these changes Feb 16, 2023

dhartunian force-pushed the use-tenant-capabilities-for-tenant-db-console branch from e360cdd to a8068bb Compare February 16, 2023 15:41

craig bot merged commit 09843a7 into cockroachdb:master Feb 16, 2023
