-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tenantcapabilities: gate tenant access to node metadata and tsdb #96319
tenantcapabilities: gate tenant access to node metadata and tsdb #96319
Conversation
becacfd
to
18d10f6
Compare
First commit is by @arulajmani and being reviewed in #96390. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! It's great to see this coming together.
Only a small nit and clarifying question, but otherwise
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian)
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 126 at r2 (raw file):
func parseTenantCapability(t *testing.T, input string) tenantcapabilitiespb.TenantCapabilities { var cap = tenantcapabilitiespb.TenantCapabilities{}
nit: do we need the var x = ...
syntax here?
Code quote:
var cap =
pkg/server/status.go
line 1581 at r2 (raw file):
if err != nil { if !grpcutil.IsAuthError(err) { return nil, err
I think I understand this piece, but just to check my own understanding...
I know the tenant connector validates the tenant capability in its own authorization step. Is that tenant capability a subset of the ViewClusterMetadataPermission
we're checking for here, meaning that an auth error of the broader permission doesn't necessarily mean we don't have the specific tenant capability?
Code quote:
if !grpcutil.IsAuthError(err) {
return nil, err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 24 of 24 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 109 at r2 (raw file):
) } if !cp.CanViewTsdbMetrics {
Q: is there a way to filter out timeseries queries to only allow to timeseries from the storage layer, but forbid access to timeseries for other tenants.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 110 at r2 (raw file):
} if !cp.CanViewTsdbMetrics { return errors.Newf("tenant %s does not have capability to query timseries data", tenID)
nit: timseries -> timeseries
pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.proto
line 37 at r2 (raw file):
bool can_view_node_info = 2; // CanViewTSDBMetrics, if set to true,
can you add more words? And mention something like "does not give access to timeseries from other tenants" in there.
pkg/rpc/context.go
line 427 at r2 (raw file):
loopbackDialFn func(context.Context) (net.Conn, error) TenantCapabilitiesAuthorizer tenantcapabilities.Authorizer
Please add an explanatory comment.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 74 at r2 (raw file):
} return err.Error() case "has-capability-for-tsdb-metrics":
nit: add an empty line above
18d10f6
to
64bea8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, and @knz)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 109 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Q: is there a way to filter out timeseries queries to only allow to timeseries from the storage layer, but forbid access to timeseries for other tenants.
Currently no, but the storage of per-tenant metrics doesn't exist yet either. @aadityasondhi is working on the code to tag metrics per-tenant that will allow us to serve tenant-scoped metrics.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 110 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: timseries -> timeseries
Done.
pkg/multitenant/tenantcapabilities/tenantcapabilitiespb/capabilities.proto
line 37 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
can you add more words? And mention something like "does not give access to timeseries from other tenants" in there.
Adding message, but it will say the opposite if what you're requesting for now as discussed in earlier thread.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 126 at r2 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: do we need the
var x = ...
syntax here?
Done.
pkg/rpc/context.go
line 427 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Please add an explanatory comment.
Thx for flagging. Removed. Unnecessary field given the work that Arul has done in prior commit.
pkg/server/status.go
line 1581 at r2 (raw file):
Previously, abarganier (Alex Barganier) wrote…
I think I understand this piece, but just to check my own understanding...
I know the tenant connector validates the tenant capability in its own authorization step. Is that tenant capability a subset of the
ViewClusterMetadataPermission
we're checking for here, meaning that an auth error of the broader permission doesn't necessarily mean we don't have the specific tenant capability?
Short answer: SQL Perms and Tenant capabilities are completely disjoint and unrelated and must be checked independently.
The request has to pass through the tenant connector for app tenants, but does not for system tenants. In both cases we need to do a SQL permission check given the request gRPC context that comes from the HTTP cookie. This validates that the SQL user that's currently logged in has the permission to view Node data.
Once that's done on the system tenant (see implementation of (s *systemStatusServer) NodesUI(...
below right under NodesTenant
impl.) we just serve the data and we're done.
On the app tenant, it's a bit different since we have to go through the connector to serve the data from the underlying kv node (aka system tenant) so the connector checks the tenant capability before retrieving the data directly from KV (without an additional SQL permission check!) via NodesTenant
.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 74 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: add an empty line above
Done.
81d522d
to
d2e85c6
Compare
d2e85c6
to
5fb725c
Compare
@arulajmani can you take a quick look to see if I screwed up the intent of your tests? |
5fb725c
to
51fc43b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, my comments are pretty minor.
Reviewed 11 of 57 files at r3, 1 of 14 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, @dt, and @knz)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go
line 41 at r4 (raw file):
} func (n *NoopAuthorizer) HasNodeStatusCapability(
Here, and below, missing comment.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 97 at r4 (raw file):
// tenantcapabilities.Update, allowing data-driven tests to assert on the // output. func PrintTenantCapabilityUpdate(update tenantcapabilities.Update) string {
How do you feel about keeping some form of this function around? Disambiguating between an update and a delete is nice when reading the test files.
If this printing feels extra and cumbersome to work with as new capabilities are added, maybe we could get rid of just PrintTenantCapability
instead?
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 34 at r4 (raw file):
// Example: // // update-state
nit: should we update this as well with the changes we're making?
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 65 at r4 (raw file):
update := tenantcapabilitiestestutils.ParseTenantCapabilityDelete(t, d) mockReader.updateState([]*tenantcapabilities.Update{update}) case "has-capability":
nit: instead of this "cap" argument, could we instead have add has-capability-for-batch
, has-capability-for-node-status
... etc.? That way, the test file maps nicely to the interface.
Separately, if we do this, we might not need the cmds
argument?
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 70 at r4 (raw file):
d.ScanArgs(t, "cap", &cap) switch cap { case "can_admin_split":
(Superseded by the comment above), should this be "for_batch" instead? I don't think this matters now, but in the future we're going to introduce capabilities for other batch requests (eg. AdminRelocateRange etc.).
51fc43b
to
e360cdd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arulajmani Thx for the helpful comments. Tests are updated.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @arulajmani, @dt, and @knz)
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/noop.go
line 41 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Here, and below, missing comment.
Done.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 97 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
How do you feel about keeping some form of this function around? Disambiguating between an update and a delete is nice when reading the test files.
If this printing feels extra and cumbersome to work with as new capabilities are added, maybe we could get rid of just
PrintTenantCapability
instead?
Done. Implemented Stringer on Update
and Entry
structs that does the prefix thing.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 34 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: should we update this as well with the changes we're making?
Done.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 65 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: instead of this "cap" argument, could we instead have add
has-capability-for-batch
,has-capability-for-node-status
... etc.? That way, the test file maps nicely to the interface.Separately, if we do this, we might not need the
cmds
argument?
agreed. done.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 70 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
(Superseded by the comment above), should this be "for_batch" instead? I don't think this matters now, but in the future we're going to introduce capabilities for other batch requests (eg. AdminRelocateRange etc.).
Ah I see the authorizer doesn't expose the granular capability just an interface that operates on the Batch as a whole. Renamed the cmd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything around the capabilities package and tests
Reviewed 1 of 24 files at r2, 1 of 57 files at r3, 1 of 14 files at r4, 7 of 10 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @abarganier, @dhartunian, @dt, and @knz)
pkg/multitenant/tenantcapabilities/capabilities.go
line 87 at r5 (raw file):
func (u Update) String() string { if u.Deleted { return fmt.Sprintf("delete: %+v", u.Entry)
nit: for deleted updates, can we just print the tenant ID instead of the entire entry? I should've written a comment about this, but the capability isn't meaningful when an update corresponds to a delete.
pkg/multitenant/tenantcapabilities/tenantcapabilitiestestutils/testutils.go
line 97 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Done. Implemented Stringer on
Update
andEntry
structs that does the prefix thing.
I like it! 💯
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer_test.go
line 46 at r5 (raw file):
// // has-capability-for-batch ten=10 cmds=(split) // ----
nit: should we add the has-tsdb-query-capability
and has-node-status-capability
here as well?
Previously, tenants were given access via the kv connector, to node level metrics and metadata. This ability should be gated behind a capability in order to give operators control over what cluster-level information their application tenants would have access to. This commit adds authorization checks using tenant capabilities for the node metadata query RPC and the TSDB query RPC. The connection between the specific capability in the RPC it enables, is encoded in the auth_tenant.go file within the `tenantAuthorizer`. The capability `Authorizer` type simply provides per-capability check utility methods. The `NodesUI` endpoint contains an additional SQL permission gate, which is honored by checking at the tenant-level, and then delegating (via capability gate) to a system tenant level `NodesTenant` endpoint that does no additional SQL gating. Delegating to a system tenant `NodesUI` implementation would fail since the tenant does not have system-level SQL permissions. The liveness and TSDB endpoints do no additional checking at time of writing, hence no changes are made there. This commit additionally modifies the format of the datadriven tests in the `tenantcapabilitiesauthorizer` and `tenantcapabilitieswatcher` packages to conform to the standard datadriven command style instead of implementing custom parsers. Resolves cockroachdb#96975 Epic: CRDB-12100 Release note: None
e360cdd
to
a8068bb
Compare
TFTRs. Last nits done. bors r=arulajmani,knz,abarganier |
Build succeeded: |
Previously, tenants were given access via the kv connector, to node level
metrics and metadata. This ability should be gated behind a capability in order
to give operators control over what cluster-level information their application
tenants would have access to.
This commit adds authorization checks using tenant capabilities for the node
metadata query RPC and the TSDB query RPC.
The connection between the specific capability in the RPC it enables, is
encoded in the auth_tenant.go file within the
tenantAuthorizer
. Thecapability
Authorizer
type simply provides per-capability check utilitymethods.
The
NodesUI
endpoint contains an additional SQL permission gate, which ishonored by checking at the tenant-level, and then delegating (via capability
gate) to a system tenant level
NodesTenant
endpoint that does no additionalSQL gating. Delegating to a system tenant
NodesUI
implementation would failsince the tenant does not have system-level SQL permissions. The liveness and
TSDB endpoints do no additional checking at time of writing, hence no changes
are made there.
Resolves #96975
Epic: CRDB-12100
Release note: None