Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvtenant: implement SystemConfigProvider, remove Gossip from SQL-only server #52034

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #49445.

This commit introduces a new SystemConfigProvider abstraction, which is capable of providing the SystemConfig, as well as notifying clients of updates to the SystemConfig. Gossip already implements this interface. The commit then updates SQL to use this new dependency in place of Gossip whenever it needs to access the SystemConfig.

After making this change, it then updates the kvtenant.Proxy to implement the new SystemConfigProvider interface. This is powered by the GossipSubscription RPC that was added in #50520. The commit updates the subscription to also match on the "system-db" gossip key, and just like that, it can provide the SystemConfig to SQL [*].

Finally, with the kvtenant.Proxy serving the role of a SystemConfigProvider to SQL when applicable, we're able to remove gossip entirely from StartTenant. SQL-only servers will no longer join the gossip network, which is a nice milestone for all of this work.

[*] there are a few remaining questions about how exactly we want to enforce an access control policy on the system config gossip pattern. See the updated comment in Node.GossipSubscription. For now, we're just returning the entire SystemConfig to the subscription.

@nvanbenschoten nvanbenschoten requested a review from tbg July 28, 2020 20:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantSysCfgGossip branch from bc879ac to 34993f8 Compare July 28, 2020 22:32
@nvanbenschoten nvanbenschoten requested a review from a team July 28, 2020 22:32
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner July 28, 2020 22:32
@nvanbenschoten nvanbenschoten requested review from dt and removed request for a team and dt July 28, 2020 22:32
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantSysCfgGossip branch from 34993f8 to 0806f49 Compare July 29, 2020 11:51
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 29, 2020
Fixes cockroachdb#47898.

Rebased on cockroachdb#51503 and cockroachdb#52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly
exposed tenant RPC server. These authorization policies ensure that an
authenticated tenant is only able to access keys within its keyspace and
that no tenant is able to access data from another tenant's keyspace
through the tenant RPC server. This is a major step in providing
crypto-backed logical isolation between tenants in a multi-tenant
cluster.

The existing auth mechanism is retained on the standard RPC server,
which means that the system tenant is still able to access any key in
the system.
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. As far as I'm concerned, this is ready to go when the fixups I will be pushing shortly are squashed.

Reviewed 1 of 1 files at r1, 15 of 15 files at r2, 21 of 21 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/ccl/kvccl/kvtenantccl/proxy.go, line 242 at r2 (raw file):

// GetSystemConfig implements the config.SystemConfigProvider interface.
func (p *Proxy) GetSystemConfig() *config.SystemConfig {
	// TODO DURING REVIEW: how does this work with Gossip when the SystemConfig

Yes, I think the current code plays fast and loose here. We typically have Gossip connected by the time SQL needs a system config and when gossip connects, I suppose the system config is among the first things that are immediately received.

In the Connector (i.e. tenant.Proxy), I think we want to wait here:

until we've populated the system config. Since we call .Start() first thing when starting the SQL server, that effectively means that the system config is always available. Even in this code now, though, we're roughly in the same place as before: we're not guaranteed to have the zone config, but since we can't reach this code before a subscription has received data, it's quite likely that we do have it already.

I tried something new - I made a fixup commit for you and am pushing it to your branch past the review. Let me know if that's something we should do more often.

PS: gh makes this kind of thing quite easy.


pkg/ccl/kvccl/kvtenantccl/proxy.go, line 255 at r2 (raw file):

func (p *Proxy) RegisterSystemConfigChannel() <-chan struct{} {
	// Create channel that receives new system config notifications.
	// The channel has a size of 1 to prevent proxy from blocking on it.

nit: "having to block on it" is more precise, right? We never blocking-send into this thing.

Added fixup commit as well.

@tbg tbg force-pushed the nvanbenschoten/tenantSysCfgGossip branch 2 times, most recently from 4159d2c to 93b694f Compare July 31, 2020 11:56
@tbg
Copy link
Member

tbg commented Jul 31, 2020

I actually went back and squashed the fixups myself. That way you can get a green CI when you open this hopefully. Again, let me know if you think this is a good way to go about it, I have a feeling it can avoid time typing things into review and copying ideas from reviewable over to code.

Only the entries are used, so it's confusing to be manipulating a full
*SystemConfig.

This also saves an allocation per update, though that's hardly
important.
… server

Fixes cockroachdb#49445.

This commit introduces a new SystemConfigProvider abstraction, which is
capable of providing the SystemConfig, as well as notifying clients of
updates to the SystemConfig. Gossip already implements this interface.
The commit then updates SQL to use this new dependency in place of
Gossip whenever it needs to access the SystemConfig.

After making this change, it then updates the kvtenant.Proxy to
implement the new SystemConfigProvider interface. This is powered by the
GossipSubscription RPC that was added in cockroachdb#50520. The commit updates the
subscription to also match on the "system-db" gossip key, and just like
that, it can provide the SystemConfig to SQL [*].

Finally, with the kvtenant.Proxy serving the role of a
SystemConfigProvider to SQL when applicable, we're able to remove gossip
entirely from StartTenant. SQL-only servers will no longer join the
gossip network, which is a nice milestone for all of this work.

[*] there are a few remaining questions about how exactly we want to
enforce an access control policy on the system config gossip pattern.
See the updated comment in `Node.GossipSubscription`. For now, we're
just returning the entire SystemConfig to the subscription.
Addresses a TODO and simplifies code.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantSysCfgGossip branch from 93b694f to 6c1a90b Compare August 3, 2020 04:45
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/ccl/kvccl/kvtenantccl/proxy.go, line 242 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Yes, I think the current code plays fast and loose here. We typically have Gossip connected by the time SQL needs a system config and when gossip connects, I suppose the system config is among the first things that are immediately received.

In the Connector (i.e. tenant.Proxy), I think we want to wait here:

until we've populated the system config. Since we call .Start() first thing when starting the SQL server, that effectively means that the system config is always available. Even in this code now, though, we're roughly in the same place as before: we're not guaranteed to have the zone config, but since we can't reach this code before a subscription has received data, it's quite likely that we do have it already.

I tried something new - I made a fixup commit for you and am pushing it to your branch past the review. Let me know if that's something we should do more often.

PS: gh makes this kind of thing quite easy.

I'm mostly fine with this approach, though it does make things confusing on the PR author's side because now my remote has a newer revision than my local branch and I need to force pull it down. In this case, that's fine, but I often have PRs where I allow my remote branch to lag behind my local branch (mainly due to origin/master rebases), so this kind of thing would cause the two to completely diverge, which would end up creating more work than the alternative. Maybe gh makes this all more seamless, but I don't know if it's really worth the hassle.

It also makes it a bit awkward to determine what actually changed. I'd say if we're going to do this, we should keep the fixups in a separate commit and let the PR author squash them in when they want to. I don't mind the extra CI round on large PRs like this.


pkg/ccl/kvccl/kvtenantccl/proxy.go, line 255 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: "having to block on it" is more precise, right? We never blocking-send into this thing.

Added fixup commit as well.

👍 Made the same change in Gossip.RegisterSystemConfigChannel.

@craig
Copy link
Contributor

craig bot commented Aug 3, 2020

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 3, 2020

Build failed:

@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 3, 2020

Build succeeded:

@craig craig bot merged commit 470510e into cockroachdb:master Aug 3, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/tenantSysCfgGossip branch August 3, 2020 15:13
@lunevalex lunevalex added the A-multitenancy Related to multi-tenancy label Aug 4, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 4, 2020
Fixes cockroachdb#47898.

Rebased on cockroachdb#51503 and cockroachdb#52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly
exposed tenant RPC server. These authorization policies ensure that an
authenticated tenant is only able to access keys within its keyspace and
that no tenant is able to access data from another tenant's keyspace
through the tenant RPC server. This is a major step in providing
crypto-backed logical isolation between tenants in a multi-tenant
cluster.

The existing auth mechanism is retained on the standard RPC server,
which means that the system tenant is still able to access any key in
the system.
craig bot pushed a commit that referenced this pull request Aug 5, 2020
51803: cmd/docgen: add HTTP extractor r=mjibson a=mjibson

Add a way to extract docs from the status.proto HTTP endpoint. These
can be imported into the docs project as needed.

Release note: None

52083: roachtest: small misc r=andreimatei a=andreimatei

See individual commits.

52094: rpc: implement tenant access control policies at KV RPC boundary r=nvanbenschoten a=nvanbenschoten

Fixes #47898.

Rebased on #51503 and #52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly exposed tenant RPC server. These authorization policies ensure that an authenticated tenant is only able to access keys within its keyspace and that no tenant is able to access data from another tenant's keyspace through the tenant RPC server. This is a major step in providing crypto-backed logical isolation between tenants in a multi-tenant cluster.

The existing auth mechanism is retained on the standard RPC server, which means that the system tenant is still able to access any key in the system.

52352: sql/pgwire: add regression test for varchar OIDs in RowDescription r=jordanlewis a=rafiss

See issue #51360. The bug described in it was fixed somewhat
accidentally, so this test will verify that we don't regress again.

Release note: None

52386: opt: add SerializingProject exec primitive r=RaduBerinde a=RaduBerinde

The top-level projection of a query has a special property - it can project away
columns that we want an ordering on (e.g. `SELECT a FROM t ORDER BY b`).

The distsql physical planner was designed to tolerate such cases, as they were
much more common with the heuristic planner. But the new distsql exec factory
does not; it currently relies on a hack: it detects this case by checking if the
required output ordering is `nil`. This is fragile and doesn't work in all
cases.

This change adds a `SerializingProject` primitive which is like a SimpleProject
but it forces serialization of all parallel streams into one. The new primitive
is used to enforce the final query presentation. We only need to pass column
names for the presentation, so we remove `RenameColumns` and remove the column
names argument from `SimpleProject` (simplifying some execbuilder code).

We also fix a bug in `ConstructSimpleProject` where we weren't taking the
`PlanToStreamColMap` into account when building the projection.

Release note: None

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: gcjob and opt need ZoneConfig which is not available on tenants
4 participants