
kv/kvclient: introduce new tenant Proxy #50520

Merged
merged 3 commits into cockroachdb:master from nvanbenschoten/tenantService on Jul 15, 2020

Conversation


@nvanbenschoten nvanbenschoten commented Jun 23, 2020

Closes #47909.

This commit starts by adding two RPCs to the Internal service:

service Internal {
    ...
    rpc RangeLookup        (RangeLookupRequest)        returns (RangeLookupResponse)            {}
    rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {}
}

// RangeLookupRequest is a request to proxy a RangeLookup through a Tenant
// service. Its fields correspond to a subset of the args of kv.RangeLookup.
message RangeLookupRequest {
    ...
}

// GossipSubscriptionRequest initiates a game of telephone. It establishes an
// indefinite stream that proxies gossip information overheard by the recipient
// node back to the caller. Gossip information is filtered down to just those
// identified by a key matching any of the specified patterns.
//
// Upon establishment of the stream, all existing information that matches one
// or more of the patterns is returned. After this point, only new information
// matching the patterns is returned.
message GossipSubscriptionRequest {
    ...
}

The commit then introduces a new kvtenant.Proxy object. Proxy mediates the communication of cluster-wide state to sandboxed SQL-only tenant processes through a restricted interface. A Proxy is seeded with a set of one or more network addresses that reference existing KV nodes in the cluster (or a load-balancer which fans out to some/all KV nodes). On startup, it establishes contact with one of these nodes to learn about the topology of the cluster and bootstrap the rest of SQL <-> KV network communication.

Proxy has two main roles:

First, Proxy is capable of providing information on each of the KV nodes in the cluster in the form of NodeDescriptors. This obviates the need for SQL-only tenant processes to join the cluster-wide gossip network. In doing so, it satisfies the NodeDescStore interface and can be used as an AddressResolver with a small adapter.
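To illustrate the shape of that adapter, here is a minimal sketch; the types below (NodeID, NodeDescriptor, NodeDescStore) are simplified stand-ins for the roachpb/kvclient definitions rather than the actual CockroachDB code:

```
// Minimal sketch with simplified stand-in types.
package kvtenantsketch

import (
	"fmt"
	"net"
)

type NodeID int32

type NodeDescriptor struct {
	NodeID  NodeID
	Address string // "host:port"
}

// NodeDescStore is the read-only view of node descriptors that the Proxy
// maintains from its gossip subscription.
type NodeDescStore interface {
	GetNodeDescriptor(id NodeID) (*NodeDescriptor, error)
}

// AddressResolver is the small adapter: it turns a NodeDescStore into the
// node-ID-to-address function shape that a node dialer expects.
func AddressResolver(s NodeDescStore) func(NodeID) (net.Addr, error) {
	return func(id NodeID) (net.Addr, error) {
		desc, err := s.GetNodeDescriptor(id)
		if err != nil {
			return nil, fmt.Errorf("resolving n%d: %w", id, err)
		}
		return net.ResolveTCPAddr("tcp", desc.Address)
	}
}
```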

Second, Proxy is capable of providing Range addressing information in the form of RangeDescriptors through delegated RangeLookup requests. This is necessary because SQL-only tenants are restricted from reading Range Metadata keys directly. Instead, the RangeLookup requests are proxied through existing KV nodes while being subject to additional validation (e.g. is the Range being requested owned by the requesting tenant?). In doing so, it satisfies the RangeDescriptorDB interface and can be used to delegate all DistSender/RangeCache descriptor lookups to KV nodes.
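Sketched below is roughly what the delegated lookup path looks like from the tenant's side. The types and the client interface are illustrative stand-ins, not the actual RPC stubs, but they show how RPC errors stay separate from logical errors (e.g. a tenant asking for a Range it doesn't own):

```
// Minimal sketch with simplified stand-in types; not the actual CockroachDB
// definitions.
package kvtenantsketch

import "context"

type RKey []byte

type RangeDescriptor struct {
	StartKey, EndKey RKey
}

type RangeLookupRequest struct {
	Key RKey
}

type RangeLookupResponse struct {
	Descriptors           []RangeDescriptor
	PrefetchedDescriptors []RangeDescriptor
	Error                 error // logical error, distinct from the RPC error
}

// rangeLookupClient abstracts the RPC stub the Proxy uses to reach a KV node.
type rangeLookupClient interface {
	RangeLookup(ctx context.Context, req *RangeLookupRequest) (*RangeLookupResponse, error)
}

// RangeLookup delegates a range-descriptor lookup to a KV node instead of
// scanning the Range Metadata keys directly, which tenants may not read.
// This is the shape the DistSender/RangeCache need from a RangeDescriptorDB.
func RangeLookup(
	ctx context.Context, c rangeLookupClient, key RKey,
) (descs, prefetched []RangeDescriptor, _ error) {
	resp, err := c.RangeLookup(ctx, &RangeLookupRequest{Key: key})
	if err != nil {
		// RPC error: we failed to reach the KV node.
		return nil, nil, err
	}
	if resp.Error != nil {
		// Logical error returned by the KV node, e.g. the requested Range
		// is not owned by the requesting tenant.
		return nil, nil, resp.Error
	}
	return resp.Descriptors, resp.PrefetchedDescriptors, nil
}
```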

With this commit, we can mostly run a SQL-only tenant process without joining the KV cluster's gossip network. This works if I comment out a few of the uses of gossip due to #49692 and #47150 in SQL. Notably, with the call to DeprecatedRegisterSystemConfigChannel in sql.Server.Start removed, I can remove Gossip from makeSQLServerArgs entirely and things "just work".

@nvanbenschoten nvanbenschoten requested a review from tbg June 23, 2020 04:46

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch from 81cbdc2 to 019b51d Compare June 23, 2020 05:51

@tbg tbg left a comment (Member)

Very nice. Some comments on how the proxy dials but overall this looks great.

Reviewed 2 of 2 files at r1, 16 of 16 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvclient/kvtenant/proxy.go, line 67 at r2 (raw file):

// NewProxy creates a new Proxy.
func NewProxy(

rpc.Context has a naked Stopper as of #50438, in case you're interested in using that instead.


pkg/kv/kvclient/kvtenant/proxy.go, line 102 at r2 (raw file):

		client, err := p.dial(ctx)
		if err != nil {
			log.Warningf(ctx, "error dialing tenant KV address: %v", err)

If you switched to the singleton-dialing approach I advocated for, you wouldn't have to log anything in this method. Instead, you would log in the centralized loop - would be clear that if you can't contact KV, not much else will work.


pkg/kv/kvclient/kvtenant/proxy.go, line 131 at r2 (raw file):

	nodeDescs := make(map[roachpb.NodeID]*roachpb.NodeDescriptor, len(resp.Descriptors))
	for _, desc := range resp.Descriptors {
		nodeDescs[desc.NodeID] = desc

Intuitively I would have expected the stream to contain updates for individual nodes (i.e. delta-based), but it always sends a whole view of the cluster nodes.
I think this is just because it's easier and that's fine, perhaps add a comment.


pkg/kv/kvclient/kvtenant/proxy.go, line 187 at r2 (raw file):

// FirstRange implements the kvcoord.RangeDescriptorDB interface.
func (p *Proxy) FirstRange() (*roachpb.RangeDescriptor, error) {
	return nil, errors.New("kvtenant.Proxy does not have access to FirstRange")

Any plans for getting rid of this wart? The only call is

desc, err := rdc.db.FirstRange()

Seems a little annoying to get rid of, just wanted to know what your plans were, if any.


pkg/kv/kvclient/kvtenant/proxy.go, line 190 at r2 (raw file):

}

func (p *Proxy) dial(ctx context.Context) (roachpb.InternalClient, error) {

nit: the methods in this file had lined up nicely for review, but feels like dial ought to be further up, below where it's first used. (with dialAddr following suit).


pkg/kv/kvclient/kvtenant/proxy.go, line 203 at r2 (raw file):

	// expect the proxy to be pointed at a load balancer which will perform
	// internal health checks and prioritize healthy destinations, so maybe
	// that's not worth it.

The one question I have for you is whether the conn should be sticky. Right now, it seems like we're picking a random node for each call to dial, I would've thought we hold a *rpc.Connection around and dial just calls conn.Connect(ctx) and on error, makes a new round-robin connection:

	const key = "dial"
	v, err, _ := p.singleFlight.Do(key, func() (interface{}, error) {
		// TODO put retry loop in here.
		conn, err := p.dialAddr(ctx, p.randomAddr())
		if err != nil {
			return nil, err
		}
		// TODO use new ctx with timeout.
		if _, err := conn.Connect(ctx); err != nil {
			return nil, err
		}
		return conn, nil
	})
	if err != nil {
		return nil, err
	}
	cc, err := v.(*rpc.Connection).Connect(ctx)
	if err != nil {
		if ctx.Err() == nil {
			// TODO: multiple callers may end up here so one of the Forget
			// calls may evict a newer, healthy, conn. Should make it such
			// that only the first caller to get an error gets to call Forget.
			// May not be worth using singleflight once we do that.
			p.singleFlight.Forget(key)
		}
		return nil, err
	}
	return cc, nil
}

As is, if we point tenants at a KV-wide LB, won't we end up with each tenant connecting to all of the KV nodes? That seems a little silly, should "just" pick one and stick with it until it goes away.

This also answers the breaker question - by limiting to a single dial at any given point in time as in my pseudo-Go above, we won't log spam. Also, due to the sticky connection, once we have a healthy node we won't try the unhealthy node any more, so it should self-stabilize quite well.


pkg/roachpb/api.proto, line 2101 at r2 (raw file):

  // SQL-only processes to have access to StoreDescriptors is going away
  // with #49997 and the tenant ZoneConfig updates might be better serviced
  // with a polling mechanism, so I specialized the request. I'm curious

Why do you consider polling a better option for zone config updates? Zones are and will be distributed via Gossip internally in KV, right? If it folds in seamlessly with this RPC, I think we might as well do it.


pkg/server/node.go, line 954 at r2 (raw file):

	args *roachpb.RangeFeedRequest, stream roachpb.Internal_RangeFeedServer,
) error {
	growstack.Grow()

drive-by?


pkg/server/server_sql.go, line 131 at r2 (raw file):

	externalStorageFromURI cloud.ExternalStorageFromURIFactory

	// TODO(nvanbenschoten): Should this move to a second "optional" args

I don't have a strong urge to change this (other than the cleanups you've identified above and getting rid of some dependencies we don't want to have), but am open to it if you feel that it would help keep this more understandable.


pkg/server/testserver.go, line 498 at r2 (raw file):

	nodeDialer := nodedialer.New(
		rpcContext,
		gossip.AddressResolver(g), // TODO(nvb): break gossip dep

🎆

@@ -0,0 +1,228 @@
// Copyright 2020 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License

@tbg tbg commented (Member)

So close to being able to make this CCL, but not worth it right now.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch 5 times, most recently from 2d954d6 to 1d3c66d Compare July 2, 2020 19:45

@nvanbenschoten nvanbenschoten left a comment (Member, Author)

TFTR! The big change since you last saw this is that I've replaced the NodeInfo RPC with a more general GossipSubscription stream. This should resolve the discussion we were having below. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvclient/kvtenant/proxy.go, line 3 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

So close to being able to make this CCL, but not worth it right now.

Could you remind me again why we can't do this now? Just because we'd need to pull this into a pkg/ccl/kvccl/kvtenant package?


pkg/kv/kvclient/kvtenant/proxy.go, line 67 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

rpc.Context has a naked Stopper as of #50438, in case you're interested in using that instead.

Done.


pkg/kv/kvclient/kvtenant/proxy.go, line 102 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

If you switched to the singleton-dialing approach I advocated for, you wouldn't have to log anything in this method. Instead, you would log in the centralized loop - would be clear that if you can't contact KV, not much else will work.

Done.


pkg/kv/kvclient/kvtenant/proxy.go, line 131 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Intuitively I would have expected the stream to contain updates for individual nodes (i.e. delta-based), but it always sends a whole view of the cluster nodes.
I think this is just because it's easier and that's fine, perhaps add a comment.

Yes, this was because it was easier to implement. A marshaled NodeDescriptor is on the order of 100 bytes (depending on the size of variable-length strings), so with a 512 node cluster, that's only about 50KB whenever a node joins or leaves the cluster.

But I've reworked this (see the discussion below), which naturally allowed this to become delta-based. So done.


pkg/kv/kvclient/kvtenant/proxy.go, line 187 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Any plans for getting rid of this wart? The only call is

desc, err := rdc.db.FirstRange()

Seems a little annoying to get rid of, just wanted to know what your plans were, if any.

No, I didn't have any plans to get rid of it. The RangeCache should never be requesting the FirstRange in a tenant process because it will only need to if it has already recursed up the meta range tree, which wouldn't be possible with this implementation of RangeDescriptorDB.

I guess that indicates that we should pull FirstRange out of the RangeDescriptorDB interface entirely and push the keys.RangeMetaKey(key).Equal(roachpb.RKeyMin) conditional logic into DistSender's implementation of RangeDescriptorDB. What do you think?


pkg/kv/kvclient/kvtenant/proxy.go, line 190 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: the methods in this file had lined up nicely for review, but feels like dial ought to be further up, below where it's first used. (with dialAddr following suit).

Hm, not totally sure I agree. I think it's nice to separate the business logic from the networking logic.


pkg/kv/kvclient/kvtenant/proxy.go, line 203 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

The one question I have for you is whether the conn should be sticky. Right now, it seems like we're picking a random node for each call to dial, I would've thought we hold a *rpc.Connection around and dial just calls conn.Connect(ctx) and on error, makes a new round-robin connection:

	const key = "dial"
	v, err, _ := p.singleFlight.Do(key, func() (interface{}, error) {
		// TODO put retry loop in here.
		conn, err := p.dialAddr(ctx, p.randomAddr())
		if err != nil {
			return nil, err
		}
		// TODO use new ctx with timeout.
		if _, err := conn.Connect(ctx); err != nil {
			return nil, err
		}
		return conn, nil
	})
	if err != nil {
		return nil, err
	}
	cc, err := v.(*rpc.Connection).Connect(ctx)
	if err != nil {
		if ctx.Err() == nil {
			// TODO: multiple callers may end up here so one of the Forget
			// calls may evict a newer, healthy, conn. Should make it such
			// that only the first caller to get an error gets to call Forget.
			// May not be worth using singleflight once we do that.
			p.singleFlight.Forget(key)
		}
		return nil, err
	}
	return cc, nil
}

As is, if we point tenants at a KV-wide LB, won't we end up with each tenant connecting to all of the KV nodes? That seems a little silly, should "just" pick one and stick with it until it goes away.

This also answers the breaker question - by limiting to a single dial at any given point in time as in my pseudo-Go above, we won't log spam. Also, due to the sticky connection, once we have a healthy node we won't try the unhealthy node any more, so it should self-stabilize quite well.

Very good point. We can't use a singleFlight group quite like your example wants because the group doesn't maintain state after the request completes, but I updated the code with this as inspiration. We now cache a single InternalClient and use it as long as possible across all clients of the proxy. PTAL.
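For readers following along, the caching amounts to roughly the following pattern (illustrative names, not the actual implementation): one client is shared by everyone, and it is invalidated on RPC error so the next caller re-dials, possibly landing on a different node behind the load balancer.

```
// Rough sketch only; field and type names here are illustrative.
package kvtenantsketch

import (
	"context"
	"sync"
)

// client stands in for roachpb.InternalClient; dialFn stands in for the
// round-robin dialing logic.
type client interface{}

type proxy struct {
	dialFn func(ctx context.Context) (client, error)

	mu struct {
		sync.Mutex
		client client
	}
}

// getClient returns the cached client, dialing one if necessary. All users
// of the proxy share the same client until it is invalidated.
func (p *proxy) getClient(ctx context.Context) (client, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.mu.client != nil {
		return p.mu.client, nil
	}
	// NB: a real implementation would avoid holding the lock across the
	// dial (or use a singleflight); kept simple here.
	c, err := p.dialFn(ctx)
	if err != nil {
		return nil, err
	}
	p.mu.client = c
	return c, nil
}

// invalidateClient is called after an RPC error. Comparing against the bad
// client avoids evicting a newer, healthy client that another caller has
// already installed.
func (p *proxy) invalidateClient(bad client) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.mu.client == bad {
		p.mu.client = nil
	}
}
```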


pkg/roachpb/api.proto, line 2101 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Why do you consider polling a better option for zone config updates? Zones are and will be distributed via Gossip internally in KV, right? If it folds in seamlessly with this RPC, I think we might as well do it.

I was struggling to find the right abstraction here for the packaging of node descriptors and a single specific zone config over a shared stream. I've decided to generalize this interface significantly as a way to make it more useful and also more easily applicable here.

The result is replacing NodeInfo with a more general GossipSubscription RPC, which is defined as:

// GossipSubscriptionRequest initiates a game of telephone. It establishes an
// indefinite stream that proxies gossip information overheard by the recipient
// node back to the caller. Gossip information is filtered down to just those
// identified by a key matching any of the specified patterns.
//
// Upon establishment of the stream, all existing information that matches one
// or more of the patterns is returned. After this point, only new information
// matching the patterns is returned.
message GossipSubscriptionRequest {
  repeated string patterns = 1;
}

// GossipSubscriptionEvent is a single piece of proxied gossip information.
message GossipSubscriptionEvent {
  string key     = 1;
  Value  content = 2 [(gogoproto.nullable) = false];
  // Which pattern does this gossip information match?
  string pattern_matched = 3;
  // If non-nil, the other fields will be empty and this will be the final event
  // sent on the stream before it is terminated.
  Error error = 4;
}

The idea is that we then put access control policies on the specific patterns that a tenant process is able to request and then let clients request whichever patterns they want. So instead of creating a separate RPC for each subset of the gossip network information that we export, we can have clients request multiple patterns and validate that the patterns requested are permitted. I expect that the first two patterns allowed will be:

  • "node:.*"
  • "system-db:zones/1/tenants"

One nice thing that falls out of this is that I was able to remove all of the gossip changes I had added in before and use the standard RegisterCallback interface instead. Another nice fallout from this is that we naturally get a delta-based protocol instead of the original bulk replacement protocol.

A wart in the plan is that the SystemConfig is stored under a single "system-db" key that contains a struct{vals []roachpb.KeyValue} value. My plan is to add some notion of indexing/filtering into the interfaces that gossip provides so that callers can specify key prefixes that they want to hear updates about and resulting SystemConfig structs would be filtered down to just these keys. Effectively, virtualizing the existence of multiple gossip keys from the perspective of the callback. This is essentially what we're already doing with the SystemConfigDeltaFilter in places like:

k := execCfg.Codec.IndexPrefix(uint32(keys.ZonesTableID), uint32(keys.ZonesTablePrimaryIndexID))
zoneCfgFilter := gossip.MakeSystemConfigDeltaFilter(k)
gossipUpdateC := execCfg.Gossip.DeprecatedRegisterSystemConfigChannel(47150)
return zoneCfgFilter, gossipUpdateC

But we could now push this complexity into Gossip itself. We'd then add a pattern like "system-db:zones/1/tenants" to the allowlist of patterns that a tenant can request in a gossip subscription.
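For concreteness, the server-side check could be as simple as the following sketch; the allowlist contents and the function name are assumptions for illustration, not the actual Node code:

```
// Hypothetical sketch of validating a GossipSubscriptionRequest's patterns
// against a per-tenant allowlist.
package kvtenantsketch

import (
	"fmt"
	"regexp"
)

// gossipPatternAllowlist lists the gossip patterns a tenant process may
// subscribe to (illustrative contents, per the discussion above).
var gossipPatternAllowlist = map[string]struct{}{
	"node:.*":                   {},
	"system-db:zones/1/tenants": {},
}

// validateSubscriptionPatterns rejects any requested pattern that is not on
// the allowlist and compiles the rest for matching against gossip keys.
func validateSubscriptionPatterns(patterns []string) ([]*regexp.Regexp, error) {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		if _, ok := gossipPatternAllowlist[p]; !ok {
			return nil, fmt.Errorf("tenant not permitted to subscribe to gossip pattern %q", p)
		}
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid gossip pattern %q: %v", p, err)
		}
		res = append(res, re)
	}
	return res, nil
}
```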

What do you think?


pkg/roachpb/api.proto, line 2053 at r4 (raw file):

  repeated RangeDescriptor prefetched_descriptors = 2 [(gogoproto.nullable) = false];
  // If non-nil, the other fields will be empty.
  Error error = 3;

I almost made the mistake of conflating RPC errors and logical errors, which made error handling in the proxy much more complicated. This is now fixed.


pkg/server/node.go, line 954 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

drive-by?

Yeah, this wasn't needed.


pkg/server/server_sql.go, line 131 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I don't have a strong urge to change this (other than the cleanups you've identified above and getting rid of some dependencies we don't want to have), but am open to it if you feel that it would help keep this more understandable.

I'll keep the TODO (without the questions) but won't address it now.

@lunevalex lunevalex added the A-multitenancy (Related to multi-tenancy) label Jul 6, 2020
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch from 1d3c66d to 82267f2 Compare July 13, 2020 04:55
@nvanbenschoten

@andy-kimball and I talked about this late last week and we decided that it would be a good idea to move the proxy implementation under the CCL license while we still can. Doing so was very straightforward, so I made the change here. PTAL.

@tbg tbg self-requested a review July 14, 2020 12:50

@tbg tbg left a comment (Member)

Caught up on the diffs, now to review the proxy connection code again (diff wasn't visible because of the move to CCL)

Reviewed 15 of 15 files at r5, 18 of 18 files at r6, 4 of 4 files at r7.
Dismissed @nvanbenschoten from a discussion.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvclient/kvtenant/proxy.go, line 187 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No, I didn't have any plans to get rid of it. The RangeCache should never be requesting the FirstRange in a tenant process because it will only need to if it has already recursed up the meta range tree, which wouldn't be possible with this implementation of RangeDescriptorDB.

I guess that indicates that we should pull FirstRange out of the RangeDescriptorDB interface entirely and push the keys.RangeMetaKey(key).Equal(roachpb.RKeyMin) conditional logic into DistSender's implementation of RangeDescriptorDB. What do you think?

Yes, this was the thought I also had. Let's leave it for a follow-up though, I'm already a bit lost in this one.


pkg/roachpb/api.proto, line 2101 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I was struggling to find the right abstraction here for the packaging of node descriptors and a single specific zone config over a shared stream. I've decided to generalize this interface significantly as a way to make it more useful and also more easily applicable here.

The result is replacing NodeInfo with a more general GossipSubscription RPC, which is defined as:

// GossipSubscriptionRequest initiates a game of telephone. It establishes an
// indefinite stream that proxies gossip information overheard by the recipient
// node back to the caller. Gossip information is filtered down to just those
// identified by a key matching any of the specified patterns.
//
// Upon establishment of the stream, all existing information that matches one
// or more of the patterns is returned. After this point, only new information
// matching the patterns is returned.
message GossipSubscriptionRequest {
  repeated string patterns = 1;
}

// GossipSubscriptionEvent is a single piece of proxied gossip information.
message GossipSubscriptionEvent {
  string key     = 1;
  Value  content = 2 [(gogoproto.nullable) = false];
  // Which pattern does this gossip information match?
  string pattern_matched = 3;
  // If non-nil, the other fields will be empty and this will be the final event
  // sent on the stream before it is terminated.
  Error error = 4;
}

The idea is that we then put access control policies on the specific patterns that a tenant process is able to request and then let clients request whichever patterns they want. So instead of creating a separate RPC for each subset of the gossip network information that we export, we can have clients request multiple patterns and validate that the patterns requested are permitted. I expect that the first two patterns allowed will be:

  • "node:.*"
  • "system-db:zones/1/tenants"

One nice thing that falls out of this is that I was able to remove all of the gossip changes I had added in before and use the standard RegisterCallback interface instead. Another nice fallout from this is that we naturally get a delta-based protocol instead of the original bulk replacement protocol.

A wart in the plan is that the SystemConfig is stored under a single "system-db" key that contains a struct{vals []roachpb.KeyValue} value. My plan is to add some notion of indexing/filtering into the interfaces that gossip provides so that callers can specify key prefixes that they want to hear updates about and resulting SystemConfig structs would be filtered down to just these keys. Effectively, virtualizing the existence of multiple gossip keys from the perspective of the callback. This is essentially what we're already doing with the SystemConfigDeltaFilter in places like:

k := execCfg.Codec.IndexPrefix(uint32(keys.ZonesTableID), uint32(keys.ZonesTablePrimaryIndexID))
zoneCfgFilter := gossip.MakeSystemConfigDeltaFilter(k)
gossipUpdateC := execCfg.Gossip.DeprecatedRegisterSystemConfigChannel(47150)
return zoneCfgFilter, gossipUpdateC

But we could now push this complexity into Gossip itself. We'd then add a pattern like "system-db:zones/1/tenants" to the allowlist of patterns that a tenant can request in a gossip subscription.

What do you think?

Nice.


pkg/server/node.go, line 1027 at r6 (raw file):

		// RegisterSystemConfigChannel for any SystemConfig patterns.

		// NB: all callbacks are run on a single goroutine.

Unless this is a well-documented feature of gossip callbacks, I would rather not rely on this and instead close over a loop-local mutex that ensures sequential execution. Reading this makes me feel obligated to go and check whether it's (still) true.
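Something like this sketch, where registerCallback is only a placeholder standing in for the gossip callback-registration API:

```
package kvtenantsketch

import "sync"

// registerCallback is a placeholder for the gossip callback-registration
// API; it only illustrates the shape of the suggestion.
func registerCallback(pattern string, cb func(key string, content []byte)) {}

// subscribeSequentially closes over a loop-local mutex so the handler runs
// sequentially, without relying on gossip invoking callbacks from a single
// goroutine.
func subscribeSequentially(pattern string, handle func(key string, content []byte)) {
	var mu sync.Mutex
	registerCallback(pattern, func(key string, content []byte) {
		mu.Lock()
		defer mu.Unlock()
		handle(key, content)
	})
}
```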


pkg/server/node.go, line 1058 at r6 (raw file):

				// The consumer was not keeping up with gossip updates, so its
				// subscription was terminated to avoid blocking gossip.
				err := roachpb.NewErrorf("subscription terminated due to slow consumption")

Is this worth logging? It will be logged at the SQL node, right?
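(For context, the pattern under discussion is a non-blocking hand-off, roughly as sketched below with illustrative types: if the subscriber's buffer is full, its subscription is dropped rather than blocking the gossip callback.)

```
// Illustrative sketch only; event and the channel wiring are stand-ins.
package kvtenantsketch

type event struct {
	key     string
	content []byte
}

// tryForward attempts a non-blocking send of a gossip event to a subscriber.
// A false return means the subscriber was too slow and its subscription
// should be terminated instead of blocking the gossip callback.
func tryForward(events chan<- event, ev event) bool {
	select {
	case events <- ev:
		return true
	default:
		return false
	}
}
```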

@tbg tbg left a comment (Member)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch from 82267f2 to 24a804e Compare July 14, 2020 20:56

@nvanbenschoten nvanbenschoten left a comment (Member, Author)

TFTR!

I had to move the 3node-tenant logic test configuration to ccl/logictestccl in order to support the proxy movement, but I think it actually came out nicely.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvclient/kvtenant/proxy.go, line 187 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Yes, this was the thought I also had. Let's leave it for a follow-up though, I'm already a bit lost in this one.

Added a TODO.


pkg/server/node.go, line 1027 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Unless this is a well-documented feature of gossip callbacks, I would rather not rely on this and instead close over a loop-local mutex that ensures sequential execution. Reading this makes me feel obligated to go and check whether it's (still) true.

Done.


pkg/server/node.go, line 1058 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is this worth logging? It will be logged at the SQL node, right?

Done. It will also be logged at the SQL node.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch from 24a804e to 22d86b2 Compare July 14, 2020 22:12
@nvanbenschoten nvanbenschoten requested review from a team and adityamaru and removed request for a team and adityamaru July 14, 2020 22:12
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch 2 times, most recently from 5d359e7 to 4280339 Compare July 15, 2020 00:07
Fixes cockroachdb#47909.

This commit starts by adding two RPCs to the Internal service:
```
service Internal {
	...
	rpc RangeLookup        (RangeLookupRequest)        returns (RangeLookupResponse)            {}
	rpc GossipSubscription (GossipSubscriptionRequest) returns (stream GossipSubscriptionEvent) {}
}

// RangeLookupRequest is a request to proxy a RangeLookup through a Tenant
// service. Its fields correspond to a subset of the args of kv.RangeLookup.
message RangeLookupRequest {
    ...
}

// GossipSubscriptionRequest initiates a game of telephone. It establishes an
// indefinite stream that proxies gossip information overheard by the recipient
// node back to the caller. Gossip information is filtered down to just those
// identified by a key matching any of the specified patterns.
//
// Upon establishment of the stream, all existing information that matches one
// or more of the patterns is returned. After this point, only new information
// matching the patterns is returned.
message GossipSubscriptionRequest {
    ...
}
```

The commit then introduces a new `kvtenant.Proxy` object. Proxy mediates
the communication of cluster-wide state to sandboxed SQL-only tenant
processes through a restricted interface. A Proxy is seeded with a set
of one or more network addresses that reference existing KV nodes in the
cluster (or a load-balancer which fans out to some/all KV nodes). On
startup, it establishes contact with one of these nodes to learn about
the topology of the cluster and bootstrap the rest of SQL <-> KV network
communication.

Proxy has two main roles:

First, Proxy is capable of providing information on each of the KV nodes
in the cluster in the form of NodeDescriptors. This obviates the need
for SQL-only tenant processes to join the cluster-wide gossip network.
In doing so, it satisfies the `NodeDescStore` interface and can be used
as an `AddressResolver` with a small adapter.

Second, Proxy is capable of providing Range addressing information in
the form of RangeDescriptors through delegated RangeLookup requests.
This is necessary because SQL-only tenants are restricted from reading
Range Metadata keys directly. Instead, the RangeLookup requests are
proxied through existing KV nodes while being subject to additional
validation (e.g. is the Range being requested owned by the requesting
tenant?). In doing so, it satisfies the `RangeDescriptorDB` interface
and can be used to delegate all DistSender/RangeCache descriptor lookups
to KV nodes.

With this commit, we can mostly run a SQL-only tenant process without
joining the KV cluster's gossip network. This works if I comment out a
few of the uses of gossip due to cockroachdb#49692 and cockroachdb#47150 in SQL. Notably,
with the call to `DeprecatedRegisterSystemConfigChannel` in `sql.Server.Start`
removed, I can remove `Gossip` from `makeSQLServerArgs` entirely and
things "just work".
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantService branch from 4280339 to b19794d Compare July 15, 2020 01:06
@nvanbenschoten

bors r+


craig bot commented Jul 15, 2020

Build succeeded

@craig craig bot merged commit 50a3c14 into cockroachdb:master Jul 15, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/tenantService branch July 15, 2020 16:31
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 29, 2020
… server

Fixes cockroachdb#49445.

This commit introduces a new SystemConfigProvider abstraction, which is
capable of providing the SystemConfig, as well as notifying clients of
updates to the SystemConfig. Gossip already implements this interface.
The commit then updates SQL to use this new dependency in place of
Gossip whenever it needs to access the SystemConfig.

After making this change, it then updates the kvtenant.Proxy to
implement the new SystemConfigProvider interface. This is powered by the
GossipSubscription RPC that was added in cockroachdb#50520. The commit updates the
subscription to also match on the "system-db" gossip key, and just like
that, it can provide the SystemConfig to SQL [*].

Finally, with the kvtenant.Proxy serving the role of a
SystemConfigProvider to SQL when applicable, we're able to remove gossip
entirely from StartTenant. SQL-only servers will no longer join the
gossip network, which is a nice milestone for all of this work.

[*] there are a few remaining questions about how exactly we want to
enforce an access control policy on the system config gossip pattern.
See the updated comment in `Node.GossipSubscription`. For now, we're
just returning the entire SystemConfig to the subscription.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 29, 2020
tbg pushed a commit to nvanbenschoten/cockroach that referenced this pull request Jul 31, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 3, 2020
craig bot pushed a commit that referenced this pull request Aug 3, 2020
52034: kvtenant: implement SystemConfigProvider, remove Gossip from SQL-only server r=nvanbenschoten a=nvanbenschoten

Fixes #49445.

This commit introduces a new SystemConfigProvider abstraction, which is capable of providing the SystemConfig, as well as notifying clients of updates to the SystemConfig. Gossip already implements this interface. The commit then updates SQL to use this new dependency in place of Gossip whenever it needs to access the SystemConfig.

After making this change, it then updates the kvtenant.Proxy to implement the new SystemConfigProvider interface. This is powered by the GossipSubscription RPC that was added in #50520. The commit updates the subscription to also match on the "system-db" gossip key, and just like that, it can provide the SystemConfig to SQL [*].

Finally, with the kvtenant.Proxy serving the role of a SystemConfigProvider to SQL when applicable, we're able to remove gossip entirely from StartTenant. SQL-only servers will no longer join the gossip network, which is a nice milestone for all of this work.

[*] there are a few remaining questions about how exactly we want to enforce an access control policy on the system config gossip pattern. See the updated comment in `Node.GossipSubscription`. For now, we're just returning the entire SystemConfig to the subscription.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>