Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli/flags,config: new flag for tenant KV listen addr #50503

Merged
merged 4 commits into from
Jul 16, 2020

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Jun 22, 2020

Fixes #51464.

This patch introduces the ability to split off the tenant KV server into a
separate port, using the new command-line flag --tenant-addr.

This is motivated primarily by security concerns. We plan to use different
TLS configs for KV<->KV kv/replication/gossip traffic and SQL<->KV tenant
kv traffic. The easiest way to facilitate this is to listen to tenant KV
traffic on a separate port and separate gRPC server.

The --tenant-addr and --advertise-tenant-addr flags are inspired by the
--sql-addr and --advertise-sql-addr flags. The two pairs have similar
fallback behavior, as is discussed in depth in 6cb71bf.

@nvanbenschoten nvanbenschoten requested a review from tbg June 22, 2020 20:56
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Let's also write the listener files for the tenant port (if the flag was changed):

https://github.com/cockroachdb/cockroach/blob/master/pkg/server/server.go#L769-L775

Is it OK for me to start working off this branch or do you anticipate more changes to be piled on?

Reviewed 21 of 21 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/cli/cliflags/flags.go, line 534 at r1 (raw file):

To actually use separate bindings, it is recommended to specify both
flags and use a different port number via --listen-addr, for example
--tenant-addr=:26257 --listen-addr=:26258. Ensure that --join is set

To make our life a little easier, I was thinking to suggest 36257 for the tenant.


pkg/cli/cliflags/flags.go, line 545 at r1 (raw file):

	TenantAdvertiseAddr = FlagInfo{
		Name: "advertise-tenant-addr",

nit: you write --advertise-tenant-addr in a few places, can those use "--"+TenantAdvertiseAddr.Name


pkg/roachpb/metadata.go, line 483 at r1 (raw file):

}

// CheckedTenandAddress returns the value of TenantAddress if set. If not,

Tenant here and in method name

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS this is still marked as draft, not sure it's intentional

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@tbg
Copy link
Member

tbg commented Jun 24, 2020

Test failure for your convenience:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x38b3f62]

goroutine 1 [running]:
panic(0x4435280, 0x7ab5310)
	/usr/local/go/src/runtime/panic.go:722 +0x2c2 fp=0xc000b05890 sp=0xc000b05800 pc=0x75bb62
github.com/cockroachdb/cockroach/pkg/util/log.RecoverAndReportPanic(0x53c9300, 0xc000072108, 0xc00046aa80)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/crash_reporting.go:88 +0x9f fp=0xc000b058e0 sp=0xc000b05890 pc=0xfdf15f
runtime.call32(0x0, 0x4b804b8, 0xc000b05f00, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:539 +0x3b fp=0xc000b05910 sp=0xc000b058e0 pc=0x78a4bb
panic(0x4435280, 0x7ab5310)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2 fp=0xc000b059a0 sp=0xc000b05910 pc=0x75ba52
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:199
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:394 +0x3ec fp=0xc000b059d0 sp=0xc000b059a0 pc=0x77177c
github.com/cockroachdb/cockroach/pkg/cli.extraServerFlagInit(0x7af4020, 0xc000b05cb8, 0x97f0d4)
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/flags.go:933 +0x8d2 fp=0xc000b05c50 sp=0xc000b059d0 pc=0x38b3f62
github.com/cockroachdb/cockroach/pkg/cli.init.8.func3(0x7af4020, 0xc00078bea0, 0x0, 0x7, 0x7b316a0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/flags.go:277 +0x2b fp=0xc000b05c80 sp=0xc000b05c50 pc=0x38ec08b
github.com/cockroachdb/cockroach/pkg/cli.AddPersistentPreRunE.func1(0x7af4020, 0xc00078bea0, 0x0, 0x7, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/flags.go:110 +0x5d fp=0xc000b05cc8 sp=0xc000b05c80 pc=0x38ebefd
github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra.(*Command).execute(0x7af4020, 0xc00078be30, 0x7, 0x7, 0x7af4020, 0xc00078be30)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:741 +0x56b fp=0xc000b05da0 sp=0xc000b05cc8 pc=0x35bcfeb
github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x7aec240, 0xc000072108, 0xc0002967ea, 0xc)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:852 +0x2ea fp=0xc000b05e70 sp=0xc000b05da0 pc=0x35bd91a
github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:800
github.com/cockroachdb/cockroach/pkg/cli.Run(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:223
github.com/cockroachdb/cockroach/pkg/cli.Main()
	/go/src/github.com/cockroachdb/cockroach/pkg/cli/cli.go:67 +0x272 fp=0xc000b05f50 sp=0xc000b05e70 pc=0x388a712
main.main()

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also write the listener files for the tenant port (if the flag was changed):

Yeah, we're writing these. This is tested by TestListenerFileCreation.

Test failure for your convenience:

Fixed.

PS this is still marked as draft, not sure it's intentional

Yes, that was intentional. It's no longer a draft and gets all the way to starting a second gRPC server.

I updated this with a pair of follow on commits that changes the tenant proxy to route tenant KV traffic to this new tenant KV address when a node is configured with one. We now have unit tests and roachtests that exercise this logic.

PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/cli/cliflags/flags.go, line 534 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

To make our life a little easier, I was thinking to suggest 36257 for the tenant.

I like the idea of using 36257 in our deployments, but making that the default port iff --tenant-addr is specified but its port is omitted is tricky and not obviously worth the complexity, so I'll leave this as-is for now.


pkg/cli/cliflags/flags.go, line 545 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: you write --advertise-tenant-addr in a few places, can those use "--"+TenantAdvertiseAddr.Name

I only see one instance of this outside of comments:

return errors.Wrap(err, "invalid --advertise-tenant-addr")

Is that what you're referring to?


pkg/roachpb/metadata.go, line 483 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Tenant here and in method name

Done.

@nvanbenschoten nvanbenschoten marked this pull request as ready for review July 15, 2020 17:53
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner July 15, 2020 17:53
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantPort branch from 4655943 to 2fb707e Compare July 15, 2020 17:54
@nvanbenschoten
Copy link
Member Author

Huh, Github isn't picking up the changes on this branch. Looks like there's an ongoing incident: https://www.githubstatus.com/.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantPort branch 2 times, most recently from 557be8d to db74c96 Compare July 15, 2020 17:56
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this:

The --tenant-addr and --advertise-tenant-addr flags are inspired by the --sql-addr and --advertise-sql-addr flags. The two pairs have similar fallback behavior, as is discussed in depth in 6cb71bf.

  1. We found out that --advertise-sql-addr is not so useful after all. In contrast with the RPC addresses it so happens that crdb does not need to self-connect over public/private network boundaries, so that distinction was over-engineered (and is turning out unnecesary).

  2. I wonder here if --advertise-tenant-addr is actually necessary. This would only make sense if you had the following combination of design requirements:

  • a tenant SQL server needs to run in a separate DC / internet region than the KV server, so it needs to know the public address of the KV server; and
  • a tenant SQL server must discover the address of the KV server automatically by using some advertisement information from the KV server. (If there is no need for auto-discovery, there's no need to distinguish addr vs advertised addr)

The first requirement was not on planning for CC free tier, as far as I know; but even then, the second requirement is completely non-sensical with the KV/SQL disctinction, and therefore I think it makes no sense whatsoever to try to develop an "advertised addr".

Unless I misunderstood something?

Reviewed 1 of 21 files at r1, 3 of 16 files at r2, 1 of 5 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@nvanbenschoten
Copy link
Member Author

Good point, I hear what you're saying. I added the flag primarily to maintain consistency between this pair of flags and the the --sql-addr and --advertise-sql-addr pair of flags. But since you bring it up, we should discuss if the --advertise-tenant-addr is worth adding at all.

The first requirement was not on planning for CC free tier, as far as I know; but even then, the second requirement is completely non-sensical with the KV/SQL disctinction, and therefore I think it makes no sense whatsoever to try to develop an "advertised addr".

You are correct that the first requirement is not applicable to CC free tier. As far as I know, we don't plan to separate the public address of these KV servers from their private address. What I'm less sure about is whether there will ever be reason to make such distinctions. We do have aspirations to provide this infrastructure to others, so I don't think we should limit ourselves entirely to just how we plan to deploy the thing.

As for the second requirement, tenant SQL servers do need to discover the address of the KV servers automatically. This is why these advertised addrs are stored on NodeDescriptors that are propagated to SQL servers through the tenant proxy. I might be misunderstanding what you mean with this point.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantPort branch 2 times, most recently from 489abb1 to 13bb1d7 Compare July 15, 2020 19:14
@knz
Copy link
Contributor

knz commented Jul 15, 2020

tenant SQL servers do need to discover the address of the KV servers automatically. This is why these advertised addrs are stored on NodeDescriptors that are propagated to SQL servers through the tenant proxy.

That is ... odd. Are you telling me that a node descriptor arrives to the SQL server from the client-side proxy? How does the proxy even learn of the node descriptor then?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantPort branch from 13bb1d7 to 961cec5 Compare July 15, 2020 22:33
@nvanbenschoten
Copy link
Member Author

Are you telling me that a node descriptor arrives to the SQL server from the client-side proxy? How does the proxy even learn of the node descriptor then?

SQL processes are started with a set of seed KV addresses, passed in through the --kv-addrs flag. These addresses are used by the tenant proxy (not the SQL proxy! in case that's where the confusion is coming from) to subscribe to a subset of the gossip network's information. This subscription is where it learns about NodeDescriptors. It then uses these descriptors to route KV traffic to a KV node's tenant gRPC service, given a node ID from a RangeDescriptor. It uses the tenant proxy here as well, to delegate range lookups (fn(key) -> RangeDesc) to one of the seed KV nodes. Does that make sense?

@tbg
Copy link
Member

tbg commented Jul 16, 2020

I also find the "proxy" naming for this subsystem a little unfortunate due to the ambiguity. It's not really a proxy, anyway - no stand-alone component and it's not like it's passing anything "through" really, it's merely subscribing to updates from KV. It is more of a "connector" or "gateway" IMO. For example, we could rename the proxy to KVConnector or KVDiscovery and then we could say "the tenant SQL servers use run a KVConnector to maintain an up-to-date view of the KV layer's topology and configuration needed for routing requests to ranges".

tbg added a commit to tbg/cockroach that referenced this pull request Jul 16, 2020
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I (locally) rebased #51498 on top of this and will see about adding the right certs on the server side.

Reviewed 13 of 16 files at r2, 6 of 6 files at r4, 5 of 5 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/cli/cliflags/flags.go, line 534 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I like the idea of using 36257 in our deployments, but making that the default port iff --tenant-addr is specified but its port is omitted is tricky and not obviously worth the complexity, so I'll leave this as-is for now.

How about updating this to --listen-addr=:26257 and --tenant-addr=:36257 to match what we expect to be the canonical use?


pkg/cli/cliflags/flags.go, line 545 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I only see one instance of this outside of comments:

return errors.Wrap(err, "invalid --advertise-tenant-addr")

Is that what you're referring to?

Looks like it (based on my on grepping, the original comment is too long ago to remember)


pkg/cmd/roachtest/multitenant.go, line 28 at r6 (raw file):

	// Don't bind to external interfaces when running locally.
	tenantAddr := ifLocal("127.0.0.1", "0.0.0.0") + ":0"
	c.Start(ctx, t, c.All(), startArgs("--args=--tenant-addr="+tenantAddr))

Mind adding a comment here that we're starting a tenant server on another port,


pkg/cmd/roachtest/multitenant.go, line 33 at r6 (raw file):

	require.NoError(t, err)

	kvAddrs := c.ExternalAddr(ctx, c.All())

Shouldn't this be the tenant server address? I think as written the proxy will connect to the "wrong" port on KV (which will still work because of lack of the real auth).


pkg/server/testserver.go, line 793 at r5 (raw file):

}

// ServingTenantAddr returns the server's Tenant address. Should be used by clients.

"by clients"?

@knz
Copy link
Contributor

knz commented Jul 16, 2020

I went to #50520 to educate myself. In light of that, I retract my comment on the "advertise flag being nonsensical". Then, for consistency with the other flags, you can keep it. So the code LGTM.

That said I agree with Tobias that the word "proxy" is problematic. I like the proposals for "connector" and "discovery".

This patch introduces the ability to split off the tenant KV server into a
separate port, using the new command-line flag `--tenant-addr`.

This is motivated primarily by security concerns. We plan to use different
TLS configs for KV<->KV kv/replication/gossip traffic and SQL<->KV tenant
kv traffic. The easiest way to facilitate this is to listen to tenant KV
traffic on a separate port and separate gRPC server.

The `--tenant-addr` and `--advertise-tenant-addr` flags are inspired by the
`--sql-addr` and `--advertise-sql-addr` flags. The two pairs have similar
fallback behavior, as is discussed in depth in 6cb71bf.
This commit hooks up the tenant proxy to route all KV traffic from a
tenant SQL process to a node's tenant KV address, if on is configured.

As of the previous commit, TestServer now listens to tenant KV traffic
on a separate port (see `makeTestConfigFromParams`) and sets a
TenantAddress field in its NodeDescriptor, so with this change, any
logic test that uses `StartTenant` will now test this behavior.

As a sanity check, if we don't set up a second gRPC server in
`startListenRPCAndSQL`, all 3node-tenant logictests fail.
This commit configures the KV cluster in the `acceptance/multitenant`
roachtest to use a separate tenant address, which instructs the nodes
in the cluster to split off a second gRPC server on port 36257.
Removes some room for inconsistency.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantPort branch from 961cec5 to eaa01fd Compare July 16, 2020 17:47
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

That said I agree with Tobias that the word "proxy" is problematic. I like the proposals for "connector" and "discovery".

I empathize with the confusion. I'll rename to "connector" in a follow on PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @tbg)


pkg/cli/cliflags/flags.go, line 534 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

How about updating this to --listen-addr=:26257 and --tenant-addr=:36257 to match what we expect to be the canonical use?

Done.


pkg/cli/cliflags/flags.go, line 545 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Looks like it (based on my on grepping, the original comment is too long ago to remember)

I don't feel strongly for error strings, but done throughout addr_validation.go.


pkg/cmd/roachtest/multitenant.go, line 28 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Mind adding a comment here that we're starting a tenant server on another port,

Done.


pkg/cmd/roachtest/multitenant.go, line 33 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Shouldn't this be the tenant server address? I think as written the proxy will connect to the "wrong" port on KV (which will still work because of lack of the real auth).

Good point, done.


pkg/server/testserver.go, line 793 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

"by clients"?

Done.

@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 16, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 16, 2020

Build succeeded

@craig craig bot merged commit 2c177b5 into cockroachdb:master Jul 16, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/tenantPort branch July 20, 2020 16:57
tbg added a commit to tbg/cockroach that referenced this pull request Jul 21, 2020
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jul 28, 2020
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
nvanbenschoten pushed a commit to nvanbenschoten/cockroach that referenced this pull request Jul 29, 2020
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jul 31, 2020
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 4, 2020
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 11, 2020
See discussion in cockroachdb#50503. We found "proxy" confusing, because the object was
in the same process as the rest of the SQL-only process instead of run as a
stand-alone component. "Connector" doesn't create the same confusion.
craig bot pushed a commit that referenced this pull request Aug 12, 2020
52599: kv/kvtenant: rename Proxy to Connector r=nvanbenschoten a=nvanbenschoten

See discussion in #50503. We found "proxy" confusing, because the object was
in the same process as the rest of the SQL-only process instead of run as a
stand-alone component. "Connector" doesn't create the same confusion.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
madelineliao pushed a commit to madelineliao/cockroach that referenced this pull request Aug 12, 2020
See discussion in cockroachdb#50503. We found "proxy" confusing, because the object was
in the same process as the rest of the SQL-only process instead of run as a
stand-alone component. "Connector" doesn't create the same confusion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: expose grpc server for tenant use
4 participants