-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: enabling instantiating secondary tenant servers in-memory #84608
Conversation
Submitting draft PR to see what CI has to say. |
CI is happy overall. Next steps:
|
Rebased; this is working again now. |
833ce28
to
707210d
Compare
b26c74a
to
55a2a6a
Compare
@andreimatei would you be interested to participate in reviewing this code? |
217ea88
to
3a64362
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"interested" is not exactly the word I'd use, but I'm happy to do it if you don't have a better reviewer (which you may well not have).
Reviewable status: complete! 0 of 0 LGTMs obtained
tbh I had thought you'd be interested because you did ask a lot of questions and had some valuable suggestions. But I was planning to get a reviewer from the MT v-team anyway. I'll touch base with you when we're a bit further in the process so you get a peek of this work in a more advanced state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #84604.
Maybe not "Fixes"? Seems like there's follow-up here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/server/server_controller.go
line 257 at r5 (raw file):
if !cluster.TelemetryOptOut() { tenantServer.StartDiagnostics(startCtx) }
is it not possible for the tenantServer to own its own StartDiagnostics
call?
pkg/server/server_controller.go
line 287 at r5 (raw file):
} tr := tracing.NewTracerWithOpt(ctx, tracing.WithClusterSettings(&st.SV))
Question about settings: when do they get populated with their saved values from SQL?
pkg/server/server_controller.go
line 310 at r5 (raw file):
return baseCfg, sqlCfg, errors.Wrap(err, "cannot create store spec") } baseCfg = MakeBaseConfig(st, tr, storeSpec)
Is there a single storeDir for all tenants? I'm not 100% sure how this part works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not "Fixes"? Seems like there's follow-up here.
This PR fixes this specific issue. All the followup work is tracked in #91282.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)
pkg/server/server_controller.go
line 257 at r5 (raw file):
Previously, dhartunian (David Hartunian) wrote…
is it not possible for the tenantServer to own its own
StartDiagnostics
call?
Ok - added a separate commit to do that.
pkg/server/server_controller.go
line 287 at r5 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Question about settings: when do they get populated with their saved values from SQL?
NB: WithClusterSettings
merely attaches the SV
object to the tracer, it doesn't read/copy setting values.
Regarding your question: after s.settingsWatcher.Start
and s.systemConfigWatcher.Start
have been called, in sqlServer.preStart()
.
pkg/server/server_controller.go
line 310 at r5 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Is there a single storeDir for all tenants? I'm not 100% sure how this part works.
Woops, you're right.
We have logic in standalone SQL servers (in cli/flags.go) to generate distinct store directories. Added here too.
3a64362
to
e07b193
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some drive-by comments
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @smg260, and @srosenberg)
pkg/server/server_controller.go
line 112 at r9 (raw file):
} func (c *serverController) get(ctx context.Context, tenantName string) (onDemandServer, error) {
I think this deserves a comment saying restating that the tenant is instantiated if it doesn't exist. I'd also name it getOrCreateTenant
or such.
pkg/server/server_controller.go
line 122 at r9 (raw file):
c.mu.nextServerIdx++ idx := c.mu.nextServerIdx s, err := c.newServerFn(ctx, tenantName, idx, func() {
I'd make newServerForTenant
document a specific error type returned when the tenant doesn't exist, and I'd carry forward that documentation to newServerFn
; besides the error deserving a type, the docs will implicitly suggest what happens when the tenant doesn't exist.
pkg/server/server_controller.go
line 123 at r9 (raw file):
idx := c.mu.nextServerIdx s, err := c.newServerFn(ctx, tenantName, idx, func() { c.mu.Lock()
consider extracting this in a method called deregisterTenant
or so. It took me a while to see what this argument represents and how come this call doesn't deadlock.
pkg/server/server_controller.go
line 142 at r9 (raw file):
// under lock and stop them without lock, so that their deregister // function can operate on the map. servers := func() (res []onDemandServer) {
I think this inline functions only makes the code harder to read; there's a single return from the inner func
pkg/server/server_controller.go
line 186 at r9 (raw file):
// Start the tenant server. tenantStopper, tenantServer, err := s.startInMemoryTenantServerInternal(ctx, tenantID, index)
consider logging something around here
pkg/server/server_controller.go
line 198 at r9 (raw file):
// tenantServerWrapper implements the onDemandServer interface for SQLServerWrapper. // // TODO(knz): Evaluate if the two can be merged together.
whatever this is referring to, I think this is the time
pkg/server/server_controller.go
line 215 at r9 (raw file):
// systemServerWrapper implements the onDemandServer interface for Server. // // TODO(knz): Evaluate if this can be eliminated.
I think this is the time. I'd definitely rather we don't have this struct.
pkg/server/server_controller.go
line 223 at r9 (raw file):
func (s *systemServerWrapper) stop(ctx context.Context) { // Do nothing.
nit: this comment adds "nothing". But is calling this unexpected / should this panic ?
pkg/server/server_controller.go
line 290 at r9 (raw file):
ctx context.Context, tenantID roachpb.TenantID, index int,
the index
probably deserves a comment
pkg/server/server_controller.go
line 293 at r9 (raw file):
kvServerCfg Config, stopper *stop.Stopper, ) (baseCfg BaseConfig, sqlCfg SQLConfig, err error) {
nit: why use named returns here? Their names are not useful as documentation.
pkg/server/server_controller.go
line 297 at r9 (raw file):
// This version initialization is copied from cli/mt_start_sql.go. // TODO(knz): Why is this even useful? The comment refers to v21.1 compatibility,
you don't wrap bro
efe6062
to
0445024
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @dhartunian, @smg260, and @srosenberg)
pkg/server/server_controller.go
line 112 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think this deserves a comment saying restating that the tenant is instantiated if it doesn't exist. I'd also name it
getOrCreateTenant
or such.
Done.
pkg/server/server_controller.go
line 122 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'd make
newServerForTenant
document a specific error type returned when the tenant doesn't exist, and I'd carry forward that documentation tonewServerFn
; besides the error deserving a type, the docs will implicitly suggest what happens when the tenant doesn't exist.
Done.
pkg/server/server_controller.go
line 123 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
consider extracting this in a method called
deregisterTenant
or so. It took me a while to see what this argument represents and how come this call doesn't deadlock.
Clarified.
pkg/server/server_controller.go
line 142 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think this inline functions only makes the code harder to read; there's a single return from the inner func
Done.
pkg/server/server_controller.go
line 186 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
consider logging something around here
Done.
pkg/server/server_controller.go
line 198 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
whatever this is referring to, I think this is the time
Removed.
pkg/server/server_controller.go
line 215 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think this is the time. I'd definitely rather we don't have this struct.
Removed.
pkg/server/server_controller.go
line 223 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: this comment adds "nothing". But is calling this unexpected / should this panic ?
Removed the comment, but clarified the purpose of the type.
pkg/server/server_controller.go
line 290 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
the
index
probably deserves a comment
Done.
pkg/server/server_controller.go
line 293 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: why use named returns here? Their names are not useful as documentation.
Because return BaseConfig{}, SQLConfig{}, err
in the early error path is less readable.
pkg/server/server_controller.go
line 297 at r9 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
you don't wrap bro
Done.
Epic: CRDB-14537 This commit lays the groundwork for the ability to run tenant servers in-memory. It introduces a new `serverController` with two roles: - it maps tenant names to server instances. - it only starts servers the first time the tenant name is referenced, and only if that tenant is marked active. As of this commit, no subsystem inside CockroachDB refers to this new server controller; this means there is no signal hooked up to start these servers automatically yet. For testing, a debug HTTP endpoint has been added: `/debug/tickle?name=<tenantname>` Example use: 1. start a server. At this point no secondary tenant server is created yet. 2. create a test tenant, e.g. via `select crdb_internal.create_tenant(123, 'hello');`. At this point, the secondary tenant server is still not running. 3. Perform a HTTP request to the debug endpoint, e.g. to `/debug/tickle?name=hello` 4. Observe (e.g. in logs): the secondary server has been started. One can also observe that the controller also serves the name `system` to refer to the system tenant. For now, the secondary servers created this way use separate network listeners for SQL/HTTP/RPC. NB: This mechanism will be superseded when cockroachdb#84604 is addressed. The port number is assigned randomly. To derive a predictable port number for testing (until issue cockroachdb#84604 is addressed), the operator can pass e.g. `--secondary-tenant-port-offset=100` to the start command (for 100 past the base port number). Release note: None
Release note: None
0445024
to
65c7eb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx. no more comments from me.
Reviewed 5 of 19 files at r3, 4 of 14 files at r4, 1 of 15 files at r9, 14 of 14 files at r11, 12 of 12 files at r12, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @smg260, and @srosenberg)
thanks! bors r=dhartunian,andreimatei |
Build failed (retrying...): |
91744: server: use a shared HTTP listener for all in-memory tenant servers r=andreimatei,dhartunian a=knz Parent PRs: - [ ] #84608 - [ ] #91739 Fixes #91738. This commit introduces a HTTP (de)multiplexer for all in-memory tenant servers. By default, HTTP requests are routed to the system tenant server. This can be overridden with the header `X-Cockroach-Tenant` or the `tenant` cookie. Epic: CRDB-14487 Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
92580: server,pgwire: use a single SQL listener for multiple tenants r=rafiss a=knz Parent PRs: - [x] #84608 - [x] #91739 - [x] #91744 - [x] #92574 - [x] #92575 - [x] #92576 - [x] #92577 - [x] #92578 - [x] #92579 - [ ] #94901 Fixes #84585. Informs #94310. Epic: CRDB-14537 This commit implements tenant routing using a single SQL network listener. The tenant name can be specified: - via the client status param `options`, e.g. `options=-ccrdb:tenant=hello` - via the database name, e.g. `crdb:tenant-hello.defaultdb`. Release note (backward-incompatible change): If a SQL database is created with a name that starts with `crdb:tenant-` (e.g. `CREATE DATABASE "crdb:tenant-foo"`, clients will no longer be able to connect to it directly. Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
84700: rfcs: new RFC on shared-process deployments for cluster virtualization r=stevendanna a=knz Text of RFC: https://github.com/knz/cockroach/blob/20220720-rfc-in-memory-tenants/docs/RFCS/20220720_shared_process_deployments.md Some prototype work ongoing here: #84608. Epic: CRDB-14537 Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Epic: CRDB-14537
Fixes #84604.
This PR lays the groundwork for the ability to run tenant servers
in-memory. It introduces a new
serverController
with two roles:and only if that tenant is marked active.
As of this PR, no subsystem inside CockroachDB refers to this
new server controller; this means there is no signal hooked up
to start these servers automatically yet.
For testing, a debug HTTP endpoint has been added:
/debug/tickle?name=<tenantname>
Example use:
select crdb_internal.create_tenant(123, 'hello');
.At this point, the secondary tenant server is still not running.
/debug/tickle?name=hello
One can also observe that the controller also serves the name
system
to refer to the system tenant.
For now, the secondary servers created this way use separate network
listeners for SQL/HTTP/RPC. NB: This mechanism will be superseded when
#84604 is addressed.
The port number is assigned randomly. To derive a predictable port
number for testing (until issue #84604 is addressed), the operator can
pass e.g.
--secondary-tenant-port-offset=100
to the startcommand (for 100 past the base port number).