Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: enabling instantiating secondary tenant servers in-memory #84608

Merged
merged 2 commits into from
Nov 28, 2022

Conversation

knz
Copy link
Contributor

@knz knz commented Jul 18, 2022

Epic: CRDB-14537
Fixes #84604.

This PR lays the groundwork for the ability to run tenant servers
in-memory. It introduces a new serverController with two roles:

  • it maps tenant names to server instances.
  • it only starts servers the first time the tenant name is referenced,
    and only if that tenant is marked active.

As of this PR, no subsystem inside CockroachDB refers to this
new server controller; this means there is no signal hooked up
to start these servers automatically yet.

For testing, a debug HTTP endpoint has been added:
/debug/tickle?name=<tenantname>

Example use:

  1. start a server. At this point no secondary tenant server is created yet.
  2. create a test tenant, e.g. via select crdb_internal.create_tenant(123, 'hello');.
    At this point, the secondary tenant server is still not running.
  3. Perform a HTTP request to the debug endpoint, e.g. to /debug/tickle?name=hello
  4. Observe (e.g. in logs): the secondary server has been started.

One can also observe that the controller also serves the name system
to refer to the system tenant.

For now, the secondary servers created this way use separate network
listeners for SQL/HTTP/RPC. NB: This mechanism will be superseded when
#84604 is addressed.

The port number is assigned randomly. To derive a predictable port
number for testing (until issue #84604 is addressed), the operator can
pass e.g. --secondary-tenant-port-offset=100 to the start
command (for 100 past the base port number).

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Jul 18, 2022

Submitting draft PR to see what CI has to say.

@knz
Copy link
Contributor Author

knz commented Jul 19, 2022

CI is happy overall.

Next steps:

  • some simple tests
  • RFC to explain the approach, and indicate what work remains after this
  • review (we should feel comfortable merging this as basic infra even if there's improvements left )

@knz
Copy link
Contributor Author

knz commented Nov 1, 2022

Rebased; this is working again now.

@knz knz force-pushed the 20220718-tenants branch 2 times, most recently from 833ce28 to 707210d Compare November 1, 2022 13:41
@knz knz changed the title cli,server: start a secondary tenant in-memory server: enabling instantiating secondary tenant servers in-memory Nov 4, 2022
@knz knz force-pushed the 20220718-tenants branch 2 times, most recently from b26c74a to 55a2a6a Compare November 4, 2022 14:42
@knz
Copy link
Contributor Author

knz commented Nov 4, 2022

@andreimatei would you be interested to participate in reviewing this code?

@knz knz force-pushed the 20220718-tenants branch 3 times, most recently from 217ea88 to 3a64362 Compare November 7, 2022 12:54
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"interested" is not exactly the word I'd use, but I'm happy to do it if you don't have a better reviewer (which you may well not have).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@knz
Copy link
Contributor Author

knz commented Nov 7, 2022

"interested" is not exactly the word I'd use

tbh I had thought you'd be interested because you did ask a lot of questions and had some valuable suggestions.

But I was planning to get a reviewer from the MT v-team anyway. I'll touch base with you when we're a bit further in the process so you get a peek of this work in a more advanced state.

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes #84604.
Maybe not "Fixes"? Seems like there's follow-up here.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/server/server_controller.go line 257 at r5 (raw file):

	if !cluster.TelemetryOptOut() {
		tenantServer.StartDiagnostics(startCtx)
	}

is it not possible for the tenantServer to own its own StartDiagnostics call?


pkg/server/server_controller.go line 287 at r5 (raw file):

	}

	tr := tracing.NewTracerWithOpt(ctx, tracing.WithClusterSettings(&st.SV))

Question about settings: when do they get populated with their saved values from SQL?


pkg/server/server_controller.go line 310 at r5 (raw file):

		return baseCfg, sqlCfg, errors.Wrap(err, "cannot create store spec")
	}
	baseCfg = MakeBaseConfig(st, tr, storeSpec)

Is there a single storeDir for all tenants? I'm not 100% sure how this part works.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not "Fixes"? Seems like there's follow-up here.

This PR fixes this specific issue. All the followup work is tracked in #91282.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)


pkg/server/server_controller.go line 257 at r5 (raw file):

Previously, dhartunian (David Hartunian) wrote…

is it not possible for the tenantServer to own its own StartDiagnostics call?

Ok - added a separate commit to do that.


pkg/server/server_controller.go line 287 at r5 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Question about settings: when do they get populated with their saved values from SQL?

NB: WithClusterSettings merely attaches the SV object to the tracer, it doesn't read/copy setting values.

Regarding your question: after s.settingsWatcher.Start and s.systemConfigWatcher.Start have been called, in sqlServer.preStart().


pkg/server/server_controller.go line 310 at r5 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Is there a single storeDir for all tenants? I'm not 100% sure how this part works.

Woops, you're right.
We have logic in standalone SQL servers (in cli/flags.go) to generate distinct store directories. Added here too.

@knz knz marked this pull request as ready for review November 11, 2022 11:15
@knz knz requested review from a team as code owners November 11, 2022 11:15
@knz knz requested a review from a team November 11, 2022 11:15
@knz knz requested review from a team as code owners November 11, 2022 11:15
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some drive-by comments

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @knz, @smg260, and @srosenberg)


pkg/server/server_controller.go line 112 at r9 (raw file):

}

func (c *serverController) get(ctx context.Context, tenantName string) (onDemandServer, error) {

I think this deserves a comment saying restating that the tenant is instantiated if it doesn't exist. I'd also name it getOrCreateTenant or such.


pkg/server/server_controller.go line 122 at r9 (raw file):

	c.mu.nextServerIdx++
	idx := c.mu.nextServerIdx
	s, err := c.newServerFn(ctx, tenantName, idx, func() {

I'd make newServerForTenant document a specific error type returned when the tenant doesn't exist, and I'd carry forward that documentation to newServerFn; besides the error deserving a type, the docs will implicitly suggest what happens when the tenant doesn't exist.


pkg/server/server_controller.go line 123 at r9 (raw file):

	idx := c.mu.nextServerIdx
	s, err := c.newServerFn(ctx, tenantName, idx, func() {
		c.mu.Lock()

consider extracting this in a method called deregisterTenant or so. It took me a while to see what this argument represents and how come this call doesn't deadlock.


pkg/server/server_controller.go line 142 at r9 (raw file):

	// under lock and stop them without lock, so that their deregister
	// function can operate on the map.
	servers := func() (res []onDemandServer) {

I think this inline functions only makes the code harder to read; there's a single return from the inner func


pkg/server/server_controller.go line 186 at r9 (raw file):

	// Start the tenant server.
	tenantStopper, tenantServer, err := s.startInMemoryTenantServerInternal(ctx, tenantID, index)

consider logging something around here


pkg/server/server_controller.go line 198 at r9 (raw file):

// tenantServerWrapper implements the onDemandServer interface for SQLServerWrapper.
//
// TODO(knz): Evaluate if the two can be merged together.

whatever this is referring to, I think this is the time


pkg/server/server_controller.go line 215 at r9 (raw file):

// systemServerWrapper implements the onDemandServer interface for Server.
//
// TODO(knz): Evaluate if this can be eliminated.

I think this is the time. I'd definitely rather we don't have this struct.


pkg/server/server_controller.go line 223 at r9 (raw file):

func (s *systemServerWrapper) stop(ctx context.Context) {
	// Do nothing.

nit: this comment adds "nothing". But is calling this unexpected / should this panic ?


pkg/server/server_controller.go line 290 at r9 (raw file):

	ctx context.Context,
	tenantID roachpb.TenantID,
	index int,

the index probably deserves a comment


pkg/server/server_controller.go line 293 at r9 (raw file):

	kvServerCfg Config,
	stopper *stop.Stopper,
) (baseCfg BaseConfig, sqlCfg SQLConfig, err error) {

nit: why use named returns here? Their names are not useful as documentation.


pkg/server/server_controller.go line 297 at r9 (raw file):

	// This version initialization is copied from cli/mt_start_sql.go.
	// TODO(knz): Why is this even useful? The comment refers to v21.1 compatibility,

you don't wrap bro

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @dhartunian, @smg260, and @srosenberg)


pkg/server/server_controller.go line 112 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this deserves a comment saying restating that the tenant is instantiated if it doesn't exist. I'd also name it getOrCreateTenant or such.

Done.


pkg/server/server_controller.go line 122 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'd make newServerForTenant document a specific error type returned when the tenant doesn't exist, and I'd carry forward that documentation to newServerFn; besides the error deserving a type, the docs will implicitly suggest what happens when the tenant doesn't exist.

Done.


pkg/server/server_controller.go line 123 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider extracting this in a method called deregisterTenant or so. It took me a while to see what this argument represents and how come this call doesn't deadlock.

Clarified.


pkg/server/server_controller.go line 142 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this inline functions only makes the code harder to read; there's a single return from the inner func

Done.


pkg/server/server_controller.go line 186 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider logging something around here

Done.


pkg/server/server_controller.go line 198 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

whatever this is referring to, I think this is the time

Removed.


pkg/server/server_controller.go line 215 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this is the time. I'd definitely rather we don't have this struct.

Removed.


pkg/server/server_controller.go line 223 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: this comment adds "nothing". But is calling this unexpected / should this panic ?

Removed the comment, but clarified the purpose of the type.


pkg/server/server_controller.go line 290 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

the index probably deserves a comment

Done.


pkg/server/server_controller.go line 293 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: why use named returns here? Their names are not useful as documentation.

Because return BaseConfig{}, SQLConfig{}, err in the early error path is less readable.


pkg/server/server_controller.go line 297 at r9 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

you don't wrap bro

Done.

@ajstorm ajstorm added the A-multitenancy Related to multi-tenancy label Nov 23, 2022
Epic: CRDB-14537

This commit lays the groundwork for the ability to run tenant servers
in-memory. It introduces a new `serverController` with two roles:

- it maps tenant names to server instances.
- it only starts servers the first time the tenant name is referenced,
  and only if that tenant is marked active.

As of this commit, no subsystem inside CockroachDB refers to this
new server controller; this means there is no signal hooked up
to start these servers automatically yet.

For testing, a debug HTTP endpoint has been added:
`/debug/tickle?name=<tenantname>`

Example use:

1. start a server. At this point no secondary tenant server is created yet.
2. create a test tenant, e.g. via `select crdb_internal.create_tenant(123, 'hello');`.
   At this point, the secondary tenant server is still not running.
3. Perform a HTTP request to the debug endpoint, e.g. to `/debug/tickle?name=hello`
4. Observe (e.g. in logs): the secondary server has been started.

One can also observe that the controller also serves the name `system`
to refer to the system tenant.

For now, the secondary servers created this way use separate network
listeners for SQL/HTTP/RPC. NB: This mechanism will be superseded when
cockroachdb#84604 is addressed.

The port number is assigned randomly. To derive a predictable port
number for testing (until issue cockroachdb#84604 is addressed), the operator can
pass e.g. `--secondary-tenant-port-offset=100` to the start
command (for 100 past the base port number).

Release note: None
Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thx. no more comments from me.

Reviewed 5 of 19 files at r3, 4 of 14 files at r4, 1 of 15 files at r9, 14 of 14 files at r11, 12 of 12 files at r12, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @smg260, and @srosenberg)

@knz
Copy link
Contributor Author

knz commented Nov 28, 2022

thanks!

bors r=dhartunian,andreimatei

@craig
Copy link
Contributor

craig bot commented Nov 28, 2022

Build failed (retrying...):

craig bot pushed a commit that referenced this pull request Nov 28, 2022
91744: server: use a shared HTTP listener for all in-memory tenant servers r=andreimatei,dhartunian a=knz

Parent PRs:
- [ ] #84608
- [ ] #91739

Fixes #91738.

This commit introduces a HTTP (de)multiplexer for all in-memory tenant
servers.

By default, HTTP requests are routed to the system tenant server.
This can be overridden with the header `X-Cockroach-Tenant` or the `tenant` cookie.

Epic: CRDB-14487

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@craig craig bot merged commit 65c7eb0 into cockroachdb:master Nov 28, 2022
@knz knz deleted the 20220718-tenants branch November 28, 2022 19:44
craig bot pushed a commit that referenced this pull request Nov 29, 2022
92574: pgwire: unexport ServerMetrics, rename to tenantSpecificMetrics r=stevendanna a=knz

Parent PRs:
- [x] #84608
- [x] #91739
- [x] #91744

Epic: CRDB-14537

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
craig bot pushed a commit that referenced this pull request Jan 9, 2023
92580: server,pgwire: use a single SQL listener for multiple tenants r=rafiss a=knz

Parent PRs:
- [x] #84608
- [x] #91739
- [x] #91744
- [x] #92574
- [x] #92575
- [x] #92576
- [x] #92577
- [x] #92578
- [x] #92579
- [ ] #94901

Fixes #84585.
Informs  #94310.
Epic: CRDB-14537

This commit implements tenant routing using a single SQL network listener.

The tenant name can be specified:

- via the client status param `options`, e.g. `options=-ccrdb:tenant=hello`
- via the database name, e.g. `crdb:tenant-hello.defaultdb`.

Release note (backward-incompatible change): If a SQL database is created with a name that starts with `crdb:tenant-` (e.g. `CREATE DATABASE "crdb:tenant-foo"`, clients will no longer be able to connect to it directly.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
craig bot pushed a commit that referenced this pull request Aug 25, 2023
84700: rfcs: new RFC on shared-process deployments for cluster virtualization r=stevendanna a=knz

Text of RFC: https://github.com/knz/cockroach/blob/20220720-rfc-in-memory-tenants/docs/RFCS/20220720_shared_process_deployments.md

Some prototype work ongoing here: #84608.
Epic: CRDB-14537

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server: start in-memory secondary tenant dynamically [CRDB-14537 followup]
5 participants