Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-tenant import support in Vitess #15403

Closed
rohit-nayak-ps opened this issue Mar 5, 2024 · 2 comments
Closed

Multi-tenant import support in Vitess #15403

rohit-nayak-ps opened this issue Mar 5, 2024 · 2 comments

Comments

@rohit-nayak-ps
Copy link
Contributor

rohit-nayak-ps commented Mar 5, 2024

Feature Description

This issue describes proposed support in Vitess for imports from a multi-tenant database cluster into a single
Vitess cluster.

Motivation

The user has a multi-tenant database architecture. They have several separate MySQL databases, with common schemas, one per tenant. Each database will be imported using one MoveTables workflow, using an external keyspace pointing to the
source, with a common target keyspace. All tables, on source and target, will have a tenant_id column,uniquely
identifying the tenant.

We can run multiple such workflows in parallel, until all tenants are imported. Just like normal imports, we would also
like reverse workflows to be running: to allow for rollback.

Routing rules should, ideally, be set up, so that queries targeting a specific tenant id would be routed correctly to
the target, if the tenant has already been imported, or to the source if the import workflow is in progress but has not
yet been switched to the target.

Many setups with multi-tenancy will use a multi-schema approach, where each tenant has their own MySQL database on a
single server. In this case the queries will use a database qualifier, which matches their database name. So when they
are running the migrations into Vitess, they will continue to use the same database qualifier.

Assumptions

  • The target keyspace has its schema already initialized.
  • Each table in the target keyspace has a tenant id column with the same name and type on both source and target, and it
    should be part of the primary key of each table.
  • If the target keyspace is sharded, the vindex should be a multi-column vindex which includes the tenant id column.

Design

New Keyspace VSchema Attribute

A new keyspace level attribute in the vschema for the keyspace will indicate that this keyspace supports multiple tenants and also the column which identifies the tenant and the column's data type.

Per-TenantId MoveTables

Two new options will be added to MoveTables

  • --tenant-id, example: --tenant-id 452892
  • --source-keyspace-alias, example: --source-keyspace-alias external_tenant_452892

The tenant id corresponds to the tenant of the source keyspace from which data is being migrated. The column name/type are already specified in the keyspace vschema.

The source keyspace alias can be used interchangeably with the keyspace name of the unmanaged source keyspace that points to the cluster from which we are importing.

Tenant-based filtering

An additional filter will be added to the binlogsource filter query for all the tables both for the forward and reverse VReplication workflows.

Tenant-based routing rules

We add a new concept: Keyspace Routing Rules. This will be a new topo object and part of the SrvVSchema. It will be a
list of FromKeyspace to a ToKeyspace.

These will have a higher scope than the current (table-based) Routing Rules. If Keyspace Routing Rules are present, the
table routing rules will not be checked.

Two new vtctldclient commands ApplyKeyspaceRoutingRules and GetKeyspaceRoutingRules.

Initial assumption is that all traffic be switched at the same time. So replica reads and primary reads/writes will need
to be switched together.

Keyspace Routing Rules Example

This is an example of how the keyspace routing rules will look for a tenant migration for tenant id 12345. User
sends qualified queries using the qualifier user_12345. For each tenant we create an unmanaged keyspace called
source_12345. The target keyspace is target. Note that we never reroute the target, so the target is never the source
in any of these routing rules. This is because the target has multiple tenants and we cannot reroute it to any single source.

The MoveTables command will look like:

MoveTables --target-keyspace target Create 
                   --all-tables
                   --tenant-id 12345
                   --workflow import_12345 
                   --source-keyspace source_12345
                   --source-keyspace-alias  user_12345

The associated keyspace routing rules before cutover:

{
  "user_12345": "source_12345",
  "source_12345": "source_12345"
}

The associated keyspace routing rules after cutover:

{
  "user_12345": "target",
  "source_12345": "target"
}
@harshit-gangal
Copy link
Member

I have a few questions

  1. Why do we need source-keyspace and source-keyspace-alias?
  2. Are we still holding the assumption that traffic switch will happen at same time for read and write?

@rohit-nayak-ps
Copy link
Contributor Author

Fixed via issues referenced above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants