Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental: Multi-tenant import support in Vitess #15128

Closed

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Feb 4, 2024

Description

While importing from a multi-tenant database, using MoveTables, it is possible one wants to only import some of the tenants. This PR allows the user to specify an additional filter like tenant_id=1. This filter is applied to both the forward and reverse workflows.

It also adds keyspace routing rules, which allow entire keyspaces to be routed to the source or target keyspace for MoveTables workflows, depending on whether are in progress or switched, respectively.

For applications that use a database qualifier for their per-tenant queries and the qualifiers do not match the source keyspace defined in Vitess, an option is added to also route this database to the same route pointed to by the source keyspace.

MoveTables changes

There are three new options to MoveTables

  • --use-keyspace-routing-rules : uses keyspace routing rules instead of table routing rules
  • --source-keyspace-elias : also add provided name as alias while mapping the source keyspace to the source or target keyspace, depending on whether traffic has been switched or not
  • --additional-filter: adds the provided sql where clause predicate to the forward and reverse workflow binlogsource filters.

Related changes

Frozen workflows

We now allow creating new workflows even if there is a frozen workflow on the target.

Frozen workflows are those which have switched writes. They have the message column of the _vt.vreplication row set to frozen. This is used for various validations at the present like not starting vdiffs for it or creating a new movetables/reshard workflow if there are such workflows.

This validation and mechanism is a legacy hack to prevent workflows which should not restart, to be restarted by mistake by vdiffs and reshards. The issue discusses this and suggests improvements for improving this.

The frozen workflows will be deleted on a Complete. But in our case we may have workflows with switched traffic in play because, for large number of tenants, we will expect there to be several concurrent workflows running in different stages. Hence the need to remove this validation.

This relaxation doesn't affect the current validation that ensures vdiffs will not start for frozen workflows. The above issue will review and fix any gaps that arise from the removal of this validation in the current Vitess version.

TBD:

  • Unit tests
  • Improve e2e test
  • Test reverse traffic

Related Issue(s)

#15403

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Copy link
Contributor

vitess-bot bot commented Feb 4, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Feb 4, 2024
@rohit-nayak-ps rohit-nayak-ps added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication labels Feb 4, 2024
@github-actions github-actions bot added this to the v19.0.0 milestone Feb 4, 2024
@rohit-nayak-ps rohit-nayak-ps removed the NeedsBackportReason If backport labels have been applied to a PR, a justification is required label Feb 4, 2024
@frouioui frouioui modified the milestones: v19.0.0, v20.0.0 Feb 6, 2024
Copy link

codecov bot commented Feb 6, 2024

Codecov Report

Attention: Patch coverage is 41.32653% with 230 lines in your changes are missing coverage. Please review.

Project coverage is 65.61%. Comparing base (7a2af6f) to head (6ab42cc).
Report is 9 commits behind head on main.

Files Patch % Lines
...cmd/vtctldclient/command/keyspace_routing_rules.go 13.33% 52 Missing ⚠️
go/vt/vtctl/workflow/server.go 56.07% 47 Missing ⚠️
go/vt/vtctl/workflow/traffic_switcher.go 0.00% 33 Missing ⚠️
go/vt/vtctl/grpcvtctldserver/server.go 0.00% 23 Missing ⚠️
...ldclient/command/vreplication/movetables/create.go 0.00% 18 Missing ⚠️
go/vt/vtctl/workflow/utils.go 60.52% 15 Missing ⚠️
go/vt/vtctl/workflow/materializer.go 58.33% 10 Missing ⚠️
go/vt/vtgate/vindexes/vschema.go 41.17% 10 Missing ⚠️
go/vt/vtctl/grpcvtctldclient/client_gen.go 0.00% 8 Missing ⚠️
go/vt/topo/vschema.go 71.42% 6 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15128      +/-   ##
==========================================
+ Coverage   65.44%   65.61%   +0.17%     
==========================================
  Files        1562     1563       +1     
  Lines      193922   194239     +317     
==========================================
+ Hits       126905   127448     +543     
+ Misses      67017    66791     -226     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rohit-nayak-ps rohit-nayak-ps changed the title MoveTables: add additional filter query Multi-tenant import support in Vitess Feb 11, 2024
@rohit-nayak-ps rohit-nayak-ps changed the title Multi-tenant import support in Vitess Experimental: WIP: Multi-tenant import support in Vitess Feb 21, 2024
@rohit-nayak-ps rohit-nayak-ps changed the title Experimental: WIP: Multi-tenant import support in Vitess Experimental. WIP: Multi-tenant import support in Vitess Feb 21, 2024
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…action for them. Add vtctldclient cli commands for Apply/Get KeyspaceRoutingRules

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…eating the workflow and switching traffic.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…les.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…ate to tests.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…steps

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic functionality looks solid. My comments are mostly around CLI, testing and documentation. The test also looks solid, but we probably need to run it with a slightly larger dataset.


var (
// ApplyKeyspaceRoutingRules makes an ApplyKeyspaceRoutingRules gRPC call to a vtctld.
ApplyKeyspaceRoutingRules = &cobra.Command{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about extending ApplyRoutingRules and GetRoutingRules to take optional keyspace and shard arguments? Then we won't have so many separate commands. We can then also deprecate and delete ApplyShardRoutingRules and GetShardRoutingRules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Keyspace, Routing and Shard rules are quite different in terms of functionality. It might be better to have them separate because of that.

We can't have a table routing rule just for a shard for example. Also each of these takes a rules file and those files cannot contain a combination of table and keyspace for example.

Maybe we can have think of a RoutingRule umbrella command: table, keyspace and shardsubcommands and apply andget actions. But it will be a non-trivial refactor.

@@ -47,6 +47,10 @@ func registerCommands(root *cobra.Command) {
create.Flags().StringSliceVar(&createOptions.ExcludeTables, "exclude-tables", nil, "Source tables to exclude from copying.")
create.Flags().BoolVar(&createOptions.NoRoutingRules, "no-routing-rules", false, "(Advanced) Do not create routing rules while creating the workflow. See the reference documentation for limitations if you use this flag.")
create.Flags().BoolVar(&createOptions.AtomicCopy, "atomic-copy", false, "(EXPERIMENTAL) A single copy phase is run for all tables from the source. Use this, for example, if your source keyspace has tables which use foreign key constraints.")
create.Flags().StringVar(&createOptions.VReplicationWorkflowOptions.AdditionalFilter, "additional-filter", "", "Additional filter to apply to the tables being copied in addition to the default filter.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to call this additional-filter? Why can't it just be filter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with -filter and then realized that we also use filter in the _vt.vreplication row and we will not be using the value provided for this option as the filter but as an additional clause.

But I am fine if we want to just use -filter.

@@ -47,6 +47,10 @@ func registerCommands(root *cobra.Command) {
create.Flags().StringSliceVar(&createOptions.ExcludeTables, "exclude-tables", nil, "Source tables to exclude from copying.")
create.Flags().BoolVar(&createOptions.NoRoutingRules, "no-routing-rules", false, "(Advanced) Do not create routing rules while creating the workflow. See the reference documentation for limitations if you use this flag.")
create.Flags().BoolVar(&createOptions.AtomicCopy, "atomic-copy", false, "(EXPERIMENTAL) A single copy phase is run for all tables from the source. Use this, for example, if your source keyspace has tables which use foreign key constraints.")
create.Flags().StringVar(&createOptions.VReplicationWorkflowOptions.AdditionalFilter, "additional-filter", "", "Additional filter to apply to the tables being copied in addition to the default filter.")
create.Flags().BoolVar(&createOptions.VReplicationWorkflowOptions.UseKeyspaceRoutingRules, "use-keyspace-routing-rules", false, "Use keyspaces routing rules to route traffic to the target keyspace. This is not compatible with the --no-routing-rules flag and will override the default table routing rules mechanism.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
create.Flags().BoolVar(&createOptions.VReplicationWorkflowOptions.UseKeyspaceRoutingRules, "use-keyspace-routing-rules", false, "Use keyspaces routing rules to route traffic to the target keyspace. This is not compatible with the --no-routing-rules flag and will override the default table routing rules mechanism.")
create.Flags().BoolVar(&createOptions.VReplicationWorkflowOptions.UseKeyspaceRoutingRules, "use-keyspace-routing-rules", false, "Use keyspace routing rules to route traffic to the target keyspace. This is not compatible with the --no-routing-rules flag and will override the default table routing rules mechanism.")

@@ -168,6 +168,7 @@ func waitForQueryResult(t *testing.T, conn *mysql.Conn, database string, query s
for {
qr := execVtgateQuery(t, conn, database, query)
require.NotNil(t, qr)
log.Infof("query %q returned %v", query, qr.Rows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to keep this log line for merge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

const waitTimeout = 10 * time.Minute

func TestMultiTenant(t *testing.T) {
setSidecarDBName("_vt")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really required? or just making it explicit that we are using the default sidecar DB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is currently __vt_e2e-test which is very painful to type in the mysql cli, so I have been using _vt for new tests. I guess we should just use the complex value for some tests and _vt for the rest. Will do it in my next e2e refactor.

defer func() {
defaultRdonly = origDefaultRdonly
}()
defaultRdonly = 0
Copy link
Member

@deepthi deepthi Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here will be useful. I found the package level var definition and understood that it means we don't run any RDONLY tablets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

}

func (mtm *multiTenantMigration) initTenantData(t *testing.T, tenantId int64, userKeyspace string) {
mtm.insertSomeData(t, tenantId, userKeyspace, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have 30 tenants, and table has 10 rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently 30 tenants. Test nitially inserts 10 rows and then a few more additionally. Changed all literals to use constants.

@@ -38,6 +38,7 @@ CREATE TABLE IF NOT EXISTS vreplication
`component_throttled` varchar(255) NOT NULL DEFAULT '',
`workflow_sub_type` int NOT NULL DEFAULT '0',
`defer_secondary_keys` tinyint(1) NOT NULL DEFAULT '0',
`options` json NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably start adding sql comments when we add new columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Added. I plan to create a documentation PR soon which adds comments on all tables for existing columns as well in the sidecar schema.

@@ -112,11 +113,11 @@ func validateNewWorkflow(ctx context.Context, ts *topo.Server, tmc tmclient.Tabl
}{{
fmt.Sprintf("select 1 from _vt.vreplication where db_name=%s and workflow=%s", encodeString(primary.DbName()), encodeString(workflow)),
fmt.Sprintf("workflow %s already exists in keyspace %s on tablet %v", workflow, keyspace, primary.Alias),
}, {
}, /* { // temporarily comenting this, since it doesn't allow to create a new workflow if there is a frozen workflow. Need to check if we really need this validation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be reverted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this check. Added details in Description.

…ing rules will not be large

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…uncExpr required a change in a unit test helper. Uncomment frozen workflow code for now so that unit tests run.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…n workflows

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…ime in CI. Let it run in its own workflow for now to isolate any issues and allow running it several times

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps changed the title Experimental. WIP: Multi-tenant import support in Vitess Experimental: Multi-tenant import support in Vitess Mar 5, 2024
@rohit-nayak-ps rohit-nayak-ps removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Mar 5, 2024
…dd ReverseTraffic to e2e tests. Some refactor of e2e test.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps
Copy link
Contributor Author

Replaced with #15503

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants