
(WIP) RFC: Enhancing VTGate buffering for MoveTables and Shard by Shard Migration #13464

rohit-nayak-ps commented Jul 10, 2023


TL;DR

Today, VTGate buffers queries that fail during reparenting and resharding operations, also known as cluster events. VTGate
watches the topo for these cluster events. When a query fails due to one of these operations, buffering kicks in (with a
timeout); when the operation ends, the buffered queries are re-executed.

Because these events occur at the shard level, error detection, buffering, and retrying all happen after a query is
planned: in particular, the keyspace scope for the query's execution has already been selected.
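
To make the existing mechanism concrete, here is a minimal Go sketch of the buffer-and-retry idea. This is not the actual Vitess implementation: `execute`, `eventEnded`, and the sentinel error are hypothetical stand-ins for the query path and the KeyspaceEventWatcher's "event over" notification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errClusterEvent = errors.New("cluster event in progress")

// bufferAndRetry stands in for VTGate's shard buffer: hold a failed query
// for up to `timeout` and re-execute it once the cluster event ends.
func bufferAndRetry(ctx context.Context, execute func() error, eventEnded <-chan struct{}, timeout time.Duration) error {
	err := execute()
	if err == nil || !errors.Is(err, errClusterEvent) {
		return err
	}
	select {
	case <-eventEnded:
		return execute() // re-execute the buffered query
	case <-time.After(timeout):
		return fmt.Errorf("buffering timed out: %w", err)
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ended := make(chan struct{})
	go func() { time.Sleep(10 * time.Millisecond); close(ended) }()

	calls := 0
	err := bufferAndRetry(context.Background(), func() error {
		calls++
		if calls == 1 {
			return errClusterEvent // first attempt fails mid-event
		}
		return nil
	}, ended, time.Second)
	fmt.Println("err:", err, "calls:", calls)
}
```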

MoveTables and Shard by Shard Migration operations require queries to execute on the (new) target keyspace. So we
need to make two types of changes to support buffering for these operations:

  • For such operations, add buffering at a higher layer so that we can replan.
  • Add new signals as bookends for these operations.

Motivation

Both MoveTables and Shard by Shard Migration are triggered by the MoveTables command and move tables from one keyspace to
another. However, while a normal MoveTables is a single step in which the tables move into all shards of the target
keyspace, with Shard by Shard the same table can be served from different keyspaces at the same time.

While these workflows are in progress, queries are served from the source keyspace. Note that at this point both the
source and target VSchemas contain these tables. The ambiguity of where to route a table is resolved using
RoutingRules (for MoveTables) and ShardRoutingRules (for Shard by Shard).
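
To make the resolution concrete, here is a hedged sketch of the two rule shapes. The struct fields mirror my reading of the vschema RoutingRules / ShardRoutingRules JSON and should be treated as illustrative, not as the exact proto definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shapes only; field names are an assumption, not authoritative.
type RoutingRule struct {
	FromTable string   `json:"from_table"`
	ToTables  []string `json:"to_tables"`
}

type ShardRoutingRule struct {
	FromKeyspace string `json:"from_keyspace"`
	ToKeyspace   string `json:"to_keyspace"`
	Shard        string `json:"shard"`
}

func main() {
	// MoveTables: while the workflow is in progress, "customer" still
	// resolves to the source keyspace.
	rr := []RoutingRule{{FromTable: "customer", ToTables: []string{"source_ks.customer"}}}
	// Shard by Shard: only shard -80 has been switched to the target keyspace.
	srr := []ShardRoutingRule{{FromKeyspace: "source_ks", ToKeyspace: "target_ks", Shard: "-80"}}

	out, _ := json.MarshalIndent(map[string]any{
		"routing_rules":       rr,
		"shard_routing_rules": srr,
	}, "", "  ")
	fmt.Println(string(out))
}
```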

When we execute a SwitchWrites (i.e. SwitchTraffic for primaries) we have a small period where the tables are not
routable for new queries due to the process of switching, which is as follows:

  1. Add the tables being moved to a DeniedTables attribute of the source shard's TabletControl object in the topo and
    record the GTID.
  2. This will cause any new DMLs to fail with an "enforce denied tables" error, which currently gets reported to the
    client.
  3. Wait for the workflow to catch up to the recorded GTID.
  4. Update routing rules to point the tables to the target.
  5. VTGates, which are watching the topo changes, see the new routing rules. They then route the queries to the target
    keyspace, where there are no DeniedTables, and hence the query succeeds.
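
The same sequence as a Go sketch. Every helper here is hypothetical, standing in for the real wrangler/workflow code; only the ordering of the steps comes from this RFC.

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical stand-ins for the real topo/workflow operations.
func addDeniedTables(ctx context.Context, keyspace string, tables []string) error   { return nil }
func currentGTID(ctx context.Context, keyspace string) (string, error)              { return "gtid-at-switch", nil }
func waitForWorkflowCatchup(ctx context.Context, gtid string) error                 { return nil }
func updateRoutingRules(ctx context.Context, tables []string, target string) error  { return nil }

// switchWrites mirrors the five steps listed above.
func switchWrites(ctx context.Context, tables []string) error {
	// 1. Deny writes on the source shards and record the position.
	if err := addDeniedTables(ctx, "source_ks", tables); err != nil {
		return err
	}
	gtid, err := currentGTID(ctx, "source_ks")
	if err != nil {
		return err
	}
	// 2. New DMLs now fail at the source with "enforce denied tables".
	// 3. Wait for the workflow to catch up to the recorded GTID.
	if err := waitForWorkflowCatchup(ctx, gtid); err != nil {
		return err
	}
	// 4. Point the routing rules at the target keyspace.
	// 5. VTGates see the topo change and route to the target, where there
	//    are no DeniedTables, so queries succeed again.
	return updateRoutingRules(ctx, tables, "target_ks")
}

func main() {
	fmt.Println(switchWrites(context.Background(), []string{"customer"}))
}
```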

As seen above, all queries to the tables being moved will fail during the switch, and the client needs to wait and retry.
This RFC discusses how we might augment the current buffering mechanism to also support MoveTables and Shard by Shard
Migration.

Detecting Transitions

We need to add two new pairs of signals to the existing set of signals that VTGate watches for (using the KeyspaceEventWatcher):

  • MoveTablesSwitchStarted and MoveTablesSwitchCompleted
  • ShardByShardMigrationStarted and ShardByShardMigrationCompleted
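
A sketch of how these might be enumerated alongside the existing cluster events; the `KeyspaceEvent` type is hypothetical, and only the four names come from this RFC.

```go
package main

import "fmt"

// KeyspaceEvent is a hypothetical enum for events the watcher can report.
type KeyspaceEvent int

const (
	MoveTablesSwitchStarted KeyspaceEvent = iota
	MoveTablesSwitchCompleted
	ShardByShardMigrationStarted
	ShardByShardMigrationCompleted
)

func main() {
	fmt.Println(MoveTablesSwitchStarted, ShardByShardMigrationCompleted)
}
```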

There are two sources for these signals:

  • the topo changes that accompany the start and end of switching, and
  • query failures that happen during the transition.

Note that the topo changes need to be seen by both the vtgate and vttablet topo watchers. This can lead to inherent races
where queries are planned against a stale version of the VSchema, a problem exacerbated by plan caches.

MoveTablesSwitchStarted

  • Query error "enforce denied tables"
    Queries will fail at vttablet when they check against the denied table list. The error is returned to vtgate.
  • Existence of DeniedTables in the shard's TabletControl records and the fact that these tables are routed to the
    source keyspace in the SrvVSchema's RoutingRules.

MoveTablesSwitchCompleted

  • Existence of DeniedTables in the shard's TabletControl records and the fact that the table is routed to the target
    keyspace in the SrvVSchema's RoutingRules.

ShardByShardMigrationStarted

  • Query error "enforce denied tables"
    Queries will fail at vttablet when they check against the denied table list. The error is returned to vtgate.
  • Existence of DeniedTables in the shard's TabletControl records and the fact that the failing shard's
    ShardRoutingRules point to the source keyspace.

ShardByShardMigrationCompleted

  • Existence of DeniedTables in the shard's TabletControl records and the fact that the failing shard's
    ShardRoutingRules point to the target keyspace.
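
The four rules above boil down to one decision: given DeniedTables on the failing shard, where do the routing rules currently point? A sketch, with `topoView` as a hypothetical condensation of the topo state the KeyspaceEventWatcher can observe.

```go
package main

import "fmt"

// topoView is a hypothetical condensation of the observable state: the
// shard's TabletControl plus the SrvVSchema routing rules.
type topoView struct {
	deniedTablesSet bool   // DeniedTables present in the shard's TabletControl
	routedKeyspace  string // where RoutingRules/ShardRoutingRules currently point
	sourceKeyspace  string
	targetKeyspace  string
	shardByShard    bool // ShardRoutingRules vs. RoutingRules
}

// classify applies the four detection rules listed above.
func classify(v topoView) string {
	if !v.deniedTablesSet {
		return "" // no switch in progress
	}
	switch {
	case v.shardByShard && v.routedKeyspace == v.sourceKeyspace:
		return "ShardByShardMigrationStarted"
	case v.shardByShard && v.routedKeyspace == v.targetKeyspace:
		return "ShardByShardMigrationCompleted"
	case !v.shardByShard && v.routedKeyspace == v.sourceKeyspace:
		return "MoveTablesSwitchStarted"
	case !v.shardByShard && v.routedKeyspace == v.targetKeyspace:
		return "MoveTablesSwitchCompleted"
	}
	return ""
}

func main() {
	v := topoView{deniedTablesSet: true, routedKeyspace: "source_ks",
		sourceKeyspace: "source_ks", targetKeyspace: "target_ks"}
	fmt.Println(classify(v)) // MoveTablesSwitchStarted
}
```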

Buffering changes

The signals for detecting the start and end of these events will continue to live in the same layer they do today: the
KeyspaceEventWatcher in vtgate. We will add new event types to the ones we already detect.

The "enforce denied tables" error will not be handled by the buffering layer in vtgate's tabletgateway but passed on to
the vtgate Executor where we will recompute the plan.

Note that the error is passed up only once buffering stops: while we are still buffering at the shard buffer layer, the
re-executed query continues to be routed to the source keyspace, resulting in the same "enforce denied tables" error.

The errors due to cluster events that are currently handled (reparenting and resharding) are retried in vcursor_impl.
The errors due to the new events will be retried in the vtgate Executor's newExecute(), where we will recompute the
plan.
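
A sketch of the proposed retry-with-replan loop; `planQuery`, `executePlan`, and `switchDone` are hypothetical stand-ins for the Executor's planning step, plan execution, and the new completion signal. The point is that planning runs again after the switch, so the new plan targets the new keyspace.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

var errDenied = errors.New("enforce denied tables")

type plan struct{ keyspace string }

// retryWithReplan re-plans (rather than merely re-executing) once the
// switch-completed signal arrives.
func retryWithReplan(ctx context.Context, planQuery func() plan, executePlan func(plan) error, switchDone <-chan struct{}) error {
	err := executePlan(planQuery())
	if err == nil || !strings.Contains(err.Error(), "enforce denied tables") {
		return err
	}
	select {
	case <-switchDone: // MoveTablesSwitchCompleted / ShardByShardMigrationCompleted
		// Re-plan: routing rules now point at the target keyspace.
		return executePlan(planQuery())
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	done := make(chan struct{})
	close(done)
	ks := "source_ks"
	err := retryWithReplan(context.Background(),
		func() plan { return plan{keyspace: ks} },
		func(p plan) error {
			if p.keyspace == "source_ks" {
				ks = "target_ks" // routing rules updated while we waited
				return errDenied
			}
			return nil
		}, done)
	fmt.Println("err:", err)
}
```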

Assumptions

  • Only one MoveTables operation is in progress at a given time. Currently this is an implicit expectation within
    Vitess, for which we should add validation.
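
A sketch of the suggested validation; `listMoveTablesWorkflows` is a hypothetical helper for enumerating in-progress MoveTables workflows.

```go
package main

import (
	"context"
	"fmt"
)

// listMoveTablesWorkflows is a stub for querying in-progress workflows.
func listMoveTablesWorkflows(ctx context.Context) ([]string, error) {
	return []string{"commerce2customer"}, nil
}

// validateSingleMoveTables rejects starting a new MoveTables if one is
// already in progress.
func validateSingleMoveTables(ctx context.Context) error {
	wfs, err := listMoveTablesWorkflows(ctx)
	if err != nil {
		return err
	}
	if len(wfs) > 0 {
		return fmt.Errorf("a MoveTables workflow is already in progress: %v", wfs)
	}
	return nil
}

func main() {
	fmt.Println(validateSingleMoveTables(context.Background()))
}
```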