(WIP) RFC: Enhancing VTGate buffering for MoveTables and Shard by Shard Migration #13464
Labels
Component: Cluster management
Component: Query Serving
Component: VReplication
Type: RFC
Request For Comment
Enhancing VTGate buffering for MoveTables and Shard by Shard Migration
TL;DR
Today VTGate buffers queries that fail during reparenting and resharding operations (collectively, cluster events).
VTGate watches the topo for these cluster events. When a query fails due to one of these operations, buffering kicks in
(with a timeout). When the operation ends, the buffered queries are (re-)executed.
Because these events are at the shard level, the query serving layer detects errors, buffers, and retries only after a
query is planned: in particular, the keyspace scope for the query execution is already selected.
MoveTables and Shard By Shard Migration operations require the query to execute on the (new) target keyspace. So we
need to make two types of changes to support buffering for these operations:
Motivation
MoveTables and Shard By Shard Migrations are both triggered by the MoveTables command and move tables from one keyspace
to another. However, while a normal MoveTables is a single step where the tables move into all shards of the target
keyspace, with Shard By Shard the same table could be served from different keyspaces.
While these workflows are in progress, queries are served from the source keyspace. Note that both the source and
target VSchemas have these tables at this point. The ambiguity of which keyspace to route a table's queries to is
resolved using RoutingRules (for MoveTables) and ShardRoutingRules (for Shard By Shard).
When we execute a SwitchWrites (i.e. SwitchTraffic for primaries) we have a small period where the tables are not
routable for new queries due to the process of switching, which is as follows:
record the GTID.
client.
keyspace, where there are no DeniedTables, and hence the query succeeds.
As seen above, all queries to the tables being moved will fail during the switch, and the client needs to wait and
retry. This RFC discusses how we might augment the current buffering mechanism to also support MoveTables and Shard By
Shard Migrations.
Detecting Transitions
We need to add two new signals to the existing set of signals that VTGate watches for (using the KeyspaceEventWatcher):
There are two sources of start signals:
Note that the topo changes need to be seen by both the vtgate and vttablet topo watchers. This can lead to inherent
races where queries might get planned against a stale version of the vschema, which is exacerbated by plan caches.
MoveTablesSwitchStarted
Queries will fail at vttablet when they check against the denied table list. The error is returned to vtgate.
source keyspace in the SrvVSchema's RoutingTables.
MoveTablesSwitchCompleted
keyspace in the SrvVSchema's RoutingTables.
ShardByShardMigrationStarted
Queries will fail at vttablet when they check against the denied table list. The error is returned to vtgate.
ShardRoutingRules are pointing to the source keyspace
ShardByShardMigrationCompleted
ShardRoutingRules are pointing to the target keyspace
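One way to picture how the signals above could combine is the following Go sketch. It is an illustration under assumptions, not the KeyspaceEventWatcher logic: `switchState`, `classify`, and the reduction of the routing rules to a single "currently routed keyspace" string are hypothetical simplifications.

```go
package main

import "fmt"

// switchState is a hypothetical summary of where a traffic switch stands.
type switchState int

const (
	notSwitching switchState = iota
	switchInProgress
	switchCompleted
)

func (s switchState) String() string {
	switch s {
	case switchInProgress:
		return "in progress"
	case switchCompleted:
		return "completed"
	default:
		return "not switching"
	}
}

// classify combines the two observations described above: whether vttablet
// returned a denied-tables error, and which keyspace the routing rules
// currently point to (an assumed, flattened view of the rules).
func classify(deniedTableErr bool, routedKeyspace, source, target string) switchState {
	switch {
	case routedKeyspace == target:
		// Rules already point to the target keyspace: the switch is done.
		return switchCompleted
	case deniedTableErr && routedKeyspace == source:
		// Queries are denied but rules still point to the source keyspace.
		return switchInProgress
	default:
		return notSwitching
	}
}

func main() {
	fmt.Println(classify(true, "source_ks", "source_ks", "target_ks"))
	fmt.Println(classify(true, "target_ks", "source_ks", "target_ks"))
	fmt.Println(classify(false, "source_ks", "source_ks", "target_ks"))
}
```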
Buffering changes
The signals for detecting the start and end of these events will continue to be in the same layer they are today: the
KeyspaceEventWatcher in vtgate. We add new types of events to the ones we are already detecting.
The "enforce denied tables" error will not be handled by the buffering layer in vtgate's tabletgateway but passed on to
the vtgate Executor where we will recompute the plan.
Note that the error is only passed on once buffering stops: at the shard buffer layer where we are buffering, the
re-executed query continues to be routed to the source keyspace, resulting in the "enforce denied tables" error.
The errors due to cluster events that are currently handled (reparenting and resharding) are retried in vcursor_impl.
The errors due to the new events will be retried in the vtgate Executor's newExecute(), where we will recompute the
plan.
Assumptions
which we should add validations for.