This repository has been archived by the owner on Nov 24, 2023. It is now read-only.
Make DM highly available (HA): tolerant of process crashes, machine failures, network partitions, and similar faults.
Merge DM-master into DM-worker so that the cluster ships as a single binary.
Problem Statement
A DM cluster currently has only one DM-master process. When that process fails, task management, cluster management, and shard DDL coordination can no longer be performed.
A DM cluster currently has only one DM-worker process for each upstream MySQL/MariaDB instance. When that DM-worker process fails, data migration for the corresponding upstream instance is interrupted.
Some metadata (task configuration, cluster version, etc.) exists only as a single local copy on the DM-worker node. If that copy is damaged, the data migration task is difficult to recover.
Proposed Solution
Combine DM-master into DM-worker to reduce cluster components (Let's still call it DM-worker below).
Embed etcd in each DM-worker process so that the instances together form an etcd cluster, used for storing metadata and for leader election.
Elect a leader among the DM-worker instances through the etcd cluster to handle user requests (from dmctl or other HTTP clients, as DM-master did previously). If the current leader fails, a new leader is elected automatically. Note that the entire DM cluster has at most one leader at a time (called the leader below).
Support deploying one or more DM-worker instances for each upstream MySQL/MariaDB instance and electing one of them to run the related data migration subtasks. If the elected instance fails, a new instance is elected automatically to run the subtasks. Note that for each upstream MySQL/MariaDB instance, at most one DM-worker instance runs the data migration subtasks at any time. For example:
For MySQL-A, two DM-workers (DM-worker-1 and DM-worker-2) exist. We elect either DM-worker-1 or DM-worker-2 to run the subtasks, mark the elected DM-worker as the running instance, and mark the others as idle instances.
For MySQL-B, only one DM-worker (DM-worker-3) exists, so we elect DM-worker-3 to run the subtasks.
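The two elections described above (one cluster-wide leader, one running instance per upstream) can be sketched with a single lease-based primitive. The following is only an in-memory illustration, not the DM implementation: `ElectionStore`, `campaign`, `holder`, the key names `leader` and `source/MySQL-A`, and the 10-second TTL are all hypothetical stand-ins for whatever the embedded etcd cluster would provide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


@dataclass
class ElectionStore:
    """In-memory stand-in for the embedded etcd cluster (hypothetical helper)."""
    leases: Dict[str, Lease] = field(default_factory=dict)

    def campaign(self, key: str, candidate: str, now: float, ttl: float = 10.0) -> bool:
        """Try to take or refresh the lease on `key`; True means `candidate` holds it."""
        current = self.leases.get(key)
        if current is None or current.expires_at <= now or current.holder == candidate:
            # The key is free, expired, or already ours: take or refresh the lease.
            self.leases[key] = Lease(candidate, now + ttl)
            return True
        return False

    def holder(self, key: str, now: float) -> Optional[str]:
        """Return the instance currently holding `key`, or None if the lease lapsed."""
        current = self.leases.get(key)
        if current is None or current.expires_at <= now:
            return None
        return current.holder


store = ElectionStore()

# Cluster-wide leader: every instance campaigns on the same key, one wins.
for worker in ("DM-worker-1", "DM-worker-2", "DM-worker-3"):
    store.campaign("leader", worker, now=0.0)

# Per-upstream election: only instances bound to that upstream campaign.
for worker in ("DM-worker-1", "DM-worker-2"):
    store.campaign("source/MySQL-A", worker, now=0.0)
store.campaign("source/MySQL-B", "DM-worker-3", now=0.0)
```

In this model the running instance keeps its role by calling `campaign` again before the TTL elapses. If it stops doing so, the lease lapses and an idle instance wins the next campaign, which mirrors the automatic failover described above.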
Success Criteria
If multiple DM-worker instances are deployed, user requests can still be handled as long as fewer than half of the instances are abnormal.
As long as more than half of the DM-worker instances are available, metadata reads and writes are not affected.
As long as there are one or more available DM-worker instances corresponding to an upstream MySQL/MariaDB instance, the data migration associated with that upstream is not affected.
A deployment with only one DM-worker instance can still handle data migration tasks when there is only one upstream MySQL/MariaDB instance.
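The availability criteria above follow from majority-quorum arithmetic, since the metadata store is an etcd cluster embedded in the DM-worker instances. A minimal sketch of that arithmetic, assuming every DM-worker instance is a voting etcd member (the function names are illustrative only):

```python
def quorum(members: int) -> int:
    """Smallest number of healthy members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


def metadata_available(members: int, healthy: int) -> bool:
    """Metadata reads and writes need a majority of members to be healthy."""
    return healthy >= quorum(members)


for n in range(1, 6):
    print(f"{n} instance(s): quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

Note that losing exactly half of an even-sized cluster also loses the majority, which is why odd cluster sizes are the usual choice.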
@csuzhangxc @nolouch you can link some of the discussion conclusion documents here. Don't worry that the documents are in Chinese; some companies or communities may be interested in them.
@GregoryIan I've linked the document in the Design section above and updated the TODO list too.
TODO list
Build an API-server or operation queue based on the etcd cluster, partially done in master: use etcd as operate queue #363.
Elect a DM-worker to run upstream related data migration subtasks, partially done in dm-worker: support election with same source-id #399.
Register DM-worker dynamically for specified upstream, partially done in *: register DM-worker instance to DM-master dynamically #394.
Refresh lease when running subtask for DM-worker.
Design
References