tools: update user guide (#2098) #2147

Merged
merged 3 commits into from
Apr 2, 2020
227 changes: 116 additions & 111 deletions reference/tools/user-guide.md
@@ -6,184 +6,189 @@ aliases: ['/docs/v3.1/how-to/migrate/from-mysql/','/docs/v3.1/how-to/migrate/inc

# TiDB Ecosystem Tools User Guide

Currently, TiDB has multiple ecosystem tools. Some of them have overlapping functionality, and some are different versions of the same tool. This document introduces each of these tools, illustrates their relationship, and describes when to use which tool for each TiDB version.
The TiDB ecosystem offers a wealth of tools for data migration, backup, and restore, so that users with different use cases can choose the ones that fit.

## TiDB ecosystem tools overview
- Some of the functionalities of these tools might overlap. For example, TiDB Loader, TiDB Lightning, and TiDB DM can all do full data loading.
- Some of the tools have evolved or are evolving. For example, TiDB Binlog will evolve into CDC (Change Data Capture).
- Some of the tools are designed to support specific TiDB versions, and others might be deprecated as user requirements change.

TiDB ecosystem tools can be divided into:
This guide is designed to help you understand these tools so that you can make an informed decision when choosing among them to support your business.

- Data import tools, including full import tools, backup and restore tools, incremental import tools, and so forth.
- Data export tools, including full export tools, incremental export tools, and so forth.
## Data import (restore or data replication)

The two types of tools are discussed in detail below.
### Full data import tools

### Data import tools
#### TiDB Lightning

#### Full import tool TiDB Lightning
[TiDB Lightning](/reference/tools/tidb-lightning/overview.md) (Lightning) is a tool used for the fast full import of large amounts of data into a TiDB cluster. Currently, TiDB Lightning supports reading SQL dumps exported by Mydumper and CSV data sources.

[TiDB Lightning](/reference/tools/tidb-lightning/overview.md) is a tool used for fast full import of data into a TiDB cluster.
TiDB Lightning supports two back ends: "Importer" and "TiDB". The back end determines how `tidb-lightning` delivers data into the target cluster. The two back ends are as follows:

> **Note:**
>
> When you import data into TiDB using TiDB Lightning, there are two modes:
>
> - The default mode: Use `tikv-importer` as the backend. In this mode, the cluster cannot provide normal services during the data import process. It is used when you import large amounts (TBs) of data.
> - The second mode: Use `TiDB` as the backend (similar to Loader). The import speed is slower than that in the default mode. However, the second mode supports online import.
1. The default is the [`Importer` back end](/reference/tools/tidb-lightning/overview.md). With `Importer` as the back end, the cluster cannot provide normal services during the import process. Use it to import large amounts (TBs) of data.
2. The second is the [`TiDB` back end](/reference/tools/tidb-lightning/tidb-backend.md), which works like [Loader](#tidb-loader-to-be-deprecated). It is much slower than the `Importer` back end, but the cluster can keep serving applications during the import process. Use it to import tens to hundreds of GBs of data.
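
The trade-off between the two back ends can be sketched as a small decision helper. This is only an illustration of the guidance above, not part of TiDB Lightning: the function name `choose_lightning_backend`, its parameters, and the 1024 GB cutoff are assumptions made for this example, since the guide only distinguishes "TB" from "tens/hundreds of GB".

```python
def choose_lightning_backend(
    data_size_gb: float,
    must_stay_online: bool,
    tb_threshold_gb: float = 1024.0,  # assumed cutoff; the guide only says "TB" vs "tens/hundreds of GB"
) -> str:
    """Return a label for the back end suggested by the guidance above.

    Hypothetical helper for illustration; the returned strings are plain
    labels chosen for this sketch.
    """
    # Only the TiDB back end lets the cluster keep serving applications
    # during the import, so an online requirement decides the choice.
    if must_stay_online:
        return "tidb"
    # TB-scale imports are the use case named for the Importer back end.
    if data_size_gb >= tb_threshold_gb:
        return "importer"
    # Smaller imports (tens to hundreds of GB) fit the TiDB back end.
    return "tidb"
```

For example, under these assumptions a 5 TB import into a cluster that can be taken offline maps to the `Importer` back end, while the same import into a cluster that must keep serving traffic maps to the slower `TiDB` back end.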

The following are the basics of TiDB Lightning:

- Input:
    - Files output by Mydumper;
    - CSV files.
- Compatibility: Compatible with TiDB v2.1 and later versions.
- Kubernetes: Supported. See [Quickly restore data into a TiDB cluster in Kubernetes using TiDB Lightning](https://pingcap.com/docs/tidb-in-kubernetes/stable/restore-data-using-tidb-lightning/).
- Input data source:
    - The output file of Mydumper
    - CSV file
- Supported TiDB versions: v2.1 or later
- Kubernetes support: Yes. See [Quickly restore data into a TiDB cluster in Kubernetes using TiDB Lightning](https://pingcap.com/docs/tidb-in-kubernetes/stable/restore-data-using-tidb-lightning/) for details.

#### Backup and restore tool BR
#### BR (beta)

[BR](/reference/tools/br/br.md) is a command-line tool used for distributed data backup and restoration for a TiDB cluster. Compared with Mydumper and Loader, BR allows you to finish backup and restore tasks with greater efficiency in scenarios of huge data volume.
[BR](/reference/tools/br/br.md) (Backup & Restore) is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with Mydumper/Loader/Lightning, BR is more suitable for scenarios of huge data volume.

The following are the basics of BR:

- [Types of backup files](/reference/tools/br/br.md#types-of-backup-files): The SST file and the `backupmeta` file.
- Compatibility: Compatible with TiDB v3.1 and v4.0 versions.
- Kubernetes: Supported. Relevant documents are on the way.

#### Incremental and full import tool TiDB Data Migration

[TiDB Data Migration (DM)](https://pingcap.com/docs/tidb-data-migration/stable/overview/) is a tool used for data migration from MySQL/MariaDB into TiDB. It supports both full and incremental data replication.

The following are the basics of DM:
- Input data source: The output file of BR
- Supported TiDB versions: v3.1 or later
- Kubernetes support: Yes. The document is WIP.

- Input: Full data and binlog data of MySQL/MariaDB.
- Output: SQL statements written to TiDB.
- Compatibility: Compatible with all TiDB versions.
- Kubernetes: In development.
#### TiDB Loader (to be deprecated)

#### Full import tool Loader (Stop maintenance, not recommended)
> **Note:**
>
> TiDB Loader is to be deprecated and replaced with [Lightning](/reference/tools/tidb-lightning/tidb-backend.md#migrating-from-loader-to-tidb-lightning-tidb-back-end).

[Loader](/reference/tools/loader.md) is a lightweight full data import tool. Data is imported into TiDB in the form of SQL statements. Currently, this tool is gradually being replaced by [TiDB Lightning](#full-import-tool-tidb-lightning). See [TiDB Lightning TiDB-backend Document](/reference/tools/tidb-lightning/tidb-backend.md#migrating-from-loader-to-tidb-lightning-tidb-backend).
[TiDB Loader](/reference/tools/loader.md) is a lightweight full-data importing tool for TiDB. It reads the output file of Mydumper and loads the data into TiDB.

The following are the basics of Loader:

- Input: Files output by Mydumper.
- Output: SQL statements written to TiDB.
- Compatibility: Compatible with all TiDB versions.
- Kubernetes: Supported. See [Backup and restore](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-and-restore-using-helm-charts/).

#### Incremental import tool Syncer (Stop maintenance, not recommended)
- Input data source: Mydumper’s output file
- Supported TiDB versions: all versions
- Kubernetes support: Yes. See [Backup and restore](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-and-restore-using-helm-charts/) for details.

[Syncer](/reference/tools/syncer.md) is a tool used for incremental import of real-time binlog data from MySQL/MariaDB into TiDB. It is recommended to use [TiDB Data Migration](#Incremental-import-tool-tidb-data-migration) to replace Syncer.

The following are the basics of Syncer:
### Incremental data import tools

- Input: Binlog data of MySQL/MariaDB.
- Output: SQL statements written to TiDB.
- Compatibility: Compatible with all TiDB versions.
- Kubernetes: Not supported.
#### Syncer (deprecated)

### Data export tools
[Syncer](/reference/tools/syncer.md) is a tool used to import data incrementally. It acts as a MySQL slave to read binlog from MySQL/MariaDB master and replicate the binlog to the downstream. It is recommended to use [TiDB Data Migration](#tidb-data-migration) to replace Syncer.

#### Full export tool Mydumper

[Mydumper](/reference/tools/mydumper.md) is a MySQL community tool used for full logical backups of MySQL that also works with TiDB.
The following are the basics of Syncer:

The following are the basics of Mydumper:
- Input data source: MySQL/MariaDB binlog service
- Supported TiDB versions: all versions
- Kubernetes support: No

- Input: MySQL/TiDB clusters.
- Output: SQL files.
- Compatibility: Compatible with all TiDB versions.
- Kubernetes: Supported. See [Backup and Restore](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-and-restore-using-helm-charts/).
### Full and incremental data import tools

#### Full export tool TiDB Binlog
#### TiDB Data Migration

[TiDB Binlog](/reference/tidb-binlog/overview.md) is a tool used to collect binlog data from TiDB. It provides near real-time backup and replication to downstream platforms.
[TiDB Data Migration](https://pingcap.com/docs/tidb-data-migration/stable/) (DM) is an integrated data replication task management platform that supports both full and incremental data migration from MySQL/MariaDB into TiDB. It can help reduce operations costs and simplify troubleshooting.

The following are the basics of TiDB Binlog:
For the full data migration, it uses an embedded Loader and an embedded Mydumper. For the incremental data migration, it uses Syncer as its kernel.

- Input: TiDB clusters.
- Output: MySQL, TiDB, Kafka or incremental backup files.
- Compatibility: Compatible with TiDB v2.1 and later versions.
- Kubernetes: Supported. See [TiDB Binlog Cluster Operations](https://pingcap.com/docs/tidb-in-kubernetes/stable/deploy-tidb-binlog/) and [TiDB Binlog Drainer Configurations in Kubernetes](https://pingcap.com/docs/tidb-in-kubernetes/stable/configure-tidb-binlog-drainer/).
The following are the basics of DM:

## Tools development roadmap
- Input data source: MySQL/MariaDB master host/port
- Supported TiDB versions: all versions
- Kubernetes support: No, under development (the estimated time is 2020 Q2)

To help you understand the relationships between the above tools, here is a brief introduction to TiDB ecosystem tools development roadmap.
## Data export (backup)

### TiDB backup and restore
### Full data export tools

Mydumper and Loader -> BR:
#### Mydumper

Mydumper and Loader are inefficient since they back up and restore data on the logical level. BR is much more efficient because it takes advantage of TiDB features for backup and restore tasks. BR can be applied in huge data volume scenarios.
[Mydumper](/reference/tools/mydumper.md) is a tool to create a logical full backup for TiDB.

### TiDB full data restore
The following are the basics of Mydumper:

Loader -> TiDB Lightning:
- Input/Output:
    - Input: TiDB/MySQL host:port
    - Output: schema and insert statements file
- Supported TiDB versions: all versions
- Kubernetes support: Yes. See [Backup and Restore](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-and-restore-using-helm-charts/) for details.

Loader is inefficient since it performs full data restoration using SQL. TiDB Lightning imports data into TiKV directly, so it is much more efficient and can be used for fast full import of large amounts (more than TBs) of data into a new TiDB cluster.
#### BR (beta)

TiDB Lightning also integrates the logical data import function of Loader and supports online data import. For details, see [TiDB Lightning TiDB-backend Document](/reference/tools/tidb-lightning/tidb-backend.md#migrating-from-loader-to-tidb-lightning-tidb-backend).
[BR](/reference/tools/br/br.md) (Backup & Restore) is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with Mydumper/Loader, BR is more suitable for scenarios of huge data volume.

### MySQL data migration
The following are the basics of BR:

- Mydumper, Loader and Syncer -> DM:
- Input/Output:
    - Input: TiDB cluster
    - Output: Full backup file
- Supported TiDB versions: v3.1 or v4.0
- Kubernetes support: Yes. The document is WIP.

It is tedious to migrate MySQL data to TiDB using Mydumper, Loader, and Syncer. DM provides an integrated data migration approach that improves usability. DM can also be used to merge sharded schemas and tables.
### Incremental data export tools

- Loader -> TiDB Lightning:
#### TiDB Binlog

TiDB Lightning integrates the logical data import function of Loader. See [TiDB Lightning TiDB-backend document](/reference/tools/tidb-lightning/tidb-backend.md#migrating-from-loader-to-tidb-lightning-tidb-backend) for details. It is used to perform full data restoration.
[TiDB Binlog](/reference/tidb-binlog/overview.md) is a system that collects binlog for TiDB clusters and provides tools for near real-time sync and backup.

## Data migration solutions
The following are the basics of TiDB Binlog:

For TiDB 2.1, 3.0, and 3.1 versions, this section introduces data migration solutions in typical application scenarios.
- Input/Output:
    - Input: TiDB Cluster
    - Output: MySQL, TiDB, Kafka or incremental backup files
- Supported TiDB versions: v2.1 or later
- Kubernetes support: Yes. See [TiDB Binlog Cluster Operations](https://pingcap.com/docs/tidb-in-kubernetes/stable/deploy-tidb-binlog/) and [TiDB Binlog Drainer Configurations in Kubernetes](https://pingcap.com/docs/tidb-in-kubernetes/stable/configure-tidb-binlog-drainer/) for details.

### Full link data migration solutions for v3.0
#### CDC (Beta, under development, ETA May/June 2020 with TiDB 4.0)

#### Migrating MySQL data to TiDB
[CDC](https://pingcap.com/docs/dev/reference/tools/ticdc/overview/) (Change Data Capture) is a system that collects change logs for key-value pairs in TiKV and outputs them to downstream systems in row-change order.

If the volume is more than TBs of data, the recommended migration steps are:
- Input/Output:
    - Input: TiDB Cluster
    - Output: MySQL, TiDB, Kafka or incremental backup files
- Supported TiDB versions: v4.0
- Kubernetes support: On the development road map, ETA Q2 2020

1. Export full MySQL data using Mydumper;
2. Import full backup data from MySQL into a TiDB cluster using TiDB Lightning;
3. Replicate the incremental data of MySQL into TiDB.
## Recommended tools for TiDB 3.1

If the volume is less than TBs of data, it is recommended to migrate MySQL data to TiDB using DM (the migrating process includes full data import and incremental data replication).
- MySQL full data backup: use Mydumper
- MySQL full data import to TiDB:
    - TB scale: use TiDB Lightning
    - Sub-TB scale: use DM
- MySQL incremental data sync to TiDB: use DM
- TiDB full data backup: use BR
- TiDB full data restore: use BR
- TiDB incremental backup & restore: use TiDB Binlog

#### Replication of TiDB cluster data
For the recommended tools for other TiDB versions, see [Recommended tools for TiDB versions](https://pingcap.com/docs/dev/reference/tools/user-guide/#recommended-tools-for-tidb-versions).

It is recommended that you use TiDB Binlog to replicate TiDB data to downstream TiDB/MySQL.
## Tools evolution roadmap

#### Full backup and restore of TiDB cluster data
- TiDB Full Data Backup:
    - Mydumper -> BR
    - Mydumper -> [dumpling](https://github.com/pingcap/dumpling) (under development, replace Lightning in lightweight scenarios)
- TiDB Full Data Restore:
    - Loader -> Lightning -> BR
- MySQL Data Migration:
    - Mydumper/Loader + Syncer -> DM (in the next step, we will integrate Lightning into DM)
- TiDB Incremental Data Migration:
    - TiDB Binlog -> CDC

The recommended steps are:
## Full-path data migration solution for TiDB 3.1

1. Back up full data using Mydumper;
2. Restore full data into TiDB/MySQL using TiDB Lightning.
For TiDB 3.1, this section covers how to migrate data from MySQL to TiDB, between TiDB clusters, and from TiDB to MySQL, as well as how to back up and restore data.

### Full link data migration solutions for v3.1
### Migrating MySQL data to TiDB

#### Migrating MySQL data to TiDB
If the MySQL data volume is in TBs:

If the volume is more than TBs of data, the recommended migration steps are:
- Use Mydumper to export the full MySQL data as a backup
- Use Lightning to import the full MySQL backup data into the TiDB cluster
- Use DM to replicate incremental MySQL data to TiDB

1. Export full MySQL data using Mydumper;
2. Import full backup data from MySQL into a TiDB cluster using TiDB Lightning;
3. Replicate the incremental data of MySQL into TiDB.
If the MySQL data volume is in GBs:

If the volume is less than TBs of data, it is recommended to migrate MySQL data to TiDB using DM (the migrating process includes full data import and incremental data replication).
- Use DM to migrate MySQL data to TiDB for both full and incremental data import
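
The two cases above can be written down as an ordered plan. The following is a minimal sketch and not an official tool: the function name `mysql_to_tidb_plan` and the 1024 GB cutoff between "in TBs" and "in GBs" are assumptions made for this example.

```python
from typing import List


def mysql_to_tidb_plan(data_size_gb: float, tb_threshold_gb: float = 1024.0) -> List[str]:
    """Return the ordered migration steps suggested above for a given MySQL data volume.

    Hypothetical helper for illustration; the threshold is an assumed
    cutoff, because the guide only says "in TBs" and "in GBs".
    """
    if data_size_gb >= tb_threshold_gb:
        # TB scale: full export, fast full import, then incremental replication.
        return [
            "Mydumper: export the full MySQL data as a backup",
            "TiDB Lightning: import the full MySQL backup data into the TiDB cluster",
            "DM: replicate incremental MySQL data to TiDB",
        ]
    # GB scale: DM covers both the full and the incremental import.
    return ["DM: migrate MySQL data to TiDB (full and incremental import)"]
```

The order of the list matters in the TB-scale case: the incremental replication step only makes sense after the full data has been exported and imported.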

#### Replication of TiDB cluster data
### Data replication between TiDB/MySQL clusters

It is recommended that you use TiDB Binlog to replicate TiDB data to downstream TiDB/MySQL.
You can use TiDB Binlog to replicate data between TiDB clusters. You can also use TiDB Binlog to replicate data to the downstream MySQL cluster.

#### Full backup and restore of TiDB cluster data
### Full backup and restore of the data in TiDB/MySQL clusters

- Restore to TiDB
To restore data to a TiDB cluster:

- Back up full data using BR;
- Restore full data using BR.
- Use the BR tool for both full data backup and full data restore

- Restore to MySQL
To restore data to a MySQL cluster:

- Back up full data using Mydumper;
- Restore full data using TiDB Lightning.
- Use the Mydumper tool for full data backup
- Use the Lightning tool with `tidb` backend for full data restore
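
The choice of backup and restore tools by restore target can be summarized as a small lookup. This is only a sketch of the recommendations above: the function name `backup_restore_tools` and the dictionary keys are hypothetical and exist only for this example.

```python
from typing import Dict


def backup_restore_tools(restore_target: str) -> Dict[str, str]:
    """Map a restore target ("tidb" or "mysql") to the tools recommended above.

    Hypothetical helper for illustration; it mirrors the two cases in
    this section and rejects any other target.
    """
    target = restore_target.strip().lower()
    if target == "tidb":
        # BR handles both directions when the data goes back into TiDB.
        return {"backup": "BR", "restore": "BR"}
    if target == "mysql":
        # A logical backup is needed so the data can be replayed into MySQL.
        return {"backup": "Mydumper", "restore": "Lightning (tidb backend)"}
    raise ValueError(f"unsupported restore target: {restore_target!r}")
```

Restoring to MySQL relies on a logical backup because BR's backup files are listed in this guide as input for restoring into TiDB only.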