- Author(s): kennytm
- Last updated: 2020-04-20
- Discussion at: pingcap#15274
Support running backup, restore, import and export (BRIE) tasks directly on TiDB via SQL.
BR is a TiDB ecosystem tool providing fast backup and restore at the KV level using the SST format. Lightning is another tool for importing SQL and CSV data dumps into TiDB, using SST files as intermediate storage.
We would like to integrate these tools directly into TiDB as SQL statements because:

- BR and Lightning require a direct connection to TiKV and PD. Since direct operations on TiKV bypass TiDB's authentication, these ports must be closed to the public. In highly protected systems like DBaaS, this means BR and Lightning cannot be used as separate tools at all.
- Supporting SQL access also improves usability: there is no need to install new components to use these functions, and SQL should be more familiar to database operators.
Introduce four kinds of statements for executing backup, restore, import and export:
```sql
BACKUP DATABASE * TO 's3://storage/url';
RESTORE DATABASE * FROM 's3://storage/url';
IMPORT DATABASE * FROM 'file:///data/dump/';
EXPORT DATABASE * TO 's3://storage/url';
```
(Note: there is no plan to implement `EXPORT` as of TiDB 4.0.) `SCHEMA` can be used as an alias of `DATABASE`, as usual.
These can be restricted to selected databases or tables only:
```sql
BACKUP DATABASE db1, db2, db3 TO 's3://storage/url';
BACKUP TABLE db1.tbl1, db2.tbl2, db3.tbl3 TO 's3://storage/url';
```
Use query parameters to describe the data source. Other options can be supplied at the end of the statement.
```sql
RESTORE DATABASE *
    FROM 's3://storage/url?access-key=AAAAA&secret-access-key=SSSSS&force-path-style=1&endpoint=http://10.0.1.55:8888'
    RATE_LIMIT = 512 MB/SECOND;
```
The `=` sign in the options can be omitted.
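For instance, the `RATE_LIMIT` option could equivalently be written without the `=` sign (illustrative destination):

```sql
BACKUP DATABASE * TO 'local:///tmp/backup'
    RATE_LIMIT 512 MB/SECOND;
```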
Data source parameters:

- `s3://`
    - `endpoint=http://s3.example.com:8080`
    - `region=us-east-1`
    - `storage-class=STANDARD`
    - `sse=AES256`
    - `acl=READ`
    - `access-key=AKIAIOSFODNN7EXAMPLE`
    - `secret-access-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`
    - `provider=aws`
    - `force-path-style=1`
    - `use-accelerate-endpoint=0`
- `gcs://`
    - `endpoint=http://gcs.example.com:8080`
    - `storage-class=STANDARD`
    - `predefined-acl=READER`
    - `credentials-file=/home/tidb/gcp_creds.json`
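For example, writing a backup to GCS might combine several of these parameters in the destination URL (hypothetical bucket and prefix):

```sql
BACKUP DATABASE *
    TO 'gcs://bucket/prefix?credentials-file=/home/tidb/gcp_creds.json&storage-class=STANDARD&predefined-acl=READER';
```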
Common options:

- `SEND_CREDENTIALS_TO_TIKV = 1`
- `CONCURRENCY = 32`
- `CHECKSUM = 1`
- `RATE_LIMIT = 125 MB/SECOND`
- `CHECKPOINT = 1`
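For instance, a backup's resource usage could be tuned with the common options above (illustrative destination; option values taken from the list):

```sql
BACKUP DATABASE * TO 's3://storage/url'
    SEND_CREDENTIALS_TO_TIKV = 1
    CONCURRENCY = 32
    RATE_LIMIT = 125 MB/SECOND;
```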
Backup/Export options:

- `SNAPSHOT = 4 HOUR AGO`
- `SNAPSHOT = '2020-04-20 15:00:00'`
- `SNAPSHOT = 416110777682551`
- `LAST_BACKUP = '2020-04-20 15:00:00'`
- `LAST_BACKUP = 416110777682551`
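For example, a point-in-time backup and a later incremental backup might be expressed as follows (illustrative destinations; the TSO is reused from the list above):

```sql
-- Back up the snapshot of the cluster as of 4 hours ago.
BACKUP DATABASE * TO 's3://storage/full' SNAPSHOT = 4 HOUR AGO;

-- Back up only data changed after the given TSO (incremental backup).
BACKUP DATABASE * TO 's3://storage/incr' LAST_BACKUP = 416110777682551;
```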
Restore/Import options:

- `ONLINE = 1`
Import/Export CSV options:

- `CSV_SEPARATOR = ','`
- `CSV_DELIMITER = '"'`
- `CSV_HEADER = COLUMNS`
- `CSV_HEADER = 0`
- `CSV_NOT_NULL = 1`
- `CSV_NULL = '\N'`
- `CSV_BACKSLASH_ESCAPE = 1`
- `CSV_TRIM_LAST_SEPARATORS = 1`
Other Import options:

- `ANALYZE = 1`
- `SKIP_SCHEMA_FILES = 1`
- `BACKEND = 'local'`
- `TIKV_IMPORTER = '127.0.0.1:8287'`
- `ON_DUPLICATE = 'replace'`
- `STRICT_FORMAT = 1`
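As a sketch, importing a CSV dump might combine several of the options above (hypothetical source path):

```sql
IMPORT DATABASE * FROM 's3://storage/dump'
    BACKEND = 'local'
    CSV_SEPARATOR = ','
    CSV_HEADER = COLUMNS
    STRICT_FORMAT = 1
    ANALYZE = 1;
```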
Some niche configurations (e.g. Lightning's `io-concurrency`) are not currently exposed.
All four kinds of BRIE statements are synchronous: they block the current session until completion. Nevertheless, these queries can be canceled using the standard `KILL TIDB QUERY n` statement.
Additionally, because BRIE tasks do not use transactions, they cannot be executed concurrently. When one BRIE task is already running, another BRIE statement must not start until the existing task has completed (the later statement is queued, as shown by the `Wait` state in the `SHOW BACKUPS` example below).
Because the BRIE statements interact with TiKV directly and can read from or write to arbitrary paths, all four statements require the SUPER privilege to execute.
If the statement is canceled or encounters an error, it reports an error, e.g.
```sql
MySQL [tpcc]> backup table tpcc.stock to 'local:///tmp/br_tpcc_30';
ERROR 8124 (HY000): Backup failed: backup meta exists, may be some backup files in the path already
```
If the statement completes successfully, it reports one row of summary, e.g.
```sql
MySQL [tpcc]> backup database tpcc to 'local:///tmp/br_tpcc_32';
+-------------------------+-----------+-----------------+---------------------+---------------------+
| Destination             | Size      | BackupTS        | Queue Time          | Execution Time      |
+-------------------------+-----------+-----------------+---------------------+---------------------+
| local:///tmp/br_tpcc_32 | 248665063 | 416099531454472 | 2020-04-12 23:09:48 | 2020-04-12 23:09:48 |
+-------------------------+-----------+-----------------+---------------------+---------------------+
1 row in set (58.453 sec)
```
Note: the `RESTORE` (BR) and `IMPORT` (Lightning) operations do not use transactions and are not ACID-compliant. The data and index KV pairs are ingested separately, i.e. a table being restored passes through an intermediate state where some rows lack indices and some index entries lack corresponding rows. If the operation fails or is canceled, the table would be stuck in this inconsistent state. There is currently no plan to change this behavior.
The status of queued and running tasks can be queried using the `SHOW` statement:
```sql
SHOW BACKUPS;
SHOW RESTORES;
SHOW IMPORTS;
SHOW EXPORTS;
```
Example:
```sql
MySQL [(none)]> show backups;
+-------------------------+---------+-------------------+---------------------+---------------------+------------+
| Destination             | State   | Progress          | Queue_Time          | Execution_Time      | Connection |
+-------------------------+---------+-------------------+---------------------+---------------------+------------+
| local:///tmp/br_tpcc_30 | Backup  | 98.38709677419355 | 2020-04-12 23:09:03 | 2020-04-12 23:09:25 |          4 |
| s3://bucket/prefix      | Wait    | 0                 | 2020-04-12 23:09:48 | 0000-00-00 00:00:00 |          9 |
+-------------------------+---------+-------------------+---------------------+---------------------+------------+
```
The content can be filtered using `LIKE` and `WHERE` clauses, e.g.
```sql
SHOW BACKUPS LIKE 'local:%';         -- same as `Destination` LIKE 'local:%'
SHOW BACKUPS WHERE `State` = 'Wait';
```
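Combining the two, a queued task could be located and canceled from another session via its `Connection` column (using connection 9 from the example above):

```sql
SHOW BACKUPS WHERE `State` = 'Wait';  -- note the Connection column, e.g. 9
KILL TIDB QUERY 9;
```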
Similar to the BRIE statements, these `SHOW` statements also require the SUPER privilege. To protect against leakage of secret keys, all query parameters of S3 and GCS destinations are stripped from the `SHOW` output.
We have chosen synchronous execution entirely for usability reasons. Internal tests found it very confusing that `BACKUP` returned immediately while nothing had been written out yet, with no way to know when the `BACKUP` had finished. This is also similar to the treatment of other long-running tasks in MySQL like `CREATE INDEX` and `ANALYZE`.
Most existing RDBMSes that support BRIE via SQL execute these statements synchronously:
| Dialect | Sync? | Syntax |
|---|---|---|
| Transact-SQL | Sync | `RESTORE DATABASE db1 FROM URL = 'https://example.com/backup1.bak'` |
| IBM DB2 | Sync | `RESTORE DATABASE db1 FROM 'db2remote://example//backup/1'` |
| SAP HANA | Sync | `RECOVER DATA USING FILE ('/data/backup1') CLEAR LOG` |
| MemSQL (ETL) | Async | `CREATE PIPELINE restore_pl AS LOAD DATA S3 'bucket/backup1' CREDENTIALS '{…}' INTO TABLE db1.tbl1; START PIPELINE restore_pl` |
| CockroachDB | Sync | `RESTORE DATABASE db1 FROM 's3://bucket/backup1'` |
On the other hand, synchronous execution is prone to accidental cancellation when the connection is lost. If we do want to make execution asynchronous, the syntax should be changed to emphasize this aspect (similar to MemSQL's syntax).

The current syntax is inherited from an early design and may be tweaked, e.g.
- We use `DATABASE *` to indicate no filtering, but perhaps we should omit it?

  ```sql
  BACKUP TO 's3://storage/url';
  RESTORE FROM 's3://storage/url';
  IMPORT FROM 'file:///data/dump/';
  EXPORT TO 's3://storage/url';
  ```
Currently we encode all storage-related parameters into the data source URL. There are two other alternatives:
- Use options:

  ```sql
  RESTORE DATABASE * FROM 's3://storage/url'
      S3_ACCESS_KEY = 'AAAAA'
      S3_SECRET_ACCESS_KEY = 'SSSSS'
      S3_FORCE_PATH_STYLE = 1
      S3_ENDPOINT = 'http://10.0.1.55:8888'
      RATE_LIMIT = 512 MB/SECOND;
  ```
- Use a JSON configuration:

  ```sql
  RESTORE DATABASE * FROM 's3://storage/url'
      CONFIG = '{"s3":{"access-key":"AAAAA", "secret-access-key":"SSSSS", "force-path-style":true, "endpoint":"http://10.0.1.55:8888"}}'
      RATE_LIMIT = 512 MB/SECOND;
  ```
The current BR tool's command line is closest to the first option, having moved away from query parameters mainly to avoid accidentally leaking the secret keys. In this integration, we instead need to strip off the query parameters as soon as possible.
For comparison, configuring a cloud destination in other RDBMSes works as follows:

| Dialect | Configuration |
|---|---|
| Transact-SQL | Automatically configured inside Azure; not supported otherwise |
| IBM DB2 | User needs to configure `db2remote://` first, before executing the BRIE statements |
| SAP HANA | Cloud storage seems not supported |
| MemSQL | Two JSON strings, one for credentials and one for non-secret settings (e.g. region, endpoint) |
| CockroachDB | URL query parameters |
The four BRIE statements are pure additions in TiDB 4.0+, so they do not affect compatibility with older versions.
During data migration, the `BACKUP` and `EXPORT` statements from upstream should be ignored. `RESTORE` and `IMPORT` statements cannot be properly replicated. Both of these execute some DDL statements via SQL (which can be replicated). However, the data ingestion skips the SQL layer, so binlog replication fails. Even at the TiKV level, because the downstream does not have the same SST files, CDC may not work properly.

Currently `RESTORE` and `IMPORT` only support offline execution on a fresh cluster, i.e. the cluster is not supposed to be serving external actors and replication should not be set up, so this is not expected to be a problem. However, we should keep this in mind when we support true online restore.
The BRIE statements are implemented by embedding BR and Lightning as modules into TiDB. These modules will expose an entry point (e.g. the `RunBackup` function in BR) which

- accepts a configuration structure that entirely defines how a task should run, and
- executes the task in the current goroutine.

The entry point function will be shared by both the CLI tool and the TiDB executor. TiDB should provide all the information it knows in the configuration, including:

- PD addresses
- TLS certificate and key paths

The tools' logs and metrics should also be surfaced through TiDB (which should happen automatically).
On the Backup/Restore part, PRs have already been submitted to implement this:

- Parser: pingcap/parser#746
- TiDB: pingcap#15274
The Import (Lightning) part will start after the Backup/Restore design is accepted.
There is currently no plan to implement the Export part.
Since BRIE relies on a proper TiKV cluster, the built-in testing framework using mocktikv will not work. Testing will rely entirely on integration tests.