Update garbage collection docs #4552

Merged
21 commits merged on Mar 17, 2021
Changes from 15 commits
4 changes: 2 additions & 2 deletions alert-rules.md
@@ -445,7 +445,7 @@ Emergency-level alerts are often caused by a service or node failure. Manual int

* Solution:

1. Perform `select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME = "tikv_gc_leader_desc"` to locate the `tidb-server` corresponding to the GC leader;
1. Perform `SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = "tikv_gc_leader_desc"` to locate the `tidb-server` corresponding to the GC leader;
2. View the log of the corresponding `tidb-server` and run `grep gc_worker tidb.log` (see the combined sketch after this list);
3. If you find that the GC worker has been resolving locks (the last log is "start resolve locks") or deleting ranges (the last log is "start delete {number} ranges") during this time, it means the GC process is running normally. Otherwise, contact [support@pingcap.com](mailto:support@pingcap.com) to resolve this issue.
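
For reference, steps 1 and 2 can be combined into a quick check like the following sketch. The connection parameters and the log path are placeholders for your deployment:

{{< copyable "shell-regular" >}}

```shell
# Step 1: locate the GC leader; adjust host, port, and user as needed.
mysql -h 127.0.0.1 -P 4000 -u root -Nse \
  'SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = "tikv_gc_leader_desc"'

# Step 2: on the tidb-server identified above, inspect recent GC worker activity.
grep gc_worker tidb.log | tail -n 20
```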

@@ -633,7 +633,7 @@ For the critical-level alerts, a close watch on the abnormal metrics is required
* Solution:

1. This alert is usually caused by the GC concurrency being set too high. Before lowering it, first confirm that the failed GC is caused by a busy server.
2. You can moderately lower the concurrency degree by running `update set VARIABLE_VALUE="{number}” where VARIABLE_NAME=”tikv_gc_concurrency”`.
2. You can moderately lower the concurrency degree by adjusting [`tidb_gc_concurrency`](/system-variables.md#tidb_gc_concurrency), for example as shown below.
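
A minimal adjustment might look like this (the value `3` is illustrative; choose one based on your TiKV instance count):

{{< copyable "sql" >}}

```sql
-- Lower the GC concurrency; the value is an example only.
SET GLOBAL tidb_gc_concurrency = 3;
```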

### Warning-level alerts

18 changes: 9 additions & 9 deletions backup-and-restore-using-dumpling-lightning.md
@@ -61,30 +61,30 @@ The steps to manually modify the GC time are as follows:
{{< copyable "sql" >}}

```sql
SELECT * FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_life_time';
SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';
```

```sql
+-----------------------+------------------------------------------------------------------------------------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-----------------------+------------------------------------------------------------------------------------------------+
| tikv_gc_life_time | 10m0s |
+-----------------------+------------------------------------------------------------------------------------------------+
1 rows in set (0.02 sec)
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| tidb_gc_life_time | 10m0s |
+-------------------+-------+
1 row in set (0.03 sec)
```

{{< copyable "sql" >}}

```sql
UPDATE mysql.tidb SET VARIABLE_VALUE = '720h' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
SET GLOBAL tidb_gc_life_time = '720h';
```

2. After executing the `dumpling` command, restore the GC value of the TiDB cluster to the initial value in step 1:

{{< copyable "sql" >}}

```sql
UPDATE mysql.tidb SET VARIABLE_VALUE = '10m' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
SET GLOBAL tidb_gc_life_time = '10m';
```

## Restore data into TiDB
32 changes: 5 additions & 27 deletions br/backup-and-restore-tool.md
@@ -10,17 +10,11 @@ aliases: ['/docs/dev/br/backup-and-restore-tool/','/docs/dev/reference/tools/br/

## Usage restrictions

- BR only supports TiDB v3.1 and later versions.
- BR supports restore on clusters of different topologies. However, the online applications will be greatly impacted during the restore operation. It is recommended that you perform restore during the off-peak hours or use `rate-limit` to limit the rate.
- It is recommended that you execute multiple backup operations serially. Otherwise, different backup operations might interfere with each other.
- When BR restores data to the upstream cluster of TiCDC/Drainer, TiCDC/Drainer cannot replicate the restored data to the downstream.
- BR supports operations only between clusters with the same [`new_collations_enabled_on_first_bootstrap`](/character-set-and-collation.md#collation-support-framework) value because BR only backs up KV data. If the cluster to be backed up and the cluster to be restored use different collations, the data validation fails. Therefore, before restoring a cluster, make sure that the switch value from the query result of the `select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME='new_collation_enabled';` statement is consistent with that during the backup process.

- For v3.1 clusters, the new collation framework is not supported, so you can see it as disabled.
[Review comment, Contributor]: @overvenus @3pointer PTAL the BR and tools part in this PR, thanks!

[Review comment, Member]: I think we should keep them.

[Review comment, Author]: (same comment as earlier) these are the master docs.

- For v4.0 clusters, check whether the new collation is enabled by executing `SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled';`.

For example, assume that data is backed up from a v3.1 cluster and will be restored to a v4.0 cluster. The `new_collation_enabled` value of the v4.0 cluster is `true`, which means that the new collation is enabled in the cluster to be restored when this cluster is created. If you perform the restore in this situation, an error might occur.
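
For instance, checking the target cluster before a restore might look like the following sketch (the output shown assumes the new collation framework is enabled; the exact formatting may vary):

{{< copyable "sql" >}}

```sql
SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled';
```

```sql
+----------------+
| VARIABLE_VALUE |
+----------------+
| True           |
+----------------+
1 row in set (0.00 sec)
```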

## Recommended deployment configuration

- It is recommended that you deploy BR on the PD node.
@@ -30,7 +24,7 @@ aliases: ['/docs/dev/br/backup-and-restore-tool/','/docs/dev/reference/tools/br/
>
> If you do not mount a network disk or use other shared storage, the data backed up by BR will be generated on each TiKV node. Because BR only backs up leader replicas, you should estimate the space reserved for each node based on the leader size.
>
> Meanwhile, because TiDB v4.0 uses leader count for load balancing by default, leaders are greatly different in size, resulting in uneven distribution of backup data on each node.
> Because TiDB uses leader count for load balancing by default, leaders can differ greatly in size. This might result in uneven distribution of backup data on each node.

## Implementation principles

@@ -125,11 +119,11 @@ Currently, you can use SQL statements or the command-line tool to back up and re

### Use SQL statements

TiDB v4.0.2 and later versions support backup and restore operations using SQL statements. For details, see the [Backup syntax](/sql-statements/sql-statement-backup.md#backup) and the [Restore syntax](/sql-statements/sql-statement-restore.md#restore).
TiDB supports both [`BACKUP`](/sql-statements/sql-statement-backup.md#backup) and [`RESTORE`](/sql-statements/sql-statement-restore.md#restore) SQL statements. The progress of these operations can be monitored with the statement [`SHOW BACKUPS|RESTORES`](/sql-statements/sql-statement-show-backups.md).
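
As a minimal sketch, assuming an S3 bucket named `backup-bucket` exists and is writable:

{{< copyable "sql" >}}

```sql
-- Back up the whole cluster to external storage (placeholder URL).
BACKUP DATABASE * TO 's3://backup-bucket/2021-03-17/';
-- Monitor the progress of running backup tasks.
SHOW BACKUPS;
```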

### Use the command-line tool

Also, you can use the command-line tool to perform backup and restore. First, you need to download the binary file of the BR tool. For details, see [download link](/download-ecosystem-tools.md#br-backup-and-restore).
The `br` command-line utility is available as a separate download. For details, see [download link](/download-ecosystem-tools.md#br-backup-and-restore).

The following section takes the command-line tool as an example to introduce how to perform backup and restore operations.

@@ -195,19 +189,6 @@ Each of the above three sub-commands might still include the following three sub

To back up the cluster data, use the `br backup` command. You can add the `full` or `table` sub-command to specify the scope of your backup operation: the whole cluster or a single table.
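
For instance, a full-cluster backup might be invoked as follows; the PD address, storage path, and log file name are placeholders:

{{< copyable "shell-regular" >}}

```shell
br backup full \
    --pd "127.0.0.1:2379" \
    --storage "local:///tmp/backup" \
    --log-file backupfull.log
```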

If the BR version is earlier than v4.0.3, and the backup duration might exceed the [`tikv_gc_life_time`](/garbage-collection-configuration.md#tikv_gc_life_time) configuration which is `10m0s` by default (`10m0s` means 10 minutes), increase the value of this configuration item.

For example, set `tikv_gc_life_time` to `720h`:

{{< copyable "sql" >}}

```sql
mysql -h${TiDBIP} -P4000 -u${TIDB_USER} ${password_str} -Nse \
"update mysql.tidb set variable_value='720h' where variable_name='tikv_gc_life_time'";
```

Since v4.0.3, BR automatically adapts to GC and you do not need to manually adjust the `tikv_gc_life_time` value.

### Back up all the cluster data

To back up all the cluster data, execute the `br backup full` command. To get help on this command, execute `br backup full -h` or `br backup full --help`.
@@ -414,7 +395,7 @@ To restore the cluster data, use the `br restore` command. You can add the `full
>
> Even if each TiKV node eventually only needs to read a part of all the SST files, they all need full access to the complete archive because:
>
> - Data are replicated into multiple peers. When ingesting SSTs, these files have to be present on *all* peers. This is unlike back up where reading from a single node is enough.
> - Data is replicated into multiple peers. When ingesting SSTs, these files have to be present on *all* peers. This is unlike backup, where reading from a single node is enough.
> - Which node each peer is scattered to during restore is random. We don't know in advance which node will read which file.
>
> These can be avoided using shared storage, for example mounting an NFS on the local path, or using S3. With network storage, every node can automatically read every SST file, so these caveats no longer apply.
@@ -665,10 +646,7 @@ Suppose that 4 TiKV nodes are used, each with the following configuration:

### Backup

Before the backup operation, check the following two items:

- You have set `tikv_gc_life_time` set to a larger value so that the backup operation will not be interrupted because of data loss.
- No DDL statement is being executed on the TiDB cluster.
Before the backup operation, check that no DDL statement is being executed on the TiDB cluster.
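
One way to verify this is to list the current DDL state from a MySQL client connected to TiDB (a sketch; an idle cluster shows no running jobs):

{{< copyable "sql" >}}

```sql
-- Shows the DDL owner and any DDL job that is currently running.
ADMIN SHOW DDL;
```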

Then execute the following command to back up all the cluster data:

21 changes: 2 additions & 19 deletions br/backup-and-restore-use-cases.md
@@ -74,26 +74,9 @@ It is recommended that you use a network disk to back up and restore data. This

For the detailed usage of the `br backup` command, refer to [BR command-line description](/br/backup-and-restore-tool.md#command-line-description).

1. Before executing the `br backup` command, check the value of the [`tikv_gc_life_time`](/garbage-collection-configuration.md#tikv_gc_life_time) configuration item, and adjust the value appropriately in the MySQL client to make sure that [Garbage Collection](/garbage-collection-overview.md) (GC) does not run during the backup operation.
1. Before executing the `br backup` command, ensure that no DDL is running on the TiDB cluster.

{{< copyable "sql" >}}

```sql
SELECT * FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_life_time';
UPDATE mysql.tidb SET VARIABLE_VALUE = '720h' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```

2. After the backup operation, set the parameter back to the original value.

{{< copyable "sql" >}}

```sql
UPDATE mysql.tidb SET VARIABLE_VALUE = '10m' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```

> **Note:**
>
> Since v4.0.8, BR supports the self-adaptive GC. To avoid manually adjusting GC, register `backupTS` in `safePoint` in PD and make sure that `safePoint` does not move forward during the backup process.
2. Ensure that the storage device where the backup will be created has sufficient space.

### Preparation for restoration

4 changes: 2 additions & 2 deletions dumpling-overview.md
@@ -307,15 +307,15 @@ In other scenarios, if the data size is very large, to avoid export failure due
{{< copyable "sql" >}}

```sql
update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';
SET GLOBAL tidb_gc_life_time = '720h';
```

After your operation is completed, set the GC time back (the default value is `10m`):

{{< copyable "sql" >}}

```sql
update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';
SET GLOBAL tidb_gc_life_time = '10m';
```

Finally, all the exported data can be imported back to TiDB using [Lightning](/tidb-lightning/tidb-lightning-backends.md).
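
A minimal import invocation might look like the following; the configuration file name is an assumption:

{{< copyable "shell-regular" >}}

```shell
# Import the exported data using TiDB Lightning with a prepared config file.
tidb-lightning -config tidb-lightning.toml
```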
10 changes: 2 additions & 8 deletions error-codes.md
@@ -151,7 +151,7 @@ In addition, TiDB has the following unique error codes:

An unsupported database isolation level is set.

If you cannot modify the codes because you are using a third-party tool or framework, consider using `tidb_skip_isolation_level_check` to bypass this check.
If you cannot modify the codes because you are using a third-party tool or framework, consider using [`tidb_skip_isolation_level_check`](/system-variables.md#tidb_skip_isolation_level_check) to bypass this check.
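
As a sketch, the check can be bypassed for the current session like this:

{{< copyable "sql" >}}

```sql
-- Skip the isolation level check so unsupported levels are silently accepted.
SET SESSION tidb_skip_isolation_level_check = 1;
```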

{{< copyable "sql" >}}

@@ -179,16 +179,10 @@ In addition, TiDB has the following unique error codes:

* Error Number: 8055

The current snapshot is too old. The data may have been garbage collected. You can increase the value of `tikv_gc_life_time` to avoid this problem. The new version of TiDB automatically reserves data for long-running transactions. Usually this error does not occur.
The current snapshot is too old and the data may have been garbage collected. You can increase the value of [`tidb_gc_life_time`](/system-variables.md#tidb_gc_life_time) to avoid this problem. Recent versions of TiDB automatically retain the data needed by long-running transactions, so this error rarely occurs.

See [garbage collection overview](/garbage-collection-overview.md) and [garbage collection configuration](/garbage-collection-configuration.md).
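
With current TiDB versions, the adjustment is a single statement (the `24h` value is illustrative):

{{< copyable "sql" >}}

```sql
SET GLOBAL tidb_gc_life_time = '24h';
```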

{{< copyable "sql" >}}

```sql
update mysql.tidb set VARIABLE_VALUE="24h" where VARIABLE_NAME="tikv_gc_life_time";
```

* Error Number: 8059

The auto-random ID is exhausted and cannot be allocated. There is currently no way to recover from such errors. It is recommended to use `BIGINT` columns with the auto-random feature to maximize the number of assignable values, and to avoid manually assigning values to the auto-random column.
12 changes: 1 addition & 11 deletions faq/migration-tidb-faq.md
@@ -19,7 +19,7 @@ Restart the TiDB service, add the `-skip-grant-table=true` parameter in the conf

### How to export the data in TiDB?

Currently, TiDB does not support `select into outfile`. You can use the following methods to export the data in TiDB:
You can use the following methods to export the data in TiDB:

- See [MySQL uses mysqldump to export part of the table data](https://blog.csdn.net/xin_yu_xin/article/details/7574662) (in Chinese) and export data using mysqldump with a `WHERE` clause.
- Use the MySQL client to export the results of `SELECT` to a file (see the one-liner after this list).
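
For example, the second approach as a one-liner (host, port, user, and table are placeholders):

{{< copyable "shell-regular" >}}

```shell
# Write the result of a SELECT to a local file via the MySQL client.
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT * FROM test.t" > t.txt
```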
@@ -116,13 +116,3 @@ If the amount of data that needs to be deleted at a time is very large, this loo

- The [Lightning](/tidb-lightning/tidb-lightning-overview.md) tool is developed for distributed data import. It should be noted that the data import process does not perform a complete transaction process for performance reasons. Therefore, the ACID constraint of the data being imported during the import process cannot be guaranteed. The ACID constraint of the imported data can only be guaranteed after the entire import process ends. Therefore, the applicable scenarios mainly include importing new data (such as a new table or a new index) or the full backup and restoring (truncate the original table and then import data).
- Data loading in TiDB is related to the status of disks and the whole cluster. When loading data, pay attention to metrics like the disk usage rate of the host, TiClient Error, Backoff, Thread CPU and so on. You can analyze the bottlenecks using these metrics.

### What should I do if it is slow to reclaim storage space after deleting data?

You can configure concurrent GC to increase the speed of reclaiming storage space. The default concurrency is 1, and you can modify it to at most 50% of the number of TiKV instances using the following command:

{{< copyable "sql" >}}

```sql
update mysql.tidb set VARIABLE_VALUE="3" where VARIABLE_NAME="tikv_gc_concurrency";
```
8 changes: 2 additions & 6 deletions faq/sql-faq.md
@@ -134,13 +134,9 @@ Deleting a large amount of data leaves a lot of useless keys, affecting the quer

## What should I do if it is slow to reclaim storage space after deleting data?

You can configure concurrent GC to increase the speed of reclaiming storage space. The default concurrency is 1, and you can modify it to at most 50% of the number of TiKV instances using the following command:
Because TiDB uses multi-version concurrency control (MVCC), deleting data does not immediately reclaim space. Garbage collection is delayed so that concurrent transactions can still see earlier versions of rows. The length of the delay is controlled by the [`tidb_gc_life_time`](/system-variables.md#tidb_gc_life_time) system variable (default: `10m0s`).

{{< copyable "sql" >}}

```sql
update mysql.tidb set VARIABLE_VALUE="3" where VARIABLE_NAME="tikv_gc_concurrency";
```
When performing a backup, `tidb_gc_life_time` is also automatically extended so that the backup can complete successfully.
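
To inspect or temporarily raise the retention window manually, statements like the following can be used; the `30m` value is only an example:

{{< copyable "sql" >}}

```sql
SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';
SET GLOBAL tidb_gc_life_time = '30m';
```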

## Does `SHOW PROCESSLIST` display the system process ID?

4 changes: 2 additions & 2 deletions faq/tidb-faq.md
@@ -124,12 +124,12 @@ The accessed Region is not available. A Raft Group is not available, with possib

#### ERROR 9006 (HY000): GC life time is shorter than transaction duration

The interval of `GC Life Time` is too short. The data that should have been read by long transactions might be deleted. You can add `GC Life Time` using the following command:
The interval of `GC Life Time` is too short. The data that should have been read by long transactions might be deleted. You can adjust [`tidb_gc_life_time`](/system-variables.md#tidb_gc_life_time) using the following command:

{{< copyable "sql" >}}

```sql
update mysql.tidb set variable_value='30m' where variable_name='tikv_gc_life_time';
SET GLOBAL tidb_gc_life_time = '30m';
```

> **Note:**