Skip to content

databendcloud/db-archiver

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

54 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

db-archiver

A simple tool to archive databases to Databend.

Installation

go install github.com/databend/db-archiver@latest

Usage

Config your database and Databend connection in config/conf.json:

{
  "sourceHost": "127.0.0.1",
  "sourcePort": 3306,
  "sourceUser": "root",
  "sourcePass": "123456",
  "sourceDB": "mydb",
  "sourceTable": "my_table",
  "sourceQuery": "select * from mydb.my_table",
  "sourceWhereCondition": "id < 100",
  "sourceSplitKey": "id",
  "databendDSN": "https://cloudapp:password@tn3ftqihs--medium-p8at.gw.aws-us-east-2.default.databend.com:443",
  "databendTable": "testSync.my_table",
  "batchSize": 2,
  "batchMaxInterval": 30,
  "workers": 1,
  "copyPurge": false,
  "copyForce": false,
  "disableVariantCheck": false,
  "userStage": "~",
  "deleteAfterSync": false
}

Run the tool and start your sync:

./db-archiver -f conf.json

The log output:

INFO[0000] Starting worker              
2024/06/25 11:35:37 ingest 2 rows (0.565646 rows/s), 64 bytes (18.100678 bytes/s)
2024/06/25 11:35:38 ingest 1 rows (0.556652 rows/s), 33 bytes (17.812853 bytes/s)
2024/06/25 11:35:38 ingest 2 rows (0.551906 rows/s), 65 bytes (17.660995 bytes/s)
2024/06/25 11:35:38 ingest 2 rows (0.531644 rows/s), 64 bytes (17.012600 bytes/s)
2024/06/25 11:35:38 ingest 2 rows (0.531768 rows/s), 64 bytes (17.016584 bytes/s)

Parameter References

Parameter Description Default example required
sourceHost source host true
sourcePort source port 3306 3306 true
sourceUser source user true
sourcePass source password true
sourceDB source database true
sourceTable source table true
sourceQuery source query true
sourceWhereCondition source where condition false
sourceSplitKey source split key no "id" false
sourceSplitTimeKey source split time key no "t1" false
timeSplitUnit time split unit "minute" "day" false
databendDSN databend dsn no "http://localhost:8000" true
databendTable databend table no "db1.tbl" true
batchSize batch size 1000 1000 false
copyPurge copy purge false false false
copyForce copy force false false false
DisableVariantCheck disable variant check false false false
userStage user external stage name ~ ~ false

NOTE: To reduce the server load, we set the sourceSplitKey which is the primary key of the source table. The tool will split the data by the sourceSplitKey and sync the data to Databend in parallel. The sourceSplitTimeKey is used to split the data by the time column. And the sourceSplitTimeKey and sourceSplitKey must be set at least one.

Two modes

Sync data according to the sourceSplitKey

If your source table has a primary key, you can set the sourceSplitKey to sync the data in parallel. The tool will split the data by the sourceSplitKey and sync the data to Databend in parallel. It is the most high performance mode. Th example of the conf.json:

{
  "sourceHost": "0.0.0.0",
  "sourcePort": 3306,
  "sourceUser": "root",
  "sourcePass": "123456",
    "sourceDB": "mydb",
  "sourceTable": "my_table",
  "sourceQuery": "select * from mydb.my_table",
  "sourceWhereCondition": "id < 100",
  "sourceSplitKey": "id",
  "databendDSN": "https://cloudapp:password@tn3ftqihs--medium-p8at.gw.aws-us-east-2.default.databend.com:443",
  "databendTable": "testSync.my_table",
  "batchSize": 2,
  "batchMaxInterval": 30,
  "workers": 1,
  "copyPurge": false,
  "copyForce": false,
  "disableVariantCheck": false,
  "userStage": "~",
  "deleteAfterSync": false
  "maxThreads": 10
}

Sync data according to the sourceSplitTimeKey

If your source table has a time column, you can set the sourceSplitTimeKey to sync the data in parallel. The tool will split the data by the sourceSplitTimeKey and sync the data to Databend in parallel. The sourceSplitTimeKey must be set with timeSplitUnit. The timeSplitUnit can be minute, hour, day. The timeSplitUnit is used to split the data by the time column. The example of the conf.json:

 "sourceHost": "127.0.0.1",
  "sourcePort": 3306,
  "sourceUser": "root",
  "sourcePass": "12345678",
  "sourceDB": "mydb",
  "sourceTable": "test_table1",
  "sourceQuery": "select * from mydb.test_table1",
  "sourceWhereCondition": "t1 >= '2024-06-01' and t1 < '2024-07-01'",
  "sourceSplitKey": "",
  "sourceSplitTimeKey": "t1",
  "timeSplitUnit": "hour",
  "databendDSN": "https://cloudapp:password@tn3ftqihs--medium-p8at.gw.aws-us-east-2.default.databend.com:443",
  "databendTable": "default.test_table1",
  "batchSize": 2,
  "batchMaxInterval": 30,
  "copyPurge": false,
  "copyForce": false,
  "disableVariantCheck": false,
  "userStage": "~",
  "deleteAfterSync": false,
  "maxThread": 10