Commit

Merge branch 'zilliztech:main' into main
gifi-siby authored Nov 11, 2024
2 parents ce84d76 + d67fa70 commit c9ae3a4
Showing 7 changed files with 221 additions and 174 deletions.
206 changes: 38 additions & 168 deletions README.md
@@ -1,4 +1,9 @@
# Milvus-backup
<div class="column" align="left">
<a href="https://discord.com/invite/8uyFbECzPX"><img height="20" src="https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white" alt="license"/></a>
<img src="https://img.shields.io/github/license/milvus-io/milvus" alt="license"/>
</div>


Milvus-Backup is a tool that allows users to back up and restore Milvus data. It can be used either through the command line or through an API server.

@@ -8,145 +13,29 @@ The Milvus-backup process has negligible impact on the performance of Milvus. Mi

* Download binary from [release page](https://github.com/zilliztech/milvus-backup/releases). Usually the latest is recommended.


For Mac:
* Use [homebrew](https://brew.sh/) to install
* Use [homebrew](https://brew.sh/) to install on Mac
```shell
brew install zilliztech/tap/milvus-backup
```

## Config
In order to use Milvus-Backup, access to Milvus proxy and Minio cluster is required. Configuration settings related to this access can be edited in `backup.yaml`.
## Usage

Milvus-backup can be used through the command line or through an API server.

### Configuration
In order to use Milvus-Backup, access to Milvus proxy and Minio cluster is required. Configuration settings related to this access can be edited in [backup.yaml](configs/backup.yaml).

> [!NOTE]
>
> Please ensure that the configuration settings for Minio are accurate. There may be variations in the default value of Minio's configuration depending on how Milvus is deployed, either by docker-compose or k8s.
>
> Please be advised that it is not possible to back up data to a local path. Backup data is stored in Minio or another object storage solution used by your Milvus instance.
> |field|docker-compose |helm|
> |---|---|---|
> |bucketName|a-bucket|milvus-bucket|
> |rootPath|files|file|
## Development

### Build

```
go get
go build
```

Will generate an executable binary `milvus-backup` in the project directory.

### Test
### Command Line

Developers can also test it using an IDE. `core/backup_context_test.go` contains test demos for all main interfaces. Alternatively, you can test it using the command line interface:

```shell
cd core
go test -v -test.run TestCreateBackup
```

## API server

To start the REST API server, use the following command after building:

```shell
./milvus-backup server
```

The server will listen on port 8080 by default. However, you can change it by using the `-p` parameter as shown below:

```shell
./milvus-backup server -p 443
```

### swagger UI

We offer access to our Swagger UI, which displays comprehensive information for our APIs. To view it, simply go to

```
http://localhost:8080/api/v1/docs/index.html
```

### API Reference

### `/create`

Creates a backup for the cluster. Data of the selected collections is copied to a backup directory. You can specify a list of collection names to back up; if the list is left empty (the default), all collections are backed up.

```
curl --location --request POST 'http://localhost:8080/api/v1/create' \
--header 'Content-Type: application/json' \
--data-raw '{
"async": true,
"backup_name": "test_backup",
"collection_names": [
"test_collection1","test_collection2"
]
}'
```

### `/list`

Lists all backups that exist in the `backup` directory in MinIO.

```
curl --location --request GET 'http://localhost:8080/api/v1/list' \
--header 'Content-Type: application/json'
```

### `/get_backup`

Retrieves a backup by name.

```
curl --location --request GET 'http://localhost:8080/api/v1/get_backup?backup_name=test_backup' \
--header 'Content-Type: application/json'
```

### `/delete`

Deletes a backup by name.

```
curl --location --request DELETE 'http://localhost:8080/api/v1/delete?backup_name=test_api' \
--header 'Content-Type: application/json'
```

### `/restore`

Restores a backup by name. It recreates the collections in the cluster and recovers the data through bulk insert. For more details about bulk insert, please refer to:
https://milvus.io/docs/bulk_insert.md

Bulk inserts will be done by partition. Currently, concurrent bulk inserts are not supported.

```
curl --location --request POST 'http://localhost:8080/api/v1/restore' \
--header 'Content-Type: application/json' \
--data-raw '{
"async": true,
"collection_names": [
"test_collection1"
],
"collection_suffix": "_bak",
"backup_name":"test_backup"
}'
```

### `/get_restore`

This is only available in the REST API. Retrieves restore task information by ID. We support async restore in the REST API, and you can use this method to get information on the restore execution status.

```
curl --location --request GET 'http://localhost:8080/api/v1/get_restore?id=test_restore_id' \
--header 'Content-Type: application/json'
```

## Command Line

The Milvus-backup CLI is built on cobra. Use the following command to see the usage.
The Milvus-backup CLI is built on cobra. Use the following command to see the full usage.

```
milvus-backup is a backup&restore tool for milvus.
@@ -172,75 +61,56 @@ Flags:
Use "milvus-backup [command] --help" for more information about a command.
```

## Demo

To try this demo, you should have a functional Milvus server installed and have pymilvus library installed.
Here is a [demo](docs/user_guide/e2e_demo_cli.md) for a complete backup and restore process.

Step 0: Check the connections
### API Server

First, use the `check` command to verify that the connections to Milvus and storage are working:
To start the REST API server, use the following command after building:

```
./milvus-backup check
```shell
./milvus-backup server
```

normal output:
The server will listen on port 8080 by default. However, you can change it by using the `-p` parameter as shown below:

```shell
Succeed to connect to milvus and storage.
Milvus version: v2.3
Storage:
milvus-bucket: a-bucket
milvus-rootpath: files
backup-bucket: a-bucket
backup-rootpath: backup
./milvus-backup server -p 443
```

Step 1: Prepare the Data
We offer a [demo](docs/user_guide/api_demo.md) of the key APIs; however, please refer to the Swagger UI for the most up-to-date usage details, as the demo may occasionally become outdated.

Create a collection in Milvus called `hello_milvus` and insert some data using the following command:
#### swagger UI

```
python example/prepare_data.py
```

Step 2: Create a Backup

Use the following command to create a backup of the `hello_milvus` collection:
We offer access to our Swagger UI, which displays comprehensive information for our APIs. To view it, simply go to

```
./milvus-backup create -n my_backup
http://localhost:8080/api/v1/docs/index.html
```

Step 3: Restore the Backup
### Advanced feature

Restore the backup using the following command:
1. [Cross Storage Backup](docs/user_guide/cross_storage.md): Data is read from the source storage and written to a different storage through the Milvus-backup service, for example S3 -> local or S3 a -> S3 b.

```
./milvus-backup restore -n my_backup -s _recover
```
2. [RBAC Backup&Restore](docs/user_guide/rbac.md): Back up and restore RBAC metadata by passing an extra parameter.

This will create a new collection called `hello_milvus_recover` which contains the data from the original collection.

**Note:** if you want to restore index as well, add `--restore_index`, like this:
## Development

```
./milvus-backup restore --restore_index -n my_backup -s _recover
```
### Build

This will help you restore data and index at the same time. If you don't add this flag, you need to restore index manually.
For developers, Milvus-backup is easy to contribute to.

Step 4: Verify the Restored Data
Running `make all` generates an executable binary `milvus-backup` in the `{project_path}/bin` directory.

Create an index on the restored collection using the following command:
### Test

```
python example/verify_data.py
```
Developers can also test it using an IDE. `core/backup_context_test.go` contains test demos for all main interfaces. Alternatively, you can test it using the command line interface:

This will perform a search on the `hello_milvus_recover` collection and verify that the restored data is correct.
```shell
cd core
go test -v -test.run TestCreateBackup
```

That's it! You have successfully backed up and restored your Milvus collection.

## License
milvus-backup is licensed under the Apache License, Version 2.0.
23 changes: 19 additions & 4 deletions core/backup_impl_create_backup.go
@@ -698,10 +698,25 @@ func (b *BackupContext) executeCreateBackup(ctx context.Context, request *backup
 	for _, collection := range toBackupCollections {
 		collectionClone := collection
 		job := func(ctx context.Context) error {
-			err := retry.Do(ctx, func() error {
-				return b.backupCollectionPrepare(ctx, backupInfo, collectionClone, request.GetForce())
-			}, retry.Sleep(120*time.Second), retry.Attempts(128))
-			return err
+			retryForSpecificError := func(retries int, delay time.Duration) error {
+				for i := 0; i < retries; i++ {
+					err := b.backupCollectionPrepare(ctx, backupInfo, collectionClone, request.GetForce())
+					// If no error, return successfully
+					if err == nil {
+						return nil
+					}
+					// Retry only for the specific error
+					if strings.Contains(err.Error(), "rate limit exceeded") {
+						fmt.Printf("Attempt %d: Temporary error occurred, retrying...\n", i+1)
+						time.Sleep(delay)
+						continue
+					}
+					// Return immediately for any other error
+					return err
+				}
+				return fmt.Errorf("operation failed after %d retries", retries)
+			}
+			return retryForSpecificError(10, 10*time.Second)
 		}
 		jobId := b.getBackupCollectionWorkerPool().SubmitWithId(job)
 		jobIds = append(jobIds, jobId)
3 changes: 1 addition & 2 deletions core/storage/local_chunk_manager.go
@@ -234,7 +234,6 @@ func (lcm *LocalChunkManager) UploadObject(ctx context.Context, i UploadObjectIn
return err
}

fmt.Println("Successfully written to file!")
return nil
}

@@ -356,7 +355,7 @@ func CopyDir(source string, dest string) (err error) {
}

func CopyFile(source string, dest string) (err error) {

// get properties of source parent dir
sourceParentDir := filepath.Dir(source)
sourceParentDirInfo, err := os.Stat(sourceParentDir)
74 changes: 74 additions & 0 deletions docs/user_guide/api_demo.md
@@ -0,0 +1,74 @@
# API Demos

### `/create`

Creates a backup for the cluster. Data of the selected collections is copied to a backup directory. You can specify a list of collection names to back up; if the list is left empty (the default), all collections are backed up.

```
curl --location --request POST 'http://localhost:8080/api/v1/create' \
--header 'Content-Type: application/json' \
--data-raw '{
"async": true,
"backup_name": "test_backup",
"collection_names": [
"test_collection1","test_collection2"
]
}'
```
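
The same request could also be built from Go. The following is a minimal sketch, not an official client: the `createBackupRequest` type and the `newCreateRequest` helper are hypothetical names introduced here, and only the JSON field names, the URL, and the header are taken from the curl example above. It assumes the server treats an omitted `collection_names` field the same as an empty list.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// createBackupRequest mirrors the JSON body of the curl example.
// The Go type itself is hypothetical; only the field names come from the demo.
type createBackupRequest struct {
	Async           bool     `json:"async"`
	BackupName      string   `json:"backup_name"`
	CollectionNames []string `json:"collection_names,omitempty"`
}

// newCreateRequest builds (but does not send) a POST request for /create.
func newCreateRequest(baseURL, backupName string, collections []string) (*http.Request, error) {
	body, err := json.Marshal(createBackupRequest{
		Async:           true,
		BackupName:      backupName,
		CollectionNames: collections,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/create", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateRequest("http://localhost:8080", "test_backup",
		[]string{"test_collection1", "test_collection2"})
	if err != nil {
		panic(err)
	}
	// Sending is left to the caller, for example with http.DefaultClient.Do(req).
	fmt.Println(req.Method, req.URL.String())
}
```

Separating request construction from sending keeps the sketch runnable without a live server and makes the payload easy to inspect in tests.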

### `/list`

Lists all backups that exist in the `backup` directory in MinIO.

```
curl --location --request GET 'http://localhost:8080/api/v1/list' \
--header 'Content-Type: application/json'
```

### `/get_backup`

Retrieves a backup by name.

```
curl --location --request GET 'http://localhost:8080/api/v1/get_backup?backup_name=test_backup' \
--header 'Content-Type: application/json'
```

### `/delete`

Deletes a backup by name.

```
curl --location --request DELETE 'http://localhost:8080/api/v1/delete?backup_name=test_api' \
--header 'Content-Type: application/json'
```

### `/restore`

Restores a backup by name. It recreates the collections in the cluster and recovers the data through bulk insert. For more details about bulk insert, please refer to:
https://milvus.io/docs/bulk_insert.md

Bulk inserts will be done by partition. Currently, concurrent bulk inserts are not supported.

```
curl --location --request POST 'http://localhost:8080/api/v1/restore' \
--header 'Content-Type: application/json' \
--data-raw '{
"async": true,
"collection_names": [
"test_collection1"
],
"collection_suffix": "_bak",
"backup_name":"test_backup"
}'
```

### `/get_restore`

This is only available in the REST API. Retrieves restore task information by ID. We support async restore in the REST API, and you can use this method to get information on the restore execution status.

```
curl --location --request GET 'http://localhost:8080/api/v1/get_restore?id=test_restore_id' \
--header 'Content-Type: application/json'
```
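
When the `GET` endpoints above are called from code rather than curl, the query parameter should be URL-encoded, since a backup name or restore ID may contain characters that are not safe in a URL. A minimal sketch follows, assuming the same base URL as the demos; `endpointURL` is a hypothetical helper, not part of milvus-backup.

```go
package main

import (
	"fmt"
	"net/url"
)

// endpointURL is a hypothetical helper that builds a URL such as
// /api/v1/get_backup?backup_name=... with the value properly escaped.
func endpointURL(baseURL, endpoint, key, value string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/" + endpoint
	q := url.Values{}
	q.Set(key, value)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := endpointURL("http://localhost:8080", "get_restore", "id", "test_restore_id")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints: http://localhost:8080/api/v1/get_restore?id=test_restore_id
}
```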
