Design for integrating kopia with kanister #1482

Merged
merged 32 commits into from
Jan 4, 2023
Commits
a5a4c3c
Draft design for integrating kopia with kanister
shlokc9 Jun 15, 2022
b20ce81
Merge branch 'master' of github.com:kanisterio/kanister into design-d…
shlokc9 Jun 15, 2022
d2db921
Update the kopia.io link
shlokc9 Jun 15, 2022
3e4fd2a
Minor updates in the document
shlokc9 Jun 15, 2022
33ab9a0
Address review comments
shlokc9 Jun 20, 2022
0c55bd1
Merge branch 'master' of github.com:kanisterio/kanister into design-d…
shlokc9 Jun 20, 2022
23af8bf
Address reviewer comments
shlokc9 Jun 21, 2022
52b892c
minor updates to the introduction
shlokc9 Jul 26, 2022
7245b24
Update problem statement and goals
shlokc9 Jul 26, 2022
533ad7c
Update scope of the design
shlokc9 Jul 27, 2022
5cb8344
Add backward compatibility and user experience
shlokc9 Jul 27, 2022
b7d966e
Merge branch 'master' of github.com:kanisterio/kanister into design-d…
shlokc9 Jul 27, 2022
f89908e
Update design introduction
shlokc9 Jul 27, 2022
af98fb1
Rephrasing Problems section to motivation
shlokc9 Jul 28, 2022
e7356b6
Merge branch 'master' of github.com:kanisterio/kanister into design-d…
shlokc9 Aug 3, 2022
c48fbca
Rephrase the backward compatibility Kopia Repository Server
shlokc9 Aug 5, 2022
d89fb28
Rephrase the question in backward compatibility
shlokc9 Aug 5, 2022
59e1b64
Add high-level diagram for integrating kopia in kanister
shlokc9 Aug 10, 2022
2a4b88a
Remove images and apply suggestions
pavannd1 Aug 11, 2022
7b3b6e6
Update design/kanister-kopia-integration/kanister-kopia-integration.md
pavannd1 Aug 11, 2022
413ef77
Merge branch 'master' of github.com:kanisterio/kanister into design-d…
shlokc9 Aug 16, 2022
16ebbea
Add additional items for Kopia integration design
ihcsim Aug 17, 2022
3da4718
Add info on server replicas count and handling the credential changes
ihcsim Aug 22, 2022
58d25a0
Clarify on updating access users list
ihcsim Aug 24, 2022
96ae6a8
Add repo server CRD and consolidate files
pavannd1 Sep 27, 2022
71af0e3
Line wrap to 80 chars
pavannd1 Oct 7, 2022
ff1b58e
Address Ivan's reviews
pavannd1 Oct 11, 2022
06890d7
Merge branch 'master' of github.com:kanisterio/kanister into design-d…
shlokc9 Nov 25, 2022
97792a4
Merge branch 'design-doc-kanister-kopia-integration' of github.com:ka…
shlokc9 Nov 25, 2022
c172f12
Note about the Kanister Functions
shlokc9 Nov 25, 2022
4a50a63
Merge branch 'master' into design-doc-kanister-kopia-integration
mergify[bot] Dec 20, 2022
fb87bac
Merge branch 'master' into design-doc-kanister-kopia-integration
PrasadG193 Jan 4, 2023
91 changes: 91 additions & 0 deletions design/kanister-kopia-integration/kanister-kopia-integration.md
@@ -0,0 +1,91 @@
# Integrating Kopia with Kanister

This document proposes all the high-level changes required within Kanister to use [Kopia](https://kopia.io/) as the primary backup and restore tool.

## Motivation

Kanister offers an in-house capability to perform backup and restore to and from object stores using some operation-specific Functions like BackupData, RestoreData, etc.
Although they are useful and simple to use, these Functions can be significantly improved to provide better reliability, security, and performance.

The improvements would include:
1. Encryption of data during transfers and at rest
2. Efficient content-based data deduplication
3. Configurable data compression
4. Reduced memory consumption
5. Increased variety of backend storage targets for backups

These improvements can be achieved by using `Kopia` as the primary data movement tool in these Kanister Functions.

Kanister also provides a command line utility `kando` that can be used to move data to and from object stores.
This tool internally executes `Kopia` commands to move the backup data.
The v2 version of the example Kanister Blueprints supports this. However, there are a few caveats to using these Blueprints:
1. `kando` uses `Kopia` only when a Kanister Profile of type `Kopia` is provided
2. A Kanister Profile of type `Kopia` requires a [Kopia Repository Server](https://kopia.io/docs/repository-server/) running in the same namespace as the Kanister controller
3. A Repository Server requires a [Kopia Repository](https://kopia.io/docs/repositories/) to be initialized on a backend storage target

Kanister currently lacks documentation and automation to use these features.

## Introducing Kopia

Kopia is a powerful, cross-platform tool for managing encrypted backups in the cloud.
It provides fast and secure backups, using compression, data deduplication, and client-side end-to-end encryption.
It supports a variety of backup storage targets, including object stores, which allows users to choose the storage provider that best addresses their needs. In Kopia, these storage locations are called repositories.
It is a lock-free system that allows concurrent multi-client operations including garbage collection.

To explore other features of Kopia, see its [documentation](https://kopia.io/docs/features/).

## Scope

1. Automate the initialization of a Kopia Repository for an application backed up by Kanister.
2. Design and automate the lifecycle of the required Kopia Repository Server.
3. Add new versions of Kanister Data Functions like BackupData, RestoreData, etc. with Kopia as the primary data mover tool.

## User Experience

- All the new features mentioned in this document will be opt-in only. Existing users will not see any changes in the Kanister controller's behavior.
- Users will be able to continue using their current Blueprints, switch to the v2 version of the example Blueprints, or use Blueprints with the new version of the Kanister Data Functions.
- Users opting to use the v2 Blueprints and Blueprints with Kopia-based Kanister Data Functions will be required to follow instructions to set up the required Kopia Repository Server before executing the actions.
- After setting up the Repository Server, users can follow the normal workflow to execute actions from the v2 Blueprints. To use the new versions of the Kanister Data Functions, users must specify the version of the function via the ActionSet Action's `preferredVersion` field.
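As a rough illustration of the last point, the sketch below builds an ActionSet manifest that opts one action into the new function versions through `preferredVersion`. The blueprint name, workload name, and namespaces are hypothetical placeholders, and the version string is the one this design proposes for the Kopia-based functions; treat it as a sketch rather than a reference manifest.

```python
import json


def build_action_set(blueprint: str, workload: str, namespace: str,
                     preferred_version: str) -> dict:
    """Return an ActionSet manifest as a plain dict.

    All names passed in are illustrative; only the overall shape of the
    ActionSet and the `preferredVersion` field matter here.
    """
    return {
        "apiVersion": "cr.kanister.io/v1alpha1",
        "kind": "ActionSet",
        "metadata": {"generateName": "backup-", "namespace": "kanister"},
        "spec": {
            "actions": [
                {
                    "name": "backup",
                    "blueprint": blueprint,
                    "object": {
                        "kind": "Deployment",
                        "name": workload,
                        "namespace": namespace,
                    },
                    # Opt in to the Kopia-based Data Functions for this action.
                    "preferredVersion": preferred_version,
                }
            ]
        },
    }


# Hypothetical blueprint and workload names.
manifest = build_action_set("mysql-blueprint", "mysql", "mysql", "v1.0.0-alpha")
print(json.dumps(manifest, indent=2))
```

Leaving `preferredVersion` out of the action keeps the existing behavior, which is what makes the feature opt-in.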

## Detailed Design

### Kopia Repository

- As mentioned above, the backup storage location is called a "Repository" in Kopia.
- A separate repository will be used for each application protected by Kopia-based Blueprints in Kanister.
- The repository will be initialized when the Repository Server is created for the first time. Once initialized, it will continue to exist at that location until it is cleaned up manually.
- Accessing the repository requires location and credential information similar to a Kanister Profile CR and a unique password used by Kopia during [encryption](https://kopia.io/docs/features/#end-to-end-zero-knowledge-encryption).
- In the first iteration, this password will be auto-generated by the Kanister controller. Future iterations will allow users to use a Key Management Service of choice.
- Only a single repository can exist at a particular backend storage location. To address this, the Kanister controller will generate the repository path using the location information and the UUID of the application namespace. For example, the repository path for `mysql` namespace and S3 bucket called `test-bucket` will be of the form `test-bucket/<UUID of mysql namespace>`.
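The path scheme in the last bullet can be sketched as follows. This is a minimal illustration, not the controller's implementation: the function name `repository_path` is made up for this example, and the UID shown is a hypothetical value standing in for the `metadata.uid` of the `mysql` namespace.

```python
import uuid


def repository_path(bucket: str, namespace_uid: str) -> str:
    """Join the bucket name and the namespace UID into a repository path."""
    # Normalize and validate the UID; uuid.UUID raises ValueError if malformed.
    uid = str(uuid.UUID(namespace_uid))
    return f"{bucket.strip('/')}/{uid}"


# Hypothetical UID, standing in for the UID of the `mysql` namespace.
mysql_uid = "3f2c1a9e-7b4d-4c1e-9a55-0d6f8e2b1c77"
print(repository_path("test-bucket", mysql_uid))
```

Because every namespace has a distinct UID, two applications that share a bucket still end up with separate repository paths.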

### Kopia Repository Server

- A Kopia Repository Server gives Kopia clients proxied access to the backend storage location.
- A separate server will be used for each repository initialized in Kanister.
- In Kanister, the server will comprise a Kubernetes Pod, a Service, and a NetworkPolicy.
- The Pod will run the Kopia server process, which is exposed to the application through the Service and the NetworkPolicy.
- Accessing the server requires the service address, a server username, and a password without any knowledge of the backend storage location.
- To authorize access, a list of server usernames and passwords must be added prior to starting the server.
- The server also uses TLS certificates to secure incoming connections to it.
- In the first iteration, the usernames and passwords will be auto-generated by the Kanister controller, and the TLS certificates will be generated by Helm during installation of the controller.
- A future release of Kanister will allow users to specify the usernames and passwords and to use cert-manager to manage the TLS certificates.
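To make the auto-generation step concrete, here is one way a server username and password pair could be produced. This is a sketch under assumptions: the `kanister-` prefix, the password length, and the alphanumeric alphabet are illustrative choices, not values taken from the design, and the username format Kopia actually expects may differ.

```python
import secrets
import string
from typing import Tuple


def generate_credentials(user_prefix: str = "kanister",
                         password_length: int = 32) -> Tuple[str, str]:
    """Return a (username, password) pair from a cryptographically secure RNG."""
    alphabet = string.ascii_letters + string.digits
    # A short random hex suffix keeps usernames distinct across servers.
    username = f"{user_prefix}-{secrets.token_hex(4)}"
    password = "".join(secrets.choice(alphabet) for _ in range(password_length))
    return username, password


username, password = generate_credentials()
print(username, len(password))
```

The `secrets` module is used instead of `random` because the values protect access to backup data.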

NOTE: A more detailed document describing the creation of the Kopia Repository Server in Kanister will be submitted shortly.

### Kanister Data Functions

- Kanister allows multiple versions of Functions to be registered with the controller.
- Existing Functions are registered with the default `v0.0.0` version. Find more information [here](https://docs.kanister.io/functions.html#existing-functions).
- The following Data Functions will be registered with a second version `v1.0.0-alpha`:

1. BackupData
2. BackupDataAll
3. BackupDataStats
4. CopyVolumeData
5. DeleteData
6. DeleteDataAll
7. RestoreData
8. RestoreDataAll

- The purpose, signature, and output of these functions will remain intact, i.e., their usage in Blueprints will remain unchanged. However, their internal implementation will use Kopia to connect to the Repository Server and perform the required data operations.
- As noted above, users will execute these functions by specifying `v1.0.0-alpha` as the `preferredVersion` during the creation of an ActionSet.
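The versioning scheme described in this section can be modeled with a small registry keyed by function name and version. The sketch below is a toy model, not Kanister's code: `register` and `resolve` are hypothetical helpers, and the fallback to `v0.0.0` when a function has no implementation at the preferred version is an assumption about how mixed Blueprints would be handled.

```python
from typing import Callable, Dict

DEFAULT_VERSION = "v0.0.0"

# function name -> version -> implementation
_registry: Dict[str, Dict[str, Callable[[], str]]] = {}


def register(name: str, version: str, impl: Callable[[], str]) -> None:
    """Record an implementation of `name` under `version`."""
    _registry.setdefault(name, {})[version] = impl


def resolve(name: str, preferred_version: str = "") -> Callable[[], str]:
    """Pick the preferred version if registered, else the default one (assumed)."""
    versions = _registry[name]
    if preferred_version in versions:
        return versions[preferred_version]
    return versions[DEFAULT_VERSION]


# BackupData gains a second, Kopia-based version; KubeExec has only the default.
register("BackupData", DEFAULT_VERSION, lambda: "existing data mover")
register("BackupData", "v1.0.0-alpha", lambda: "kopia via repository server")
register("KubeExec", DEFAULT_VERSION, lambda: "kubectl exec")

print(resolve("BackupData", "v1.0.0-alpha")())
print(resolve("KubeExec", "v1.0.0-alpha")())
```

Under this model a Blueprint that mixes Data Functions with other Functions can still run with `preferredVersion` set to `v1.0.0-alpha`, since only the functions that have an alpha implementation change behavior.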