This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

Decoupling Backup and Restore Logic from Etcd Operator #1626

Closed
hongchaodeng opened this issue Nov 6, 2017 · 5 comments

Comments

@hongchaodeng
Member

hongchaodeng commented Nov 6, 2017

Decoupling Backup and Restore Logic from Etcd Operator

Currently, backup and restore are provided as features of the EtcdCluster API. These features have been useful and have been adopted by open source users. Over time, there have been increasing requests to extend their functionality (#1316, #1241). However, there is a limit to how many features we can add, because: (1) extending the API has to happen in the upstream codebase; and (2) it conflicts with stabilizing the current API. Point (2) matters most, because it blocks stabilizing the project.

To solve the above problems, we propose new ways to do backup and restore:

  • Backup: Currently EtcdCluster includes BackupPolicy to periodically save etcd snapshots. Instead, we propose to add a new EtcdBackup API which includes a reference to the EtcdCluster that it wants to back up. We built etcd-backup-operator as an example that saves backups to S3. As a result, etcd operator doesn’t need to know anything about backup.
  • Restore: Currently EtcdCluster includes RestorePolicy to restore an etcd member from a backup. Instead, we propose to add a new EtcdRestore API which includes the backup resource information and the EtcdCluster to be restored. We built etcd-restore-operator as an example to restore the data and create a new EtcdCluster. etcd operator can still manage the restored cluster, but it doesn’t need to know anything about the restore process.
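To make the shape of the proposal concrete, here is a minimal Go sketch of what the two new specs might carry. The struct and field names (`EtcdBackupSpec`, `EtcdRestoreSpec`, `ClusterName`, `S3Path`, `BackupS3Path`) are hypothetical illustrations, not the actual etcd-operator API. The point it shows is structural: each spec references an EtcdCluster by name from the outside, so nothing backup- or restore-related has to live inside EtcdCluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EtcdBackupSpec is a HYPOTHETICAL shape for the proposed EtcdBackup API:
// it names the EtcdCluster to snapshot and where to store the snapshot.
type EtcdBackupSpec struct {
	ClusterName string `json:"clusterName"` // EtcdCluster to back up
	S3Path      string `json:"s3Path"`      // hypothetical destination, "bucket/key"
}

// EtcdRestoreSpec is a HYPOTHETICAL shape for the proposed EtcdRestore API:
// it names the backup to read and the new EtcdCluster to create from it.
type EtcdRestoreSpec struct {
	BackupS3Path string `json:"backupS3Path"` // backup written by an EtcdBackup
	ClusterName  string `json:"clusterName"`  // EtcdCluster to create from the backup
}

// specJSON renders a spec the way it might appear in a custom resource.
func specJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	backup := EtcdBackupSpec{ClusterName: "example-etcd-cluster", S3Path: "my-bucket/etcd.backup"}
	restore := EtcdRestoreSpec{BackupS3Path: backup.S3Path, ClusterName: "example-etcd-cluster"}
	fmt.Println(specJSON(backup))
	fmt.Println(specJSON(restore))
}
```

Because the backup and restore operators only read these specs and talk to etcd, the EtcdCluster type (and the operator that reconciles it) can stay unchanged as backup and restore features evolve.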

By using the above methods, we can decouple backup and restore logic from etcd operator, and thus can remove BackupPolicy and RestorePolicy from EtcdCluster API.

User effect

The existing backup and restore features in the EtcdCluster API won’t be supported after the 0.7.0 release. Please check the migration docs before upgrading.

The open source etcd-backup-operator and etcd-restore-operator are provided as examples for how to use EtcdCluster API. Based on what we know, most Internet company users have their own ways to do etcd backup and restore. We expect these users would write their own tooling and make their work easier by decoupling them from our codebase. For enterprise users, we recommend trying out Tectonic and contacting us for customized etcd backup/restore solutions.

Moving forward, we will focus on stabilizing etcd operator and making a stable release soon.

@hongchaodeng
Member Author

@swapnilgm

Sorry for replying on a closed issue, but I think this is the best place to keep the right context.

The open source etcd-backup-operator and etcd-restore-operator are provided as examples for how to use EtcdCluster API. Based on what we know, most Internet company users have their own ways to do etcd backup and restore. We expect these users would write their own tooling and make their work easier by decoupling them from our codebase.

Does that mean this project will only stabilise etcd-operator and not the backup and restore operators? Or is it on the roadmap to eventually provide a fixed/stabilised backup operator structure/template with a basic backup interface (something like save, delete, and list APIs), and let users implement that interface for their storage provider rather than writing a whole backup operator? I think this is why issue #1241 is still open.

@xiang90
Collaborator

xiang90 commented Nov 16, 2017

We will support/maintain an example backup/restore operator with limited functionality. Users who want more advanced or customized features can fork it and develop on their own.

@raoofm

raoofm commented Nov 16, 2017

@xiang90 a backup or restore operator with limited functionality is fine, and I totally agree with getting away from maintaining the tons of use cases that might pop up. But there is a concern that this basic functionality comes with a warning not to use it in production. It would be great if at least one production-grade use case were implemented for others to follow, even if it is just S3 or PV.

@xiang90
Collaborator

xiang90 commented Nov 16, 2017

@raoofm

Yes. We will make the basic functionality stable over time, and then state that it is production ready.
/cc @fanminshi @hongchaodeng
