ETCD-610: automated backups no config
Elbehery committed Jun 17, 2024
1 parent b9df937 commit 035e9d5
Showing 1 changed file with 165 additions and 0 deletions.
165 changes: 165 additions & 0 deletions enhancements/etcd/automated-backups-no-config.md
@@ -0,0 +1,165 @@
---
title: automated-backups-no-config
authors:
- "@elbehery"
reviewers:
- "@soltysh"
- "@dusk125"
- "@hasbro17"
- "@tjungblu"
- "@williamcaban"
approvers:
- "@soltysh"
- "@dusk125"
- "@hasbro17"
- "@tjungblu"
- "@williamcaban"
api-approvers:
- "@soltysh"
creation-date: 2024-06-17
last-updated: 2024-06-17
tracking-link:
- https://issues.redhat.com/browse/ETCD-609
see-also:
- "https://issues.redhat.com/browse/OCPSTRAT-1411"
- "https://issues.redhat.com/browse/OCPSTRAT-529"
---


# Automated Backups of etcd with No Config

## Summary

Enable automated etcd backups from day 1, without requiring any configuration from the user.

## Motivation

The [current automated backup of etcd](https://docs.openshift.com/container-platform/4.15/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html#creating-automated-etcd-backups_backup-etcd) of an OpenShift
cluster relies on user-provided configuration.

To improve the customer experience, this work proposes taking etcd backups with a default configuration from day one, without additional configuration from the user.


### Goals
- Backups should be taken without configuration after cluster installation.
- Backups are saved to a default PersistentVolume, which the user can override.
- Backups are taken according to a default schedule, which the user can override.
- Backups are retained according to a default retention policy, which the user can override.
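
The "default, but overridable" behaviour in the goals above can be sketched as a small merge of user-supplied values over built-in defaults. This is only an illustration: the class, the function, and every default value (schedule, PVC name, retention count) are hypothetical placeholders, since the proposal has not yet agreed on any of them.

```python
from dataclasses import dataclass, replace
from typing import Optional


# All default values below are placeholders for illustration only;
# the proposal has not yet agreed on a schedule or a retention policy.
@dataclass(frozen=True)
class BackupConfig:
    schedule: str = "0 0 * * *"        # hypothetical: daily at midnight (cron syntax)
    pvc_name: str = "etcd-backup-pvc"  # hypothetical default PVC name
    max_backups: int = 5               # hypothetical count-based retention


def effective_config(
    schedule: Optional[str] = None,
    pvc_name: Optional[str] = None,
    max_backups: Optional[int] = None,
) -> BackupConfig:
    """Return the defaults, replaced field by field with any user overrides."""
    candidates = {
        "schedule": schedule,
        "pvc_name": pvc_name,
        "max_backups": max_backups,
    }
    # Only fields the user actually set are allowed to replace a default.
    overrides = {key: value for key, value in candidates.items() if value is not None}
    return replace(BackupConfig(), **overrides)
```

Calling `effective_config()` with no arguments yields the defaults, which corresponds to the "no config" case; passing, for example, `schedule="0 */6 * * *"` changes only the schedule and leaves the other defaults in place.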


### Non-Goals
- Save cluster backups to cloud storage, e.g. S3.
  - This could be a future enhancement or extension to the API.
- Automate cluster restoration.
- Provide automated backups for non-self-hosted architectures like Hypershift.

### User Stories
- As a cluster administrator, I want cluster backups to be taken without configuration.
- As a cluster administrator, I want to schedule recurring cluster backups so that I have a recent cluster state to recover from in the event of quorum loss (i.e. losing a majority of control-plane nodes).
- As a cluster administrator, I want to be notified via critical alerts when cluster backups have failed for longer than a configurable period.


## Proposal
- Provide a default etcd backup configuration as an `EtcdBackup` CR.
- Provide a default, reliable storage mechanism for the backup data.
- Provide a separate feature gate for automated backups with the default config.


### Workflow Description
- The user enables the `AutomatedBackupNoConfig` feature gate.
- An `EtcdBackup` CR is installed on the cluster.
- A PVC and a PV are created to store the backup data reliably.


### API Extensions
- No API changes are required.
- See https://github.com/openshift/enhancements/blob/ed885e10a0329a03a032b756ce9cd49a5f78af23/enhancements/etcd/automated-backups.md#L96

### Topology Considerations
TBD
#### Hypershift / Hosted Control Planes
TBD
#### Standalone Clusters
TBD
#### Single-node Deployments or MicroShift
TBD

### Implementation Details/Notes/Constraints [optional]
- Need to agree on a default schedule.
- Need to agree on a default retention policy.

- Several options exist for the default PVCName.
  - Relying on `dynamic provisioning` is sufficient, but it is not an option for `SNO` or `BM` clusters.
  - Utilising the `local storage operator` is a proper solution, but installing a whole operator is too much overhead.
  - The most viable solution to cover all OCP variants is to use a `local volume`.
    The pros and cons of this solution are listed below.
    - Pros:
      - The PV will be bound to a specific master node.
      - The Backup Pod will always be scheduled to this master node, since it uses a PVC bound to this PV.
      - Using an SSD is possible, since a local volume allows mounting it into a Pod.
      - Deterministic scheduling of the Backup Pod is vital, since the retention policy needs the recent backups in place to work.
    - Cons:
      - If the master node where the PV is mounted becomes unhealthy, unavailable, or unreachable, the existing backups are no longer accessible and new backups can no longer be taken.
      - Ideally, backups should be taken in a round-robin fashion across the master nodes, to avoid loading one etcd member more than the others.
      - Spreading the backups among all etcd cluster members also provides guarantees for disaster recovery if two members are lost at the same time.
      - However, using the `local volume` option across all master nodes would be complicated and error-prone.
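
To make the `local volume` option more concrete, the sketch below builds a `local` PersistentVolume manifest as a plain Python dictionary that can be serialised to JSON. The field names follow the upstream Kubernetes PersistentVolume API. The object name, storage class name, size, host path, and node name are hypothetical and are not part of this proposal.

```python
import json


def local_backup_pv(node_name: str, path: str, size: str = "10Gi") -> dict:
    """Build a `local` PersistentVolume manifest pinned to a single node."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": "etcd-backup-local-pv"},  # hypothetical name
        "spec": {
            "capacity": {"storage": size},  # hypothetical default size
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "etcd-backup-local",  # hypothetical class
            "local": {"path": path},
            # A local PV must declare node affinity. This is what ties the PV,
            # and therefore any Pod using a PVC bound to it, to one node.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "kubernetes.io/hostname",
                                    "operator": "In",
                                    "values": [node_name],
                                }
                            ]
                        }
                    ]
                }
            },
        },
    }


def render(manifest: dict) -> str:
    """Serialise the manifest to indented JSON."""
    return json.dumps(manifest, indent=2)
```

For example, `render(local_backup_pv("master-0", "/var/lib/etcd-backup"))` produces a manifest whose `nodeAffinity` names only `master-0`. That single-node affinity is the source of both the deterministic scheduling listed under the pros and the single point of failure listed under the cons.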

### Risks and Mitigations

When the backups are configured to be saved to a `local`-type PV, they are all saved to a single master node, where the PV is provisioned on the local disk.

If that node becomes inaccessible or unschedulable, the recurring backups will not be scheduled. The periodic backup config would then have to be recreated or updated with a different PVC that allows a new PV to be provisioned on a healthy node.
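
As a rough illustration of choosing a healthy node for a replacement PV, the sketch below walks the master nodes in round-robin order and skips unhealthy ones. It is not part of the proposal: the function name is hypothetical, and how node health would actually be determined is left as an assumption (here it is simply a set of node names passed in by the caller).

```python
from typing import Optional, Sequence, Set


def next_backup_node(
    masters: Sequence[str],
    healthy: Set[str],
    last: Optional[str] = None,
) -> Optional[str]:
    """Pick the next healthy master after `last`, wrapping around.

    Returns None when no master is healthy.
    """
    if not masters:
        return None
    # Start just after the node used last time; start at the front otherwise.
    start = masters.index(last) + 1 if last in masters else 0
    for offset in range(len(masters)):
        candidate = masters[(start + offset) % len(masters)]
        if candidate in healthy:
            return candidate
    return None
```

With masters `["m0", "m1", "m2"]`, all healthy, and `last="m0"`, the function returns `"m1"`. If `m1` is unhealthy it returns `"m2"` instead.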

### Drawbacks

## Design Details

### Open Questions


## Test Plan

An e2e test will be added to exercise the following scenario:

- Enable the `AutomatedBackupNoConfig` FeatureGate.
- Verify that an `EtcdBackup` CR has been installed.
- Verify that a PV and a PVC have been created.
- Verify that a backup has been taken.
- Verify that the retained backups conform to the retention policy.
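
The last check in the scenario above could be expressed with a helper like the one below. It is a minimal sketch that assumes a count-based retention policy and backup names that sort chronologically (for example, names carrying a timestamp suffix). Both assumptions, and the function name, are illustrative, because the default retention policy is still to be agreed.

```python
from typing import List


def backups_to_prune(backup_names: List[str], max_backups: int) -> List[str]:
    """Return the oldest backups that exceed a count-based retention limit.

    Assumes names sort chronologically, e.g. via a timestamp suffix.
    """
    if max_backups < 0:
        raise ValueError("max_backups must be non-negative")
    ordered = sorted(backup_names)
    excess = len(ordered) - max_backups
    # Everything before the newest `max_backups` entries is surplus.
    return ordered[:excess] if excess > 0 else []
```

An e2e check could then list the backups on the PV after several scheduled runs and assert that `backups_to_prune(names, limit)` is empty, i.e. that nothing beyond the limit was left behind.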

## Graduation Criteria
TBD

### Dev Preview -> Tech Preview
TBD

### Tech Preview -> GA
TBD

### Removing a deprecated feature
TBD

## Upgrade / Downgrade Strategy
TBD

## Version Skew Strategy
TBD

## Operational Aspects of API Extensions
TBD

#### Failure Modes
TBD

## Support Procedures
TBD

## Implementation History
TBD

## Alternatives
TBD

## Infrastructure Needed
TBD
