enhance docs on clone and restore (#1592)
* enhance docs on clone and restore

* add chapter about upgrading the operator

* add section for standby clusters

* Update docs/administrator.md

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>

FxKu and CyberDem0n authored Aug 27, 2021
1 parent 1dd0cd9 commit 7469efa
Showing 2 changed files with 86 additions and 21 deletions.
96 changes: 80 additions & 16 deletions docs/administrator.md
@@ -3,6 +3,21 @@
Learn how to configure and manage the Postgres Operator in your Kubernetes (K8s)
environment.

## Upgrading the operator

The Postgres Operator is upgraded by changing the Docker image in its
deployment. Before doing so, it is recommended to check the release notes
for new configuration options or changed behavior you might want to reflect
in the ConfigMap or config CRD. For example, a new feature might be enabled
or disabled by default, and you may want to flip it with the corresponding
configuration option.
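
For example, a minimal way to roll out a new image on a plain deployment could
look like the sketch below; the deployment name, container name and image tag
are assumptions based on the default manifests, so adjust them to your setup:

```bash
# Hypothetical names: point the operator deployment at the new release image.
kubectl set image deployment/postgres-operator \
  postgres-operator=registry.opensource.zalan.do/acid/postgres-operator:v1.7.0
```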

When using Helm, be aware that installing the new chart will not update the
`Postgresql` and `OperatorConfiguration` CRDs. Make sure to update them first
with the manifests provided in the `crds` folder. Otherwise, you might face
errors about new Postgres manifest or configuration options being unknown
to the CRD schema validation.
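
A sketch of the manual CRD update before a chart upgrade, assuming the `crds`
folder layout of the operator repository (paths and release name are
illustrative):

```bash
# Hypothetical paths: refresh both CRDs first, then upgrade the chart.
kubectl apply -f charts/postgres-operator/crds/postgresqls.yaml
kubectl apply -f charts/postgres-operator/crds/operatorconfigurations.yaml
helm upgrade postgres-operator ./charts/postgres-operator
```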

## Minor and major version upgrade

Minor version upgrades for PostgreSQL are handled via updating the Spilo Docker
@@ -157,20 +172,26 @@ from numerous escape characters in the latter log entry, view it in CLI with
`PodTemplate` used by the operator is yet to be updated with the default values
used internally in K8s.

The StatefulSet is replaced if the following properties change:
- annotations
- volumeClaimTemplates
- template volumes

The StatefulSet is replaced and a rolling update is triggered if the following
properties differ between the old and new state:
- container name, ports, image, resources, env, envFrom, securityContext and volumeMounts
- template labels, annotations, service account, securityContext, affinity, priority class and termination grace period

Note that changes in the `SPILO_CONFIGURATION` env variable under the
`bootstrap.dcs` path are ignored when comparing StatefulSets. They are applied
through Patroni's REST API interface, followed by a restart of all instances.

The operator also supports lazy updates of the Spilo image. In this case the
StatefulSet is only updated, but no rolling update follows. This feature saves
you a switchover - and hence downtime - when you know pods will be restarted
later anyway, for instance due to node rotation. To force a rolling update,
disable this mode by setting `enable_lazy_spilo_upgrade` to `false` in the
operator configuration and restart the operator pod.
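
For reference, a sketch of the corresponding setting in the ConfigMap-based
configuration; the CRD-based configuration exposes the same flag, and the
ConfigMap name is an assumption based on the default manifests:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # disable lazy mode so a new Spilo image immediately triggers a rolling update
  enable_lazy_spilo_upgrade: "false"
```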

## Delete protection via annotations

@@ -667,6 +688,12 @@ if it ends up in your specified WAL backup path:
envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
```

You can also check if Spilo is able to find any backups:

```bash
envdir "/run/etc/wal-e.d/env" wal-g backup-list
```

Depending on the cloud storage provider, different [environment variables](https://github.com/zalando/spilo/blob/master/ENVIRONMENT.rst)
have to be set for Spilo. Not all of them are generated automatically by the
operator when you change its configuration. In this case you have to use an
@@ -734,8 +761,15 @@ WALE_S3_ENDPOINT='https+path://s3.eu-central-1.amazonaws.com:443'
WALE_S3_PREFIX=$WAL_S3_BUCKET/spilo/{WAL_BUCKET_SCOPE_PREFIX}{SCOPE}{WAL_BUCKET_SCOPE_SUFFIX}/wal/{PGVERSION}
```

The operator sets the prefix to an empty string so that Spilo will generate it
from the configured `WAL_S3_BUCKET`.

:warning: When you overwrite the configuration by defining `WAL_S3_BUCKET` in
the [pod_environment_configmap](#custom-pod-environment-variables) you have
to set `WAL_BUCKET_SCOPE_PREFIX = ""`, too. Otherwise Spilo will not find
the physical backups on restore (next chapter).
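
A sketch of such an override in the pod environment ConfigMap (the bucket name
is illustrative):

```yaml
data:
  WAL_S3_BUCKET: "my-custom-wal-bucket"
  # must be cleared explicitly, otherwise restores look under the wrong key
  WAL_BUCKET_SCOPE_PREFIX: ""
```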

When the `AWS_REGION` is set, `AWS_ENDPOINT` and `WALE_S3_ENDPOINT` are
generated automatically. `WALG_S3_PREFIX` is identical to `WALE_S3_PREFIX`.
`SCOPE` is the Postgres cluster name.

@@ -874,6 +908,36 @@ on one of the other running instances (preferably replicas if they do not lag
behind). You can test restoring backups by [cloning](user.md#how-to-clone-an-existing-postgresql-cluster)
clusters.

If you need to provide a [custom clone environment](#custom-pod-environment-variables),
copy the existing variables describing your setup (backup location, prefix,
access keys etc.) and prepend the `CLONE_` prefix so they get copied to the
correct directory within Spilo.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
data:
  AWS_REGION: "eu-west-1"
  AWS_ACCESS_KEY_ID: "****"
  AWS_SECRET_ACCESS_KEY: "****"
  ...
  CLONE_AWS_REGION: "eu-west-1"
  CLONE_AWS_ACCESS_KEY_ID: "****"
  CLONE_AWS_SECRET_ACCESS_KEY: "****"
  ...
```

### Standby clusters

The setup for [standby clusters](user.md#setting-up-a-standby-cluster) is very
similar to cloning. At the moment, the operator only allows streaming from the
S3 WAL archive of the master specified in the manifest. As with cloning, if you
are using [additional environment variables](#custom-pod-environment-variables)
to access your backup location, you have to copy those variables and prepend
the `STANDBY_` prefix for Spilo to find the backups and WAL files to stream.
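
A sketch mirroring the clone example above, with the `STANDBY_` prefix (values
are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
data:
  AWS_REGION: "eu-west-1"
  AWS_ACCESS_KEY_ID: "****"
  AWS_SECRET_ACCESS_KEY: "****"
  STANDBY_AWS_REGION: "eu-west-1"
  STANDBY_AWS_ACCESS_KEY_ID: "****"
  STANDBY_AWS_SECRET_ACCESS_KEY: "****"
```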

## Logical backups

The operator can manage K8s cron jobs to run logical backups (SQL dumps) of
11 changes: 6 additions & 5 deletions docs/user.md
@@ -733,20 +733,21 @@ spec:
    uid: "efd12e58-5786-11e8-b5a7-06148230260c"
    cluster: "acid-batman"
    timestamp: "2017-12-19T12:40:33+01:00"
    s3_wal_path: "s3://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"
```

Here `cluster` is the name of the source cluster that is going to be cloned. A
new cluster will be cloned from S3, using the latest backup before the
`timestamp`. Note that a time zone is required for `timestamp` in the format
of +00:00, which
is UTC. You can specify the `s3_wal_path` of the source cluster or let the
operator try to find it based on the configured `wal_[s3|gs]_bucket` and the
specified `uid`. You can find the UID of the source cluster in its metadata:

```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-batman
  uid: efd12e58-5786-11e8-b5a7-06148230260c
```
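
Instead of reading the manifest by eye, the UID can also be fetched directly,
assuming `kubectl` access to the namespace of the source cluster:

```bash
kubectl get postgresql acid-batman -o jsonpath='{.metadata.uid}'
```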

@@ -799,7 +800,7 @@ no statefulset will be created.
```yaml
spec:
  standby:
    s3_wal_path: "s3://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"
```

At the moment, the operator only allows streaming from the WAL archive of the
