
Restoring from S3 on a different machine #1066

Closed
asoltesz opened this issue Jun 10, 2020 · 5 comments
@asoltesz

Please provide the following information when submitting an issue (feature requests or general comments can skip this):

  1. pgBackRest version:

version 2.25

  2. PostgreSQL version:

12.3

  3. Operating system/version - if you have more than one server (for example, a database server, a repository host server, one or more standbys), please specify each:

CentOS 7

  4. Did you install pgBackRest from source or from a package?

package (CrunchyData Postgres Operator)

  5. Please attach the following as applicable:

Stanza-create fails with this:

time="2020-06-10T16:47:04Z" level=info msg="pgo-backrest starts"
time="2020-06-10T16:47:04Z" level=info msg="debug flag set to false"
time="2020-06-10T16:47:04Z" level=info msg="backrest stanza-create command requested"
time="2020-06-10T16:47:04Z" level=info msg="backrest command will be executed for both local and s3 storage"
time="2020-06-10T16:47:04Z" level=info msg="command to execute is [pgbackrest stanza-create  --db-host=172.18.0.13 --db-path=/pgdata/hippo && pgbackrest stanza-create  --db-host=172.18.0.13 --db-path=/pgdata/hippo --repo-type=s3]"
time="2020-06-10T16:47:04Z" level=info msg="command is pgbackrest stanza-create  --db-host=172.18.0.13 --db-path=/pgdata/hippo && pgbackrest stanza-create  --db-host=172.18.0.13 --db-path=/pgdata/hippo --repo-type=s3 "
time="2020-06-10T16:47:06Z" level=error msg="command terminated with exit code 28"
time="2020-06-10T16:47:06Z" level=info msg="output=[]"
time="2020-06-10T16:47:06Z" level=info msg="stderr=[ERROR: [028]: backup and archive info files exist but do not match the database\n       HINT: is this the correct stanza?\n       HINT: did an error occur during stanza-upgrade?\n]"
time="2020-06-10T16:47:06Z" level=error msg="command terminated with exit code 28"
  6. Describe the issue:

I am having trouble understanding how one can restore in the following situation:

  • Backups were saved to a remote S3 repo
  • Postgres host and local backrest repo completely destroyed (whole machine crashed)
  • I have a new machine with the same Postgres/pgbackrest version installed
  • I would like to restore from S3, but the system-id on the new machine will always be different from the one that was used to create the backups to S3, so stanza-create fails on the existing S3 bucket

I didn't find information in the Guide for this situation.

How do I create the stanza and restore to the new machine?

@dwsteele dwsteele self-assigned this Jun 10, 2020
@dwsteele
Member

This is covered here: https://pgbackrest.org/user-guide-centos7.html#replication/hot-standby. You just need to modify the recovery settings to whatever you need to recover your primary. PITR instructions are here: https://pgbackrest.org/user-guide-centos7.html#pitr

[ERROR: [028]: backup and archive info files exist but do not match the database

There's no need to create the stanza again -- it's already created. All you need is an empty PGDATA dir or specify --delta when you restore.
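For illustration (a minimal sketch, not from the original thread): assuming pgbackrest.conf on the new machine already points at the existing S3 repo (repo1-type=s3, bucket, region, credentials) and the stanza is named hippo as in the logs above, the restore could look like this:

    # Restore into an empty PGDATA:
    pgbackrest --stanza=hippo restore

    # Or, if PGDATA is not empty, let pgBackRest reconcile existing files:
    pgbackrest --stanza=hippo --delta restore

    # Point-in-time recovery variant (see the PITR link above);
    # the target timestamp is a placeholder:
    pgbackrest --stanza=hippo --delta --type=time \
        --target="2020-06-10 16:00:00+00" restore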

@asoltesz
Author

Thanks, I managed to do the restore.

@Venryx

Venryx commented Aug 29, 2021

For others finding this, just wanted to mention that if you're getting the "backup and archive info files exist but do not match the database" error when using:

spec:
  backups:
    pgbackrest:
      restore:
        [...]

Then try using the alternate restore approach:

spec:
  dataSource:
    [...]

API reference: https://access.crunchydata.com/documentation/postgres-operator/v5/tutorial/disaster-recovery/

It appears the second type can work even if the database system-id differs between the backup and the target cluster, whereas the first cannot. (However, don't be like me and assume the restore is failing just because it sits there for a while; in my case, I had to wait 2.5 minutes before any of the backup's files actually started being restored. So be patient before changing further settings or the like.)
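To make the two shapes concrete, here is a hedged sketch of the dataSource approach described in that tutorial; the cluster name hippo and repo name repo1 are placeholders, so check the linked documentation for your PGO version:

    spec:
      dataSource:
        postgresCluster:
          # existing cluster whose pgBackRest backups should seed this one
          clusterName: hippo
          # which pgBackRest repository to restore from
          repoName: repo1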

For reference, here is the code that causes the error:

if (pgVersion != archiveInfoPg.version || pgSystemId != archiveInfoPg.systemId)
{
    THROW(
        FileInvalidError,
        "backup and archive info files exist but do not match the database\n"
        "HINT: is this the correct stanza?\n"
        "HINT: did an error occur during stanza-upgrade?");
}

@Venryx

Venryx commented Aug 29, 2021

After more experimenting, I found that the error can occur for the dataSource approach as well; however, it only occurs in a specific scenario:

  1. A backup-repo was created for the old system-identifier.
  2. However, the backup-repo never had a base-backup pushed to it.

When a new postgres cluster is launched (with a new system-identifier), I'd tend to expect that the cluster would look into the repo, and either:
A) Load in the configuration from it (so the system-id matches for subsequent backups/restores), or...
B) Realize that there are no actual base-backups in the backup-repo, and thus ignore it (or just log a warning).

Instead, the postgres-operator notices the backup-repo, and complains about it, but doesn't offer an easy way to solve it:

  1. You can't just ignore/overwrite the mismatched backup-repo, because PGO doesn't offer a way to do so. (It errors on the config/system-id mismatch before you're able to have it do any backup reading or writing.)
  2. You can't tell PGO to load in the configuration (which includes the system-id), because there is no base-backup that you can point the dataSource entry to.

A third option, which does work, is to delete the backup-repo folder in the cloud manually. Then PGO sees there is no mismatch, creates a new cluster, and populates the backup-repo with its own configuration.

This works, but is not terribly obvious to new users; perhaps a special error message could be displayed for the backup-repo exists, but without actual backups case, to clarify to new users what should be done.
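For anyone scripting option three, a hedged sketch using the AWS CLI; the bucket and prefix are placeholders that must match your repo1-s3-bucket and repo1-path settings, and the deletion is irreversible:

    # DANGER: permanently deletes the stanza's repository contents
    aws s3 rm s3://my-backup-bucket/pgbackrest/repo1/ --recursive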

EDIT: I put some further (arguably more helpful) notes on stanza-related issues here: https://github.com/debate-map/app/blob/56180dca95148d3af65aa14626093d62dca432fc/README.md?plain=1#L618

Venryx added a commit to debate-map/app that referenced this issue Aug 29, 2021
…_ovh, read/write to different buckets (for the db-backups).

* Finally figured out why the system-id mismatch error is necessary, and how to avoid/deal with it. (Basically, if you're going to be using a backup-repo's contents, you need to initialize your database instance from one of its backups; this is necessary because of the way Postgres physical backups work; see here for some more info: pgbackrest/pgbackrest#1066 (comment))

Because of the limitations of physical backups, I plan to set up weekly (or so) logical backups as well. That's for another time though, as physical backups should be fine for now. (i.e. while I'm on the same Postgres version)
@dwsteele
Member

A) Load in the configuration from it (so the system-id matches for subsequent backups/restores)

The system identifier cannot be updated in Postgres.

Realize that there are no actual base-backups in the backup-repo, and thus ignore it (or just log a warning).

The thing to do here is to issue a stanza-upgrade or, maybe better, a stanza-delete/stanza-create, since the repo is pretty useless without backups.
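A hedged sketch of that sequence (the stanza name hippo is a placeholder; stanza-delete requires the cluster to be shut down or the stop command to be issued first, as a safety check):

    pgbackrest --stanza=hippo stop
    pgbackrest --stanza=hippo stanza-delete
    pgbackrest --stanza=hippo start
    pgbackrest --stanza=hippo stanza-create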

This works, but is not terribly obvious to new users; perhaps a special error message could be displayed for the backup-repo exists, but without actual backups case, to clarify to new users what should be done.

This seems like something you should suggest at https://github.com/CrunchyData/postgres-operator. Actually, that pretty much goes for all of this.

@github-actions github-actions bot locked and limited conversation to collaborators May 4, 2022