Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create and manage Tigris bucket for PG/Barman backups #3693

Merged
merged 56 commits into from
Jul 15, 2024

Conversation

ndarilek
Copy link
Collaborator

@ndarilek ndarilek commented Jul 1, 2024

Change Summary

Integrates the new Tigris WAL storage into flyctl.

  • If backups are enabled, provisions a new Tigris bucket on cluster creation and sets the barman secret appropriately.
  • Adds a --enable-backup flag to fly pg create command to enable bucket/secret provisioning on cluster creation.
  • Adds --restore-target-app, --restore-target-name, --restore-target-time and --restore-target-inclusive flags to fly pg create which behave as documented here. When a new instance with a cluster size of 1 is created with an app and optional target set, the specified backup is restored into the newly-created instance.
  • Adds a fly pg backup enable command for provisioning buckets and secrets into clusters where WAL backup is disabled.
  • Adds fly pg backup create for creating backups, optionally named.
  • Adds fly pg backup list to list existing backups.

Related to:


Documentation

  • Fresh Produce
  • In superfly/docs, or asked for help from docs team
  • n/a

@ndarilek ndarilek marked this pull request as ready for review July 2, 2024 21:40
@benwaffle
Copy link
Contributor

benwaffle commented Jul 4, 2024

If the bucket name is already taken, we should generate a unique name instead of erroring out

@ndarilek
Copy link
Collaborator Author

ndarilek commented Jul 8, 2024

If the bucket name is already taken, we should generate a unique name instead of erroring out

if provider.SelectName {
Looking at this naming logic, if a name isn't explicitly specified via OverrideName, it's automatically set to the target app name. I think the extension creation process in web will autogenerate random names but I don't see this easily exposed by flyctl's extension code. So in practice this means that PGs created with my-name will have buckets also named my-name.

I agree that the ideal case is not to bomb out on unique bucket names, but how likely is this in practice? I imagine:

  1. Someone runs fly pg create -n my-pg ...
  2. They then run fly app rm -y my-pg, followed by another create which would crash.

The crash isn't ideal but it isn't entirely unexpected and easy enough to work around by removing the PG storage bucket.

Either way, I don't want to spend much time overthinking this, so possible approaches if this needs to be addressed:

  1. Always autogenerate silly-name-1234 and similar. I know we do this in various places now, but I'm not immediately finding it in flyctl and wonder if it's happening in flaps/web?
  2. Append -postgres or similar to the extension names--that doesn't necessarily fix conflicts but at least minimizes clashes with someone creating appname as a regular file storage bucket.
  3. Append a suffix and retry reation if provisioning fails.

If this definitely needs addressing then I think I'd prefer 2, or 1 as long as we aren't pulling in another dependency. 2 is simple enough to just do and is probably a good idea anyway, so I'll just do that and we can decide what additional needs doing.

@ndarilek
Copy link
Collaborator Author

ndarilek commented Jul 8, 2024

how likely is this in practice?

Apparently somewhat, we hold onto recently-deleted bucket names so create/destroy/create cycles will fail even if the bucket itself is removed. Going to try scraping the error text for naming conflicts and appending a suffix, unless there's a better approach.

@davissp14
Copy link
Contributor

davissp14 commented Jul 9, 2024

Awesome start on this!

Couple notes:

Adds a --disable-backups flag to fly pg create command to disable bucket/secret provisioning on cluster creation.

I think we should actually make this opt-in to start. There's a cost implication that some users may not be cool with.

Adds fly pg backup list to list existing backups.

It looks like this currently breaks if the bucket name doesn't match the App name.

Adds a currently non-functional fly pg backup enable command for provisioning buckets and secrets into clusters where WAL backup is disabled.

  • Looks like this only sets the secret, so existing Apps will need to issue a deploy in order for the secret to be applied.
  • The log output after enabling is no longer accurate now that we've consolidated down to a single secret.
  • The S3_ARCHIVE_CONFIG secret is the only one we need to inject.
  • Once the release is cut, we will need to ensure that these commands can't be run against older Flex versions.

@ndarilek
Copy link
Collaborator Author

ndarilek commented Jul 9, 2024

• The log output after enabling is no longer accurate now that we've consolidated down to a single secret.
• The S3_ARCHIVE_CONFIG secret is the only one we need to inject.

Sorry, which log output do you mean specifically, and which additional secret am I injecting? Looks like I'm only injecting the one as per our docs.

Thanks for the feedback.

@ndarilek
Copy link
Collaborator Author

ndarilek commented Jul 9, 2024

• Once the release is cut, we will need to ensure that these commands can't be run against older Flex versions.

Agreed. Any thoughts about the easiest way to implement this? Version checks seem like the most immediately obvious, but that feels wrong and brittle. Seems like the better way is checking for the existence of this specific feature in the image, either with a command we can exec in and run, or something in the machine metadata. Thoughts?

Unless of course there's something I'm missing with the version check, but semver version comparison feels like something we'd either need another dependency for or would have to be very careful about.

@davissp14
Copy link
Contributor

Agreed. Any thoughts about the easiest way to implement this?

There's already logic in place for this:
https://github.com/superfly/flyctl/blob/master/internal/command/postgres/postgres.go#L48

ndarilek and others added 17 commits July 10, 2024 16:00
… Postgres clusters.

Usage:
  fly postgres [command]

Aliases:
  postgres, pg

Available Commands:
  attach      Attach a postgres cluster to an app
  barman      Manage databases in a cluster
  config      Show and manage Postgres configuration.
  connect     Connect to the Postgres console
  create      Create a new Postgres cluster
  db          Manage databases in a cluster
  detach      Detach a postgres cluster from an app
  events      Track major cluster events
  failover    Failover to a new primary
  import      Imports database from a specified Postgres URI
  list        List postgres clusters
  renew-certs Renews the SSH certificates for the Postgres cluster.
  restart     Restarts each member of the Postgres cluster one by one.
  users       Manage users in a postgres cluster

Flags:
  -h, --help   help for postgres

Global Flags:
  -t, --access-token string   Fly API Access Token
      --debug                 Print additional logs and traces
      --verbose               Verbose output

Use "fly postgres [command] --help" for more information about a command.
}

// Resolve the leader
leader, err := pickLeader(ctx, machines)
Copy link
Contributor

@benwaffle benwaffle Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if the DB is having problems, or a host is down. What I did in ui-ex is spin up a new temporary machine in the same app - https://github.com/superfly/ui-ex/blob/add83100682486e9d9e8df2d9faf16a5e4a2342a/lib/fly/postgres.ex#L241

Copy link
Contributor

@davissp14 davissp14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just a few comments.

internal/command/postgres/create.go Outdated Show resolved Hide resolved
internal/command/postgres/create.go Outdated Show resolved Hide resolved
Copy link
Contributor

@davissp14 davissp14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some command clarifications

internal/command/postgres/create.go Outdated Show resolved Hide resolved
internal/command/postgres/backup.go Outdated Show resolved Hide resolved
Copy link
Contributor

@davissp14 davissp14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ndarilek ndarilek merged commit 53c70ff into master Jul 15, 2024
19 of 34 checks passed
@ndarilek ndarilek deleted the pg-tigris-integration branch July 15, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants