Pre/Post Backup & Pre/Post Restore Plugins #1150

Closed
king-jam opened this issue Jan 9, 2019 · 15 comments
Labels
  • Enhancement/User: End-User Enhancement to Velero
  • Icebox: We see the value, but it is not slated for the next couple releases.
  • Needs Product: Blocked needing input or feedback from Product
  • Reviewed Q2 2021
  • staled

Comments

@king-jam

king-jam commented Jan 9, 2019

Describe the solution you'd like
I'd like to use the same plugin framework, but with execution order guaranteed by the order of the registration calls within the framework. This is for scenarios where "rehydration" of data is required via an application-specific hook.

The current example that I'm working with is backup/restore of a MySQL environment. PV snapshots only provide a crash-consistent backup, whereas mysqldump provides a point-in-time, application-consistent backup. The existing plugin framework has no issues during backup, but during restore it cannot guarantee that the plugin will execute after the service endpoint is up.

The workaround is to use the restore plugin action as a timing trigger to start a separate job that remotely rehydrates the database, but that results in the ark restore reporting success before the database is actually restored. This could cause problems with how statuses are conveyed to a user or to an automation system that wraps the ark CLI.

Anything else you would like to add:
N/A

Environment:

  • Ark version (use ark version): v0.10.0
  • Kubernetes version (use kubectl version): v1.13.0
  • Kubernetes installer & version: Minikube, PKS, KubeSpray
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1 LTS
@ncdc
Contributor

ncdc commented Jan 9, 2019

Our backup and restore item action Execute methods are executed pre-backup (before persisting in the tarball) and pre-restore (before creating the item in Kubernetes). I'm thinking that post-backup may not make much sense (do you have any use cases?). But post-restore definitely would give you what you need here. WDYT?

ncdc added the Enhancement/User and Needs Product labels on Jan 9, 2019
@king-jam
Author

king-jam commented Jan 9, 2019

Definitely agree. Post Restore would be the priority scenario. The rest may be generalized under the feature add but aren't required immediately.

@king-jam
Author

After thinking about it further, the Post Restore hooks would need a caveat: when restoring a database or similar, the hooks need to run just before the Service is exposed, so higher-level apps don't start executing transactions before the data is fully restored.

An init container can solve this. It just leaves breadcrumbs that we might not want every time.

@ncdc
Contributor

ncdc commented Jan 10, 2019

Let's walk through the scenarios in full detail.

Backup

  • A custom backup item action plugin that you're writing calls mysqldump and uploads the output to the Ark ObjectStore

Restore

  1. Ark restores pod
  2. A custom restore item action plugin runs post-restore for the pod
    1. It waits for the pod to be running
    2. It restores the data backed up by mysqldump

Post-restore, pod restart and/or scale-up

  1. Pod starts up, data has already been restored
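
The restore sequence above (wait for the pod to be running, then load the dump) could be sketched roughly as follows. This is only an illustration under stated assumptions: `PodStatusFunc`, `waitForRunning`, `postRestore`, and `runDemo` are hypothetical names, not part of the Ark plugin API, and the pod status is simulated rather than read from Kubernetes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// PodStatusFunc reports the current phase of the restored pod.
// Hypothetical: a real plugin would query the Kubernetes API instead.
type PodStatusFunc func(ctx context.Context) (string, error)

// waitForRunning polls status until it reports "Running" or ctx is done.
func waitForRunning(ctx context.Context, status PodStatusFunc, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		phase, err := status(ctx)
		if err != nil {
			return err
		}
		if phase == "Running" {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// postRestore performs the two sub-steps in order: wait for the pod,
// then restore the data. restoreData is only called once the pod is running.
func postRestore(ctx context.Context, status PodStatusFunc, restoreData func(context.Context) error, interval time.Duration) error {
	if err := waitForRunning(ctx, status, interval); err != nil {
		return fmt.Errorf("waiting for pod: %w", err)
	}
	if err := restoreData(ctx); err != nil {
		return fmt.Errorf("restoring data: %w", err)
	}
	return nil
}

// runDemo simulates a pod that moves through the given phases and records
// every step, so the ordering can be inspected.
func runDemo(phases []string) ([]string, error) {
	var events []string
	i := 0
	status := func(ctx context.Context) (string, error) {
		if i >= len(phases) {
			return "", errors.New("no more phases to report")
		}
		p := phases[i]
		i++
		events = append(events, "phase:"+p)
		return p, nil
	}
	restore := func(ctx context.Context) error {
		events = append(events, "restore")
		return nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := postRestore(ctx, status, restore, time.Millisecond)
	return events, err
}

func main() {
	events, err := runDemo([]string{"Pending", "Pending", "Running"})
	fmt.Println(events, err)
}
```

Because the restore step returns an error when the wait fails, the overall restore would not report success before the data is actually loaded, which is the status problem described in the original report.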

I'm not sure an initContainer would work, because you'd need the main mysqld process up and running to be able to restore the backed up sql, and that's in a regular container, not an initContainer.

Instead, I think you could use an exec-based liveness and/or readiness probe on the mysql pod. It could wait for the presence of a marker file (placed by the restore item action plugin) before reporting live/ready. You'd have to be careful to put it in a path that survives restarts, so it would need to be in a volume instead of something like /tmp. Do you think this approach could work?
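
As a rough illustration of the marker-file idea, a tiny binary like the one below could serve as the command for an exec probe: it exits 0 when the marker exists and 1 otherwise. The file name `.restore-complete` and the helpers `probeExitCode`, `placeMarker`, and `demo` are assumptions made for this sketch; where the marker lives on the volume would be up to the plugin author.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// probeExitCode returns 0 when a regular file exists at markerPath and 1
// otherwise, matching how an exec probe signals success and failure.
func probeExitCode(markerPath string) int {
	info, err := os.Stat(markerPath)
	if err != nil || !info.Mode().IsRegular() {
		return 1
	}
	return 0
}

// placeMarker is what the restore side would call once the data is loaded.
func placeMarker(markerPath string) error {
	return os.WriteFile(markerPath, []byte("restored\n"), 0o644)
}

// demo exercises both functions in a throwaway directory and returns the
// probe's exit code before and after the marker is placed.
func demo() (before, after int, err error) {
	dir, err := os.MkdirTemp("", "restore-marker")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	marker := filepath.Join(dir, ".restore-complete") // hypothetical file name
	before = probeExitCode(marker)
	if err := placeMarker(marker); err != nil {
		return before, 0, err
	}
	return before, probeExitCode(marker), nil
}

func main() {
	// As a probe: pass the marker path as the only argument.
	if len(os.Args) > 1 {
		os.Exit(probeExitCode(os.Args[1]))
	}
	// With no arguments, run the self-contained demonstration.
	before, after, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("exit code before marker: %d, after marker: %d\n", before, after)
}
```

In a real pod the marker path would point into the mounted data volume so that it survives container restarts.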

@king-jam
Author

king-jam commented Jan 10, 2019

Your scenario is accurate. Would the probe also remove our ability to connect to the service to write the dump back in?

@ncdc
Contributor

ncdc commented Jan 10, 2019

How are you planning on making your restore item action plugin talk to mysql? If you're going through the Service, then the probes would make that challenging. But you could exec into the pod itself and run the command there, perhaps.

@king-jam
Author

Just approaching that now. Probably can't use the Service moving forward.

@king-jam
Author

So the pattern can also match that of Restic. An init container with the mysql PVC mounted can start mysqld and run an io.Reader stream from GetObject directly into the mysql client @ localhost. Then when the "real" mysqld instance starts, the data will all be present in a "consistent" state. The Post Restore piece alludes to how Restic has its ordering hardcoded into multiple places.
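
A minimal sketch of the streaming part, assuming the dump is already available as an `io.Reader`: the reader is attached to the client's stdin so the dump never has to be written to disk first. `streamInto` and `demoStream` are hypothetical helpers, and `cat` stands in for the mysql client so the example runs without a database.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"
	"time"
)

// streamInto pipes dump into the stdin of the named command and returns
// the command's combined stdout and stderr.
func streamInto(ctx context.Context, dump io.Reader, name string, args ...string) (string, error) {
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdin = dump // os/exec copies from the reader until EOF
	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &out
	if err := cmd.Run(); err != nil {
		return out.String(), fmt.Errorf("running %s: %w", name, err)
	}
	return out.String(), nil
}

// demoStream feeds sql through `cat`, which simply echoes its input.
// Assumption: in the scenario above, the command would instead be the
// mysql client pointed at the local mysqld, and the reader would be the
// object-store stream rather than a string.
func demoStream(sql string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return streamInto(ctx, strings.NewReader(sql), "cat")
}

func main() {
	out, err := demoStream("CREATE TABLE t (id INT);\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```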

Thoughts?

@ncdc
Contributor

ncdc commented Jan 22, 2019

The pod's init container would not have access to the credentials to talk to object storage, though?

I would like to spend some more time thinking about how we can allow users to store arbitrary data in object storage. And then retrieve it when restoring. That's going to be a real need, isn't it?

@king-jam
Author

I agree. I tried to figure out if I could steal your plugin registry and create a gRPC client to talk to the object store, but in the end I opted for pulling code from the cmd package to peek at the cloud credentials and inject environment variables/secrets into the init container. Essentially, I reimplemented the persistence pkg.

The fact that the svc account is available to a plugin also means I can create k8s objects, so I opted to skip the init container and instead pause the entire backup process by inspecting runtime.Unstructured and creating a temporary copy of the Pod. It mounts the same PVC, so the actual Pod will see the data as if it had gone through a restart when it finally comes up.

@king-jam
Author

king-jam commented Jan 31, 2019

So following back up @ncdc

We got everything working, backup, restore, object storage handling, etc. Hid it behind a nice extensible interface; all that good stuff.

I would like to revisit this in terms of hooking the object store and the plugin concepts you discussed. It is something else to maintain, and it needs to stay in sync with the BackupStorageLocations object.

The Post Restore piece does require a lot of thought since we want Ark to restore the Pod to a functional state, then do a check for plugins registered for Post Restore hooks that have an AppliesTo of pods. I think it's just another map of plugins and a for loop in the restore package but I fear I may be oversimplifying it.
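
To make the "map of plugins and a for loop" idea concrete, here is a simplified sketch. `Item`, `PostRestoreAction`, `registry`, and the action names are hypothetical and much simpler than the real restore package; the slice of names is an assumption added so that actions run in registration order, as requested at the top of this issue.

```go
package main

import "fmt"

// Item is a simplified stand-in for a restored resource.
type Item struct {
	Resource string // e.g. "pods"
	Name     string
}

// PostRestoreAction is a hypothetical post-restore hook interface.
type PostRestoreAction interface {
	AppliesTo() []string
	Execute(item Item) error
}

// registry keeps actions in a map and remembers registration order.
type registry struct {
	names   []string
	actions map[string]PostRestoreAction
}

func newRegistry() *registry {
	return &registry{actions: map[string]PostRestoreAction{}}
}

// Register adds an action; re-registering a name keeps its original slot.
func (r *registry) Register(name string, a PostRestoreAction) {
	if _, ok := r.actions[name]; !ok {
		r.names = append(r.names, name)
	}
	r.actions[name] = a
}

// RunPostRestore executes, in registration order, every action whose
// AppliesTo list contains the item's resource. It stops at the first error.
func (r *registry) RunPostRestore(item Item) error {
	for _, name := range r.names {
		a := r.actions[name]
		for _, res := range a.AppliesTo() {
			if res == item.Resource {
				if err := a.Execute(item); err != nil {
					return fmt.Errorf("post-restore action %q on %s/%s: %w", name, item.Resource, item.Name, err)
				}
				break
			}
		}
	}
	return nil
}

// recordingAction appends its name to a shared log when executed.
type recordingAction struct {
	name      string
	resources []string
	log       *[]string
}

func (a recordingAction) AppliesTo() []string { return a.resources }

func (a recordingAction) Execute(item Item) error {
	*a.log = append(*a.log, a.name)
	return nil
}

// demoOrder registers three actions and returns the names of those that
// ran for the given resource, in execution order.
func demoOrder(resource string) []string {
	var log []string
	r := newRegistry()
	r.Register("wait-ready", recordingAction{name: "wait-ready", resources: []string{"pods"}, log: &log})
	r.Register("notify", recordingAction{name: "notify", resources: []string{"services"}, log: &log})
	r.Register("rehydrate-mysql", recordingAction{name: "rehydrate-mysql", resources: []string{"pods"}, log: &log})
	if err := r.RunPostRestore(Item{Resource: resource, Name: "mysql-0"}); err != nil {
		log = append(log, "error: "+err.Error())
	}
	return log
}

func main() {
	fmt.Println(demoOrder("pods"))
	fmt.Println(demoOrder("services"))
}
```

The sketch leaves out the harder part mentioned above, namely deciding when the Pod counts as restored to a functional state before the loop runs.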

@ncdc
Contributor

ncdc commented Jan 31, 2019

@king-jam thanks for the update. We are knee-deep in project renaming work, but we should be able to come up for air starting next week. Thanks for being patient!

@connorearl

I'm in the same boat on this one. I'm running into issues with mariadb-galera and having crash-consistent backups. Any news on this issue?

@stale

stale bot commented Jul 8, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Jul 8, 2021
@stale

stale bot commented Jul 22, 2021

Closing the stale issue.

@stale stale bot closed this as completed Jul 22, 2021

5 participants