Automatically manage restic repos #606
Conversation
		return nil, errors.New("restic repository is not ready")
	}
	return repo, nil
}
Might add a comment here that the rest of the function is for the apierrors.IsNotFound(err) == true case.
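For context, a minimal sketch of where such a comment could land. The surrounding logic is reconstructed from the visible diff, and isRepoReady is an assumed helper, not code from the PR:

	repo, err := repoLister.ResticRepositories(namespace).Get(name)
	if err == nil {
		if !isRepoReady(repo) { // assumed helper checking the repository's phase
			return nil, errors.New("restic repository is not ready")
		}
		return repo, nil
	}
	if !apierrors.IsNotFound(err) {
		return nil, errors.WithStack(err)
	}
	// From here down we're in the apierrors.IsNotFound(err) == true case:
	// the ResticRepository doesn't exist yet, so create it and wait for Ready.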
added.
(force-pushed from 8b5bb78 to 2847d25)
@@ -85,7 +84,7 @@ func (r *restorer) RestorePodVolumes(restore *arkv1api.Restore, pod *corev1api.P
 		return nil
 	}

-	repo, err := getReadyRepo(r.repoLister, restore.Namespace, pod.Namespace)
+	repo, err := r.repoEnsurer.EnsureRepo(r.ctx, restore.Namespace, pod.Namespace)
Would we ever end up creating a repo here? I'm thinking that even if we do, the data and buckets are in an invalid state already.
This has the potential to create the restic repo. We shouldn't be doing that as part of a restore.
How else would you handle the case where a user installs Ark in a fresh cluster and wants to restore an existing backup from another cluster with restic snapshots?
My bad, I wasn't thinking about the restore half of the equation (that's what happens when I try to wrap up a code review at the end of the day after a long meeting!).
func (b *backupper) BackupPodVolumes(backup *arkv1api.Backup, pod *corev1api.Pod, log logrus.FieldLogger) (map[string]string, []error) {
	// get volumes to backup from pod's annotations
	volumesToBackup := GetVolumesToBackup(pod)
	if len(volumesToBackup) == 0 {
		return nil, nil
	}

-	repo, err := getReadyRepo(b.repoLister, backup.Namespace, pod.Namespace)
+	repo, err := b.repoEnsurer.EnsureRepo(b.ctx, backup.Namespace, pod.Namespace)
	if err != nil {
		return nil, []error{err}
	}
This function is executed in the daemonset, correct?

I ran into an issue yesterday that was partially my fault, but probably warrants a log message. If the daemonset exists but no pods in it are ready, the restic backup process will still proceed and then hang waiting for the daemonset pod to finish working. In my testing, it waited for over 5 minutes and couldn't be stopped with ctrl-c, only a kill -9.

I can replicate this if needed to walk through it.
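One hypothetical way to surface that earlier would be a pre-flight readiness warning next to the daemonset existence check this PR adds in server.go (shown below). ds.Status.NumberReady is a standard apps/v1 DaemonSetStatus field; the wiring and message are assumptions:

	// Hypothetical warning, not part of the PR: a daemonset can exist while
	// none of its pods are ready, which is the hang described above.
	if ds, err := s.kubeClient.AppsV1().DaemonSets(s.namespace).Get(restic.DaemonSet, metav1.GetOptions{}); err == nil && ds.Status.NumberReady == 0 {
		s.logger.Warn("Ark restic daemonset has no ready pods; pod volume backups will block until at least one pod is ready")
	}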
I made it through most of the PR, but there are some files at the end I didn't have time for before EOD. I wanted to submit now so you can see my comments on the repositoryEnsurer.
pkg/cmd/server/server.go (outdated)
// warn if restic daemonset does not exist
if _, err := s.kubeClient.AppsV1().DaemonSets(s.namespace).Get(restic.DaemonSet, metav1.GetOptions{}); apierrors.IsNotFound(err) {
	s.logger.Warn("Ark restic daemonset not found; restic backups/restores will not work until it's created")
}
Do we want to log a warning if we got some other error?
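For illustration, a sketch of what logging the other errors could look like (the else-branch and its message wording are assumptions, not code from the PR):

	if _, err := s.kubeClient.AppsV1().DaemonSets(s.namespace).Get(restic.DaemonSet, metav1.GetOptions{}); apierrors.IsNotFound(err) {
		s.logger.Warn("Ark restic daemonset not found; restic backups/restores will not work until it's created")
	} else if err != nil {
		// hypothetical: surface unexpected errors (e.g. RBAC or API problems)
		// instead of silently ignoring them
		s.logger.WithError(err).Warn("error checking for the Ark restic daemonset")
	}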
	return []error{err}
}

ctx, cancelFunc := context.WithTimeout(context.Background(), time.Minute)
Let's extract the time.Minute as a const.
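For instance, a sketch of the suggested extraction (the constant name is illustrative, not from the PR):

	// defaultPodVolumeTimeout bounds how long we wait on restic pod volume
	// operations. (Hypothetical name; the PR may choose a different one.)
	const defaultPodVolumeTimeout = time.Minute

	ctx, cancelFunc := context.WithTimeout(context.Background(), defaultPodVolumeTimeout)
	defer cancelFunc()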
pkg/restic/repository_ensurer.go (outdated)
	}
}

func (r *repositoryEnsurer) onReady(name string) chan *arkv1api.ResticRepository {
Would probably call this something else like newReadyChan
pkg/restic/repository_keys.go (outdated)
@@ -11,26 +29,27 @@ import (
 const (
 	CredentialsSecretName = "ark-restic-credentials"
 	CredentialsKey        = "ark-restic-credentials"
We should change the name of this key inside the secret, don't you think?
pkg/restic/repository_keys.go (outdated)
@@ -11,26 +29,27 @@ import (
 const (
 	CredentialsSecretName = "ark-restic-credentials"
 	CredentialsKey        = "ark-restic-credentials"
 	EncryptionKey         = "static-passw0rd"
Does this need to be exported?
pkg/restic/repository_keys.go (outdated)
secret := &corev1api.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Namespace: namespace,
		Name:      CredentialsSecretName,
	},
	Type: corev1api.SecretTypeOpaque,
	Data: map[string][]byte{
-		CredentialsKey: data,
+		CredentialsKey: []byte(EncryptionKey),
	},
}

_, err := secretClient.Secrets(namespace).Create(secret)
Let's do a Get() first to avoid more of #281.
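A sketch of the suggested Get-before-Create; the error wrapping and return shape are assumptions based on the surrounding code:

	// Hypothetical: check for an existing secret first so re-running setup
	// doesn't surface an AlreadyExists error (the failure mode behind #281).
	if _, err := secretClient.Secrets(namespace).Get(CredentialsSecretName, metav1.GetOptions{}); err == nil {
		return nil // secret already exists; nothing to do
	} else if !apierrors.IsNotFound(err) {
		return errors.Wrap(err, "error checking for existing restic credentials secret")
	}

	if _, err := secretClient.Secrets(namespace).Create(secret); err != nil {
		return errors.Wrap(err, "error creating restic credentials secret")
	}
	return nil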
return errors.New("timed out waiting for cache to sync") | ||
} | ||
|
||
repo, err := rm.repoEnsurer.EnsureRepo(ctx, rm.namespace, snapshot.Repo) |
This has the potential to create the restic repo. We shouldn't be doing that as part of deletions/GC.
same question re: spinning up a new cluster, then having GC process backups that were synced in from an existing bucket.
Yup, my bad
pkg/restic/repository_ensurer.go (outdated)
	readyChans map[string]chan *arkv1api.ResticRepository
}

func newRepositoryEnsurer(repoInformer arkv1informers.ResticRepositoryInformer, repoClient arkv1client.ResticRepositoriesGetter) *repositoryEnsurer {
I would recommend changing the logic so we create a new one of these per *restic.backupper. Then we don't need a lock or multiple channels and can instead just have a single channel.

I'm even tempted to say we can do away with the informer+channel. Move the initializeRepo functionality from *resticRepositoryController directly into the *restic.Backupper or this repositoryEnsurer. Then it's a lot simpler to implement EnsureRepo (see the sketch after this list):
- Try to get the repo.
- If it exists & is ready, done.
- If it exists & isn't ready, wait (?) or call Check(). Return result of wait/check.
- If it doesn't exist, try to create it.
- If create succeeds, run the logic in initializeRepo.
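A rough sketch of that proposed flow. Everything here is an assumption for discussion: repoName, newRepoObject, checkRepo, initializeRepo, and the repoLister/repoClient fields are placeholders, and the Ready phase constant is assumed from the ResticRepository API.

	func (r *repositoryEnsurer) EnsureRepo(ctx context.Context, namespace, volumeNamespace string) (*arkv1api.ResticRepository, error) {
		// 1. Try to get the repo.
		repo, err := r.repoLister.ResticRepositories(namespace).Get(repoName(volumeNamespace)) // repoName: assumed helper
		if err == nil {
			// 2. Exists & ready: done.
			if repo.Status.Phase == arkv1api.ResticRepositoryPhaseReady {
				return repo, nil
			}
			// 3. Exists but not ready: check it and return the result.
			return repo, r.checkRepo(ctx, repo) // assumed wrapper around `restic check`
		}
		if !apierrors.IsNotFound(err) {
			return nil, errors.WithStack(err)
		}

		// 4. Doesn't exist: try to create it.
		repo, err = r.repoClient.ResticRepositories(namespace).Create(newRepoObject(namespace, volumeNamespace)) // assumed constructor
		if err != nil {
			return nil, errors.WithStack(err)
		}

		// 5. Create succeeded: run the init logic that currently lives in the controller.
		return repo, r.initializeRepo(ctx, repo)
	}

The appeal of this shape is that one component owns the repo's whole lifecycle, which is the concern raised below about the ensurer/controller split.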
If you want to avoid having multiple channels, we'd need to have one of these per call to BackupPodVolumes in the backupper, since a backup can have >1 namespace/repo. Not sure exactly what that would look like, since I don't think we'd want to be creating a new informer each time, but we can't be adding/removing event handlers while an existing one is running.

I'm not sure I like the idea of moving the repo init logic into the backupper or ensurer, but I think we need to talk it through in person.
Why don't you like what I did?
Let me take a look at it again with fresh eyes. I had to draw out the flow and the relationship between the components (ensurer vs repo controller). I was trying to find a way to have a single component responsible for the creation and initialization of the repo, instead of having that split across 2 (the ensurer that creates the ResticRepository, and the controller that initializes it in object storage and sets the phase to Ready).
@skriss does this look accurate for the EnsureRepo flow?

Looks right to me
(force-pushed from c8cb556 to c44a16a)
Addressed items discussed in new commits; will do squashing post-review.

LGTM
Signed-off-by: Steve Kriss <steve@heptio.com>
(force-pushed from c44a16a to 22e8f23)
squashed.
Fixes #577
Builds on #570, ignore first commit.
Replaces ark restic repo init with automatic repository initialization, using a single static non-secret restic key.