-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare for duplicates before logging object exists errors #405
Conversation
pkg/restore/restore.go
Outdated
if equality.Semantic.DeepEqual(obj, fromCluster) { | ||
continue | ||
} else { | ||
addToResult(&warnings, namespace, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skriss Should we perhaps wrap or append to the error message here to say that the object exists and differs from the backup, so it's more clear what's happened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah probably makes sense to wrap
pkg/restore/restore.go
Outdated
|
||
// add an ark-restore label to each resource for easy ID | ||
addLabel(obj, api.RestoreLabelKey, ctx.restore.Name) | ||
|
||
ctx.infof("Restoring %s: %v", obj.GroupVersionKind().Kind, obj.GetName()) | ||
_, err = resourceClient.Create(obj) | ||
if apierrors.IsAlreadyExists(err) { | ||
addToResult(&warnings, namespace, err) | ||
continue | ||
fromCluster, _ := resourceClient.Get(obj.GetName(), metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to handle error
pkg/restore/restore.go
Outdated
continue | ||
fromCluster, _ := resourceClient.Get(obj.GetName(), metav1.GetOptions{}) | ||
// Remove insubstantial metadata | ||
fromCluster, _ = resetMetadataAndStatus(fromCluster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to handle error
pkg/restore/restore.go
Outdated
addLabel(fromCluster, api.RestoreLabelKey, ctx.restore.Name) | ||
|
||
if equality.Semantic.DeepEqual(obj, fromCluster) { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably makes sense to do a ctx.infof(...)
here stating that the object already exists with matching state in-cluster
pkg/restore/restore.go
Outdated
if equality.Semantic.DeepEqual(obj, fromCluster) { | ||
continue | ||
} else { | ||
addToResult(&warnings, namespace, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah probably makes sense to wrap
3254d25
to
990be21
Compare
pkg/restore/restore.go
Outdated
@@ -427,6 +428,41 @@ func (ctx *context) restoreFromDir(dir string) (api.RestoreResult, api.RestoreRe | |||
return warnings, errs | |||
} | |||
|
|||
// resolveConflict tries to resolve conflicts between a backed up object | |||
// and it's representation in the cluster, deciding on what action to take |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its
pkg/restore/restore.go
Outdated
@@ -427,6 +428,41 @@ func (ctx *context) restoreFromDir(dir string) (api.RestoreResult, api.RestoreRe | |||
return warnings, errs | |||
} | |||
|
|||
// resolveConflict tries to resolve conflicts between a backed up object | |||
// and it's representation in the cluster, deciding on what action to take | |||
// based on type. Errors are wrapped with some additional data and returned to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need the comment about errors?
pkg/restore/restore.go
Outdated
// and it's representation in the cluster, deciding on what action to take | ||
// based on type. Errors are wrapped with some additional data and returned to | ||
// the caller to handle. If the items are equivalent, nil is returned | ||
func (ctx *context) resolveConflict(fromBackup *unstructured.Unstructured, resourceClient client.Dynamic, restoreErr error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a small function that takes 2 objects and tells us if they're equal. Ideally it should just return a bool
, although it may need to return (bool, error)
because resetMetadataAndStatus
can return an error.
We could have another function that retrieves fromCluster
and then calls the helper above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Perhaps resolveConflict
would be the top level that gets fromCluster
, cleans it up, and passes it down?
pkg/restore/restore.go
Outdated
// Do some initial work to get the cluster version for comparison | ||
fromCluster, err := resourceClient.Get(fromBackup.GetName(), metav1.GetOptions{}) | ||
if err != nil { | ||
es := fmt.Sprintf("Conflict found for %s: %v, but could not retrieve cluster verison: %s", fromBackup.GroupVersionKind(), fromBackup.GetName(), err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: if you need to format an error string, do errors.Errorf
pkg/restore/restore.go
Outdated
if err != nil { | ||
es := fmt.Sprintf("Conflict found for %s: %v, but could not retrieve cluster verison: %s", fromBackup.GroupVersionKind(), fromBackup.GetName(), err.Error()) | ||
err = errors.New(es) | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably omit the format string and just return errors.WithStack(err)
pkg/restore/restore.go
Outdated
fromCluster, err = resetMetadataAndStatus(fromCluster) | ||
if err != nil { | ||
es := fmt.Sprintf("Conflict found for %s: %v, but could not reset metadata for cluster verison: %s", fromBackup.GroupVersionKind(), fromBackup.GetName(), err.Error()) | ||
err = errors.New(es) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably omit the format string and just return err
pkg/restore/restore.go
Outdated
if !equality.Semantic.DeepEqual(fromBackup, fromCluster) { | ||
es := fmt.Sprintf("%s and backup does not match.", restoreErr.Error()) | ||
err := errors.New(es) | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I'd return false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you return false, nil
with a (bool, err)
return signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
pkg/restore/restore.go
Outdated
|
||
// add an ark-restore label to each resource for easy ID | ||
addLabel(obj, api.RestoreLabelKey, ctx.restore.Name) | ||
|
||
ctx.infof("Restoring %s: %v", obj.GroupVersionKind().Kind, obj.GetName()) | ||
_, err = resourceClient.Create(obj) | ||
if apierrors.IsAlreadyExists(err) { | ||
addToResult(&warnings, namespace, err) | ||
err = ctx.resolveConflict(obj, resourceClient, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this returns (bool, err)
, I'd store it in a different error to differentiate from the already-exists error. I would log the error that comes back from this call, if we get one. Then, if we got an error here, or the bool
return value is false (objects not equal), call addToResult
.
Here's my thinking: if we run into any problems trying to determine equality, it's better to record those problems in the restore log file, and record the conflict in the restore result warnings. Would you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I would - I actually had resolveConflict
returning (bool, err)
initially, but changed it to error
only as I was only switching on error states. I was also doing _, restoreErr := resourceClient.Create(obj)
in order to keep it separate.
I'll return to that structure.
pkg/restore/restore.go
Outdated
equal, err := ctx.objectsAreEqual(obj, resourceClient, restoreErr) | ||
// Log any errors trying to resolve the conflict; nil means no conflic | ||
if err != nil { | ||
addToResult(&warnings, namespace, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be ctx.infof
so it goes into the restore log file.
pkg/restore/restore.go
Outdated
addToResult(&warnings, namespace, err) | ||
} | ||
if equal { | ||
ctx.infof("Backup for %s: %v matches cluster version, skipping restore.", obj.GroupVersionKind(), obj.GetName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we need this log line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not particularly attached to it. Fine with deleting it.
pkg/restore/restore.go
Outdated
if equal { | ||
ctx.infof("Backup for %s: %v matches cluster version, skipping restore.", obj.GroupVersionKind(), obj.GetName()) | ||
} else { | ||
e := fmt.Errorf("%s and backup does not match.", restoreErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this error text end up looking like - do you have an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a snippet from a restore:
Warnings:
Ark: <none>
Cluster: persistentvolumes "pvc-c5c8bb8c-3758-11e8-a959-42010a960060" already exists and backup does not match.
persistentvolumes "pvc-f07ce647-336d-11e8-9d79-42010a96006f" already exists and backup does not match.
Namespaces:
default: services "kubernetes" already exists and backup does not match.
kube-system: endpoints "kube-controller-manager" already exists and backup does not match.
endpoints "kube-scheduler" already exists and backup does not match.
services "default-http-backend" already exists and backup does not match.
services "heapster" already exists and backup does not match.
services "kube-dns" already exists and backup does not match.
services "kubernetes-dashboard" already exists and backup does not match.
nginx-example: services "my-nginx" already exists and backup does not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
pkg/restore/restore.go
Outdated
continue | ||
} | ||
if err != nil { | ||
if restoreErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment indicating that if we get here, we got some error other than already-exists
pkg/restore/restore.go
Outdated
addLabel(fromCluster, api.RestoreLabelKey, restoreName) | ||
|
||
// If there are no specific actions needed based on the type, simply check for equality. | ||
if !equality.Semantic.DeepEqual(fromBackup, fromCluster) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return equality.Semantic.DeepEqual(fromBackup, fromCluster), nil
pkg/restore/restore.go
Outdated
@@ -427,6 +428,18 @@ func (ctx *context) restoreFromDir(dir string) (api.RestoreResult, api.RestoreRe | |||
return warnings, errs | |||
} | |||
|
|||
// objectsAreEqual indentifies whether a backed up object | |||
// and its representation in the cluster are equal | |||
func (ctx *context) objectsAreEqual(fromBackup *unstructured.Unstructured, resourceClient client.Dynamic, restoreErr error) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to pass in restoreErr
any more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for going back and forth on this, but I'm wondering if it would just be clearer to move the Get()
into the IsAlreadyExists
block and get rid of this helper. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My very first pass did that. The reason I named this function resolveConflict
was because I envisioned adding more complex cases like service accounts there later. That way, the resolution was mostly out of the IsAlreadyExists
block.
Doing the Get()
in the block seems reasonable to me, and some function that contains different actions or policy checks added when it's needed.
pkg/restore/restore.go
Outdated
_, restoreErr := resourceClient.Create(obj) | ||
if apierrors.IsAlreadyExists(restoreErr) { | ||
fromCluster, err := resourceClient.Get(obj.GetName(), metav1.GetOptions{}) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of something like this:
equal := false
if fromCluster, err := resourceClient.Get(obj.GetName(), metav1.GetOptions{}); err == nil {
equal, err = objectsAreEqual(fromCluster, obj)
if err != nil {
// Log any errors trying to check equality
ctx.infof(err.Error())
}
} else {
ctx.infof("error getting from-cluster...")
}
if !equal {
e := fmt.Errorf("%s and backup does not match.", restoreErr)
addToResult(&warnings, namespace, e)
}
This way, we only call objectsAreEqual
if we successfully retrieve fromCluster
. No matter what, if we get an error anywhere in here, it gets logged, and we add the warning in all cases unless the objects are equal. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense.
pkg/restore/restore_test.go
Outdated
} | ||
|
||
for _, test := range tests { | ||
res, err := objectsAreEqual(test.clusterObj, test.backupObj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run in a subtest: t.Run(test.name, func(t *testing.T) { ... })
pkg/restore/restore_test.go
Outdated
@@ -897,6 +936,15 @@ func (obj *testUnstructured) WithName(name string) *testUnstructured { | |||
return obj.WithMetadataField("name", name) | |||
} | |||
|
|||
func (obj *testUnstructured) WithArkLabel(restoreName string) *testUnstructured { | |||
ls := make(map[string]string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would check if the obj
already has labels and merge if it does.
clusterObj: NewTestUnstructured().Unstructured, | ||
expectedErr: true, | ||
expectedRes: false, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth getting the json output from a real cluster of something like a secret or service or pod and putting that in here as a test case. We have a test helper to convert json to Unstructured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@@ -722,6 +740,23 @@ func (ctx *context) executePVAction(obj *unstructured.Unstructured) (*unstructur | |||
return updated2, nil | |||
} | |||
|
|||
// objectsAreEqual takes two unstructured objects and checks for equality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment that fromCluster
is mutated
pkg/restore/restore.go
Outdated
ctx.infof("Error retrieving cluster version of %s: %v", obj.GetName(), err) | ||
} | ||
if !equal { | ||
e := fmt.Errorf("%s and backup does not match.", restoreErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to use errors.Errorf
(from github.com/pkg/errors
) wherever possible.
e0b31b5
to
1e84cae
Compare
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking pretty good to me
pkg/restore/restore.go
Outdated
equal, err = objectsAreEqual(fromCluster, obj) | ||
// Log any errors trying to check equality | ||
if err != nil { | ||
ctx.infof(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably make this something like ctx.infof("error checking %s for equality: %v", obj.getName(), err)
pkg/restore/restore.go
Outdated
ctx.infof("Error retrieving cluster version of %s: %v", obj.GetName(), err) | ||
} | ||
if !equal { | ||
e := errors.Errorf("%s and backup does not match.", restoreErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wordsmithing here, but since this is directly part of the UX it's important to make this super clear - maybe something like " not restored: , and is different than backed-up version." I think it's important to make clear that it wasn't restored, because it already exists, and there's a diff (so the user might want to manually remediate).
pkg/restore/restore.go
Outdated
@@ -722,6 +740,25 @@ func (ctx *context) executePVAction(obj *unstructured.Unstructured) (*unstructur | |||
return updated2, nil | |||
} | |||
|
|||
// objectsAreEqual takes two unstructured objects and checks for equality. | |||
// The fromCluster object is copy that is mutated to remove any insubstantial runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove is copy that
When restoring resources that raise an already exists error, check their equality before logging a message on the restore. If they're the same except for some metadata, don't generate a message. The restore process was modified so that if an object had an empty namespace string, no namespace key is created on the object. This was to avoid manipulating the copy of the current cluster's object by adding the target namespace. There are some cases right now that are known to not be equal via this method: - The `default` ServiceAccount in a namespace will not match, primarily because of differing default tokens. These will be handled in their own patch - IP addresses for Services are recorded in the backup object, but are either not present on the cluster object, or different. An issue for this already exists at vmware-tanzu#354 - Endpoints have differing values for `renewTime`. This may be insubstantial, but isn't currently handled by the resetMetadataAndStatus function. - PersistentVolume objects do not match on spec fields, such as claimRef and cloud provider persistent disk info Signed-off-by: Nolan Brubaker <nolan@heptio.com>
LGTM! |
When restoring resources that raise an already exists error, check their
equality before logging a message on the restore. If they're the same
except for some metadata, don't generate a message.
The restore process was modified so that if an object had an empty
namespace string, no namespace key is created on the object. This was to
avoid manipulating the copy of the current cluster's object by adding
the target namespace.
There are some cases right now that are known to not be equal via this
method:
default
ServiceAccount in a namespace will not match, primarilybecause of differing default tokens. These will be handled in their own
patch
either not present on the cluster object, or different. An issue for
this already exists at Should service action (restore) remove all node ports from a service? #354
renewTime
. This may beinsubstantial, but isn't currently handled by the resetMetadataAndStatus
function.
claimRef
section,while those in cluster do not.
Signed-off-by: Nolan Brubaker nolan@heptio.com