Bug 1769030: Replacing operator creates duplicate secrets #1123

Member

@Bowenislandsong Bowenislandsong commented Nov 12, 2019

Cause:
The OLM catalog ensurer's EnsureServiceAccount makes sure the service account
is updated when a new version of an operator is present. This
happens while ExecutePlan applies an InstallPlan to a namespace.
If it is an update, the service account's fields are updated but the
references to its older secrets are dropped.

Consequence:
Dropping these references does not clean up the older
secrets, so the secrets pile up as the operator upgrades.
Eventually, too many old secrets are left lying around, and they are
only cleaned up when the operator is uninstalled.

Fix:
We carry the older secrets over when updating the service account.

Result:
Older secrets are again referenced in the updated SA.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 12, 2019
@Bowenislandsong
Member Author

e2e test in progress

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 13, 2019
@operator-framework operator-framework deleted a comment Nov 14, 2019
@operator-framework operator-framework deleted a comment Nov 14, 2019
@operator-framework operator-framework deleted a comment from h777xx Nov 14, 2019
@operator-framework operator-framework deleted a comment Nov 14, 2019
@operator-framework operator-framework deleted a comment Nov 14, 2019
@Bowenislandsong Bowenislandsong force-pushed the bugfix_dup_secrets branch 2 times, most recently from cadbe2d to c43b2ee Compare November 18, 2019 15:28
@Bowenislandsong Bowenislandsong changed the title [wip] Bug 1769030 Replacing (updating) operator creates duplicate secrets f… Bug 1769030 Replacing operator creates duplicate secrets Nov 18, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2019
sa.SetNamespace(namespace)
if _, updateErr := o.kubeClient.UpdateServiceAccount(sa); updateErr != nil {
	err = errorwrap.Wrapf(updateErr, "error updating service account: %s", sa.GetName())
	return
}

for _, secret := range preSa.Secrets {
	foregroundDelete := metav1.DeletePropagationForeground

I don't think doing a foreground type of deletion is necessary if there's nothing left referencing these secrets as dependencies, right?

Member Author

You are right. This is unnecessary but I thought it would not hurt. Do you think we should just leave it blank then? Thanks

I believe you should have nothing passed into the DeleteOptions below.

@@ -116,12 +116,35 @@ func (o *StepEnsurer) EnsureServiceAccount(namespace string, sa *corev1.ServiceA
return
}

// Before UpdateServiceAccount and dereference secrets of the older SA,
Member

Doesn't the original bug report have this issue with the same serviceaccount name?

From line 114, this section should only be hit if the serviceaccount doesn't exist at all, so I'm not convinced this will fix the bug.

I think the issue may be that Create on ServiceAccount always creates a secret even if the serviceaccount exists? Maybe we should start with a Get before the initial Create?

Do you agree, or am I missing how this fixes the issue?

Member Author

@Bowenislandsong Bowenislandsong Nov 18, 2019

You might have missed the !k8serrors.IsAlreadyExists check on line 114, which returns early on any SA creation error other than "already exists". Therefore, the only scenario in which my addition is hit is when the SA does exist.

The logic of the original code got me at first as well.
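The control flow described here can be sketched as follows. This is a minimal, stdlib-only illustration and not the PR's code: `fakeClient`, `errAlreadyExists`, and `ensure` are hypothetical stand-ins for the OLM kube client, the error recognised by `k8serrors.IsAlreadyExists`, and `EnsureServiceAccount` respectively.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the error that
// k8serrors.IsAlreadyExists would recognise.
var errAlreadyExists = errors.New("service account already exists")

// fakeClient is a hypothetical in-memory client, used only for illustration.
type fakeClient struct {
	accounts map[string]bool
}

// create fails with errAlreadyExists when the name is already taken.
func (c *fakeClient) create(name string) error {
	if c.accounts[name] {
		return errAlreadyExists
	}
	c.accounts[name] = true
	return nil
}

// ensure mirrors the create-then-update shape: any create error other than
// "already exists" is returned, so the update path is reached only when the
// service account is already present.
func ensure(c *fakeClient, name string) (string, error) {
	err := c.create(name)
	if err == nil {
		return "created", nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return "", fmt.Errorf("error creating service account %s: %w", name, err)
	}
	return "updated", nil
}

func main() {
	c := &fakeClient{accounts: map[string]bool{}}
	first, _ := ensure(c, "test")
	second, _ := ensure(c, "test")
	fmt.Println(first, second)
}
```

Calling `ensure` twice for the same name takes the create path first and the update path second, which is the only case in which the secrets handling under review would run.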

Member

This is what I get for reviewing too quickly 😄 Ignore my previous comment.

However - I really don't think we should be in the business of deleting the serviceaccount secrets.

I did some testing, and the issue is that when we Update, our SA definition doesn't have the secrets set, which removes the configured secret and causes the SA controller to generate a new one:

$ k -n default create serviceaccount test
serviceaccount/test created

$ k -n default get secrets
NAME                  TYPE                                  DATA   AGE
default-token-67bw8   kubernetes.io/service-account-token   3      3m23s
test-token-9lx4p      kubernetes.io/service-account-token   3      3s

# an update that doesn't cause the bug
$ k -n default label serviceaccounts test with=test
serviceaccount/test labeled

# no duplicate secrets
$ k -n default get secrets
NAME                  TYPE                                  DATA   AGE
default-token-67bw8   kubernetes.io/service-account-token   3      6m40s
test-token-9lx4p      kubernetes.io/service-account-token   3      3m20s

# an update that causes the bug - remove the secrets block
$ k -n default edit serviceaccounts test
serviceaccount/test edited

# now there are duplicates
$ k -n default get secrets
NAME                  TYPE                                  DATA   AGE
default-token-67bw8   kubernetes.io/service-account-token   3      8m3s
test-token-9lx4p      kubernetes.io/service-account-token   3      4m43s
test-token-xwmpg      kubernetes.io/service-account-token   3      4s

The solution here shouldn't be to delete the extra secrets, it should be to not create the extra secrets in the first place.

A quick option for now would be to:

  • Get the current ServiceAccount (if it exists)
  • Add the current secret list to the object from the step
  • Then call update with that object - it will have the secret list so a new one won't be generated

But it would be even better to use DeepDerivative or server-side apply.
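The quick option listed above can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions: `ObjectReference` and `ServiceAccount` are simplified stand-ins for the corev1 types, and `carryOverSecrets` is a hypothetical helper name, not a function from OLM or client-go. The Get and Update calls against the API server are left out.

```go
package main

import "fmt"

// ObjectReference and ServiceAccount are simplified, hypothetical stand-ins
// for the corev1 types; only the fields this sketch needs are modelled.
type ObjectReference struct {
	Name string
}

type ServiceAccount struct {
	Name    string
	Labels  map[string]string
	Secrets []ObjectReference
}

// carryOverSecrets returns a copy of desired whose Secrets list also contains
// every secret referenced by existing, without duplicates. Neither input is
// modified.
func carryOverSecrets(existing, desired *ServiceAccount) *ServiceAccount {
	merged := *desired // shallow copy; Secrets is rebuilt below
	merged.Secrets = nil
	seen := make(map[string]bool)
	for _, list := range [][]ObjectReference{desired.Secrets, existing.Secrets} {
		for _, ref := range list {
			if !seen[ref.Name] {
				seen[ref.Name] = true
				merged.Secrets = append(merged.Secrets, ref)
			}
		}
	}
	return &merged
}

func main() {
	existing := &ServiceAccount{
		Name:    "test",
		Secrets: []ObjectReference{{Name: "test-token-9lx4p"}},
	}
	// The object from the install plan step carries no secrets of its own.
	desired := &ServiceAccount{Name: "test", Labels: map[string]string{"with": "test"}}

	updated := carryOverSecrets(existing, desired)
	for _, ref := range updated.Secrets {
		fmt.Println(ref.Name)
	}
}
```

Because the object passed to Update still lists the existing token secret, the service account controller has no reason to generate a new one.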

Member Author

That was my other thought on the fix too. I eventually chose this approach to make sure we always have the right token in the secrets. I worry that secrets may change in the install plan, rendering the old secrets invalid. However, I might be worrying for nothing. I'll make a commit to use that approach.

Member Author

@ecordell I liked the server-side approach, but it being in beta gives me goosebumps. I am not entirely sure about your DeepDerivative suggestion. If you mean https://github.com/kubernetes/kubernetes/blob/6be19d85f206d64044ae940540a437c65675a526/third_party/forked/golang/reflect/deep_equal.go#L378 then I do not see how we can use it. If you mean the concept, not every field can be copied (e.g. timestamps). If we copy every unset field in the new SA from the existing SA, we need to know exactly which fields cannot be copied. I do not think I have understood your DeepDerivative suggestion correctly, so please let me know exactly what you are suggesting. Thank you!

@Bowenislandsong
Member Author

/hold
The e2e test includes pulling a Quay image. This should be removed before merging.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2019
@Bowenislandsong
Member Author

/bugzilla refresh

@openshift-ci-robot
Collaborator

@Bowenislandsong: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Bowenislandsong Bowenislandsong changed the title Bug 1769030 Replacing operator creates duplicate secrets Bug 1769030: Replacing operator creates duplicate secrets Nov 19, 2019
@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Nov 19, 2019
@openshift-ci-robot
Collaborator

@Bowenislandsong: This pull request references Bugzilla bug 1769030, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1769030: Replacing operator creates duplicate secrets

@Bowenislandsong Bowenislandsong force-pushed the bugfix_dup_secrets branch 2 times, most recently from 98082e7 to 2fc7268 Compare November 19, 2019 21:21
@Bowenislandsong
Member Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2019
@awgreene
Member

@Bowenislandsong you may want to use DeepDerivative to compare the objects.

@ecordell
Member

/bugzilla refresh

@openshift-ci-robot
Collaborator

@ecordell: This pull request references Bugzilla bug 1769030, which is valid.

In response to this:

/bugzilla refresh

@ecordell
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 27, 2019
@Bowenislandsong
Member Author

Bowenislandsong commented Nov 27, 2019

/cc @ecordell @awgreene Added a DeepDerivative check to save API calls if the updated SA has exactly the same fields.

@ecordell
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2019
…or the operator's ServiceAccount

Cause:
The OLM catalog ensurer's EnsureServiceAccount makes sure the service account
is updated when a new version of an operator is present. This
happens while ExecutePlan applies an InstallPlan to a namespace.
If it is an update, the service account's fields are updated but the
references to its older secrets are dropped.

Consequence:
Dropping these references does not clean up the older
secrets, so the secrets pile up as the operator upgrades.
Eventually, too many old secrets are left lying around, and they are
only cleaned up when the operator is uninstalled.

Fix:
We carry the older secrets over when updating the service account.
We also compare the update using DeepDerivative to see whether it
changes any existing fields. If not, we skip the update API call
since it would not change anything.

Result:
Older secrets are again referenced in the updated SA and no new secrets
are created.
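The skip-if-unchanged idea in the fix can be illustrated with a small sketch. It is an assumption-laden, stdlib-only imitation: the commit presumably relies on the reflection-based DeepDerivative helper from Kubernetes apimachinery, whereas the hand-written `isDerivative` below only mimics the concept for a hypothetical stand-in `ServiceAccount` type and may differ in detail from the library's behaviour.

```go
package main

import "fmt"

// ServiceAccount is a simplified, hypothetical stand-in for the corev1 type.
type ServiceAccount struct {
	Name    string
	Labels  map[string]string
	Secrets []string
}

// isDerivative reports whether every field that is set on desired already has
// the same value on existing. Unset (zero or empty) fields on desired are
// ignored, which is the core idea behind a "derivative" comparison.
func isDerivative(desired, existing ServiceAccount) bool {
	if desired.Name != "" && desired.Name != existing.Name {
		return false
	}
	for k, v := range desired.Labels {
		if got, ok := existing.Labels[k]; !ok || got != v {
			return false
		}
	}
	if len(desired.Secrets) > 0 {
		// Simplification for this sketch: a non-empty list must match exactly.
		if len(desired.Secrets) != len(existing.Secrets) {
			return false
		}
		for i := range desired.Secrets {
			if desired.Secrets[i] != existing.Secrets[i] {
				return false
			}
		}
	}
	return true
}

func main() {
	existing := ServiceAccount{
		Name:    "test",
		Labels:  map[string]string{"with": "test"},
		Secrets: []string{"test-token-9lx4p"},
	}
	unchanged := ServiceAccount{Name: "test", Labels: map[string]string{"with": "test"}}
	changed := ServiceAccount{Name: "test", Labels: map[string]string{"with": "other"}}

	// When the desired object is a derivative of the existing one, the update
	// call can be skipped because it would not change anything.
	fmt.Println("skip update (unchanged):", isDerivative(unchanged, existing))
	fmt.Println("skip update (changed):", isDerivative(changed, existing))
}
```

The comparison is deliberately one-directional: fields that only the existing object sets, such as its secrets, do not force an update.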
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2019
Member

@gallettilance gallettilance left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2019
@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, Bowenislandsong, ecordell, gallettilance

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Bowenislandsong
Member Author

/test e2e-gcp-upgrade
/test e2e-gcp

@openshift-merge-robot openshift-merge-robot merged commit 4c7479c into operator-framework:master Dec 2, 2019
@openshift-ci-robot
Collaborator

@Bowenislandsong: All pull requests linked via external trackers have merged. Bugzilla bug 1769030 has been moved to the MODIFIED state.

In response to this:

Bug 1769030: Replacing operator creates duplicate secrets

@openshift-cherrypick-robot

@Bowenislandsong: new pull request created: #1159

In response to this:

/cherry-pick release-4.2

@openshift-cherrypick-robot

@Bowenislandsong: new pull request created: #1160

In response to this:

/cherry-pick release-4.3

@openshift-cherrypick-robot

@Bowenislandsong: cannot checkout release-4.2 cancel: error checking out release-4.2 cancel: exit status 1. output: error: pathspec 'release-4.2 cancel' did not match any file(s) known to git.

In response to this:

/cherry-pick release-4.2 cancel

@openshift-cherrypick-robot

@Bowenislandsong: #1123 failed to apply on top of branch "release-4.1":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	pkg/controller/operators/catalog/operator_test.go
A	pkg/controller/operators/catalog/step_ensurer.go
M	test/e2e/installplan_e2e_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/installplan_e2e_test.go
CONFLICT (content): Merge conflict in test/e2e/installplan_e2e_test.go
CONFLICT (modify/delete): pkg/controller/operators/catalog/step_ensurer.go deleted in HEAD and modified in Bug 1769030 Replacing (updating) operator creates duplicate secrets for the operator's ServiceAccount. Version Bug 1769030 Replacing (updating) operator creates duplicate secrets for the operator's ServiceAccount of pkg/controller/operators/catalog/step_ensurer.go left in tree.
Auto-merging pkg/controller/operators/catalog/operator_test.go
CONFLICT (content): Merge conflict in pkg/controller/operators/catalog/operator_test.go
Patch failed at 0001 Bug 1769030 Replacing (updating) operator creates duplicate secrets for the operator's ServiceAccount

In response to this:

/cherry-pick release-4.1
