fix resource creation/deletion after operator group config change #675

jpeeler · 2019-01-18T20:34:18Z

This fixes scenario(s) where CSVs were not being copied.

Commit message:
This changes the operator group target namespace format specifically
when watching all namespaces. Instead of only including "", the
annotation has been updated to additionally include all the namespaces
seen. The new behavior fixes syncing issues when a new namespace is
created after the initial CSVs have been copied, again only when
watching all namespaces.
Doesn't change format, just fixes OperatorGroup managed resources.

Requeuing changes:
When namespaces are updated, matching operator groups are requeued.
When operator group annotations change, CSVs in that namespace are
requeued.
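The two requeuing rules above can be illustrated with a minimal, stdlib-only sketch of the second one. `CSV` and `csvsToRequeue` are hypothetical names, not OLM's actual types or workqueue API. When the operator group annotations in a namespace change, the keys of the CSVs in that namespace are returned for requeuing; otherwise nothing is returned.

```go
package main

import (
	"reflect"
	"sort"
)

// CSV is a hypothetical, trimmed-down stand-in for a ClusterServiceVersion.
type CSV struct {
	Name      string
	Namespace string
}

// csvsToRequeue returns the "namespace/name" keys of every CSV in the
// operator group's namespace, but only when the operator group annotations
// actually changed. The keys are sorted so the result is deterministic.
func csvsToRequeue(namespace string, oldAnn, newAnn map[string]string, csvs []CSV) []string {
	if reflect.DeepEqual(oldAnn, newAnn) {
		return nil // nothing relevant changed, so no requeue
	}
	keys := make([]string, 0)
	for _, c := range csvs {
		if c.Namespace == namespace {
			keys = append(keys, c.Namespace+"/"+c.Name)
		}
	}
	sort.Strings(keys)
	return keys
}
```

Returning keys rather than enqueuing directly keeps the decision easy to test; a real controller would hand each key to its workqueue.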

ALM-665

jpeeler · 2019-01-18T21:32:12Z

/retest
And so it begins.

jpeeler · 2019-01-18T22:01:04Z

/retest

ecordell · 2019-01-21T13:24:46Z

pkg/controller/operators/olm/operatorgroup.go

 	} else {
+		if selector == nil || selector.Empty() {
+			selector = labels.Everything()
+			// set is sorted after returning, so being at the beginning can be relied upon


not sure I follow this comment - where are we using order of the set? (asking because maps don't retain any ordering)

Right, I should have said slice specifically. The ordering is relied on only after the set has been converted to a slice in updateNamespaceList.
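A minimal sketch of the set-to-slice conversion the reply refers to, assuming it is a plain lexicographic sort (`namespaceSetToSlice` is a hypothetical name, not the actual body of updateNamespaceList). Because the empty string (NamespaceAll) is a prefix of every other string, it sorts first, and Kubernetes namespace names are never empty, so no real namespace can precede it.

```go
package main

import "sort"

// namespaceSetToSlice converts a namespace set into a sorted slice.
// The empty string (NamespaceAll) sorts before any non-empty string,
// so when present it always ends up at index 0.
func namespaceSetToSlice(set map[string]struct{}) []string {
	out := make([]string, 0, len(set))
	for ns := range set {
		out = append(out, ns)
	}
	sort.Strings(out)
	return out
}
```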

ecordell · 2019-01-21T13:29:22Z

pkg/controller/operators/olm/operatorgroup.go

@@ -58,16 +58,25 @@ func (a *Operator) syncOperatorGroups(obj interface{}) error {
 	a.Log.Debug("Cluster roles completed")

 	for _, csv := range a.csvSet(op.Namespace, v1alpha1.CSVPhaseAny) {
-		origCSVannotations := csv.GetAnnotations()
+		origCSVannotations := a.copyOperatorGroupAnnotations(&csv.ObjectMeta)


Why copy out just the three annotations and then immediately set just those three below with addOperatorGroupAnnotations? If we aren't going to retain the other annotation values, then we can build the set of annotations directly here and not use these copyOperatorGroupAnnotations/addOperatorGroupAnnotations methods.

The idea here wasn't to change the logic, but to fix the bug of not detecting annotation changes due to getting a reference instead of a copy. The new copyOperatorGroupAnnotations method ensures a copy is done so that the reflection test below works properly.
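The reference-versus-copy problem can be shown with plain Go maps: a map value refers to shared storage, so assigning it does not take a snapshot. The sketch below is illustrative only; the annotation key names and both helpers are hypothetical stand-ins for copyOperatorGroupAnnotations, and it compares the snapshot against the same subset of the current annotations.

```go
package main

import "reflect"

// operatorGroupAnnotationKeys are assumed key names, for illustration only.
var operatorGroupAnnotationKeys = []string{
	"olm.operatorGroup",
	"olm.operatorNamespace",
	"olm.targetNamespaces",
}

// copyOperatorGroupAnnotations returns a fresh map holding only the
// operator-group-related annotations. Later mutation of the source map
// cannot alter this snapshot.
func copyOperatorGroupAnnotations(ann map[string]string) map[string]string {
	out := make(map[string]string, len(operatorGroupAnnotationKeys))
	for _, k := range operatorGroupAnnotationKeys {
		if v, ok := ann[k]; ok {
			out[k] = v
		}
	}
	return out
}

// annotationsChanged compares a snapshot with the same subset of the
// current annotations, ignoring unrelated keys.
func annotationsChanged(before, current map[string]string) bool {
	return !reflect.DeepEqual(before, copyOperatorGroupAnnotations(current))
}
```

If `before` were simply `csv.GetAnnotations()` held in a variable, it would alias the map being mutated and the comparison could never detect a change.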

ecordell · 2019-01-21T13:44:02Z

pkg/controller/operators/olm/operator.go

@@ -310,6 +314,32 @@ func (a *Operator) syncObject(obj interface{}) (syncError error) {
 		a.requeueOwnerCSVs(metaObj)
 	}

+	namespace, ok := obj.(*corev1.Namespace)


I think this may be too aggressive for resyncing.

The cases where we know we need to resync:

namespace is added/deleted

namespace's labels updated

This makes me think we may want to detect those cases directly by passing in specific handlers for those events.

If we're looking at all three add/update/delete, is having separate handlers for each better? Or perhaps you're referring to something more granular that I don't know about?

the thing I really want to omit from processing is the case where the namespace is synced periodically, or when there has been some non-label-related change to a namespace

If no namespace processing is done then the scenario I described in #675 (comment) will only eventually work, not instantly as I was hoping could be the case. Is that acceptable?
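One way to express the filtering ecordell describes is an update-handler predicate, sketched below with a hypothetical `Namespace` type (not the client-go object). It assumes add and delete events always requeue, and that a periodic informer resync redelivers the same object with an unchanged ResourceVersion, so on update only a label change triggers a requeue.

```go
package main

// Namespace is a hypothetical, trimmed-down namespace object.
type Namespace struct {
	Name            string
	ResourceVersion string
	Labels          map[string]string
}

// labelsEqual compares label maps entry by entry, so nil and empty are equal.
func labelsEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// shouldRequeueOnUpdate reports whether a namespace update can affect
// operator group membership. Periodic resyncs (same ResourceVersion) and
// non-label changes are skipped.
func shouldRequeueOnUpdate(oldNs, newNs Namespace) bool {
	if oldNs.ResourceVersion == newNs.ResourceVersion {
		return false // periodic resync: the object did not change
	}
	return !labelsEqual(oldNs.Labels, newNs.Labels)
}
```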

ecordell · 2019-01-21T13:47:34Z

pkg/controller/operators/olm/operator.go

+			return
+		}
+		for _, opGroup := range operatorGroupList {
+			namespaceMap, err := a.getMatchingNamespaces(opGroup)


We shouldn't need to list all of the operator group's namespaces to know whether the namespace we're currently syncing should requeue an operator group, right? We just need to compare the namespace list (in the operator group's spec), or the operator group's label selector against the namespace's labels.

We probably already have a function somewhere that does this, but don't we just need to do something like if ShouldContain(opGroup, namespace) { resync opgroup }?

The resyncing here is specifically to make an operator group status contain a properly updated namespace list. You are right though that less work is necessary here, so I'll look into fixing that.

I take back what I said about less work being necessary. I think it should be left as is.
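A sketch of the membership check ecordell suggests (the `ShouldContain` idea), using hypothetical types rather than OLM's v1alpha2 API. It assumes an explicit target namespace list takes precedence over the selector, and that an empty selector matches everything, as in the `labels.Everything()` branch at the top of this review. It only decides whether to requeue; it does not produce the full namespace list that, per jpeeler's reply, the status still needs.

```go
package main

// OperatorGroup is a hypothetical, trimmed-down operator group: either an
// explicit list of target namespaces or an equality-based label selector.
type OperatorGroup struct {
	TargetNamespaces []string
	Selector         map[string]string // nil or empty selects everything
}

// shouldContain reports whether the operator group targets the namespace.
func shouldContain(og OperatorGroup, nsName string, nsLabels map[string]string) bool {
	if len(og.TargetNamespaces) > 0 {
		for _, t := range og.TargetNamespaces {
			if t == nsName {
				return true
			}
		}
		return false
	}
	// Every selector term must be present on the namespace with the same value.
	for k, v := range og.Selector {
		if got, ok := nsLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}
```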

ecordell · 2019-01-21T13:49:23Z

pkg/api/apis/operators/v1alpha1/clusterserviceversion.go

@@ -98,7 +98,7 @@ func NewInstallModeSet(modes []InstallMode) (InstallModeSet, error) {
 // the given operatorNamespace and list of target namespaces.
 func (set InstallModeSet) Supports(operatorNamespace string, namespaces []string) error {
 	numNamespaces := len(namespaces)
-	if !set[InstallModeTypeAllNamespaces] && numNamespaces == 1 && namespaces[0] == v1.NamespaceAll {
+	if !set[InstallModeTypeAllNamespaces] && numNamespaces > 0 && namespaces[0] == v1.NamespaceAll {


Can you explain this change? Why does it help to list additional namespaces for NamespaceAll?

The exact scenario this solves: after the status of an operator group that watches all namespaces has been written, a newly created namespace isn't "noticed" unless all the namespaces seen are listed explicitly.
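The reasoning in this reply can be made concrete with a small sketch (hypothetical function names). If the status of a watch-all operator group is always just `[""]`, creating a namespace leaves the status unchanged, so nothing signals a resync; listing every namespace seen after `""` turns the new namespace into a visible status diff.

```go
package main

import (
	"reflect"
	"sort"
)

// buildStatusNamespaces returns the status namespace list for an operator
// group that watches all namespaces. With listSeen=false the status is just
// [""]; with listSeen=true it is "" followed by every namespace seen, sorted.
func buildStatusNamespaces(seen []string, listSeen bool) []string {
	out := []string{""}
	if listSeen {
		rest := append([]string(nil), seen...) // copy to avoid mutating the input
		sort.Strings(rest)
		out = append(out, rest...)
	}
	return out
}

// statusChanged reports whether writing newStatus would differ from
// oldStatus, which is the signal that dependent resources need a resync.
func statusChanged(oldStatus, newStatus []string) bool {
	return !reflect.DeepEqual(oldStatus, newStatus)
}
```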

ecordell · 2019-01-21T13:50:25Z

pkg/controller/operators/olm/operatorgroup.go

@@ -293,7 +302,7 @@ func (a *Operator) ensureTenantRBAC(operatorNamespace, targetNamespace string, c

 func (a *Operator) copyCsvToTargetNamespace(csv *v1alpha1.ClusterServiceVersion, operatorGroup *v1alpha2.OperatorGroup) error {
 	namespaces := make([]string, 0)
-	if len(operatorGroup.Status.Namespaces) == 1 && operatorGroup.Status.Namespaces[0] == corev1.NamespaceAll {
+	if operatorGroup.Status.Namespaces[0] == corev1.NamespaceAll {


It would be nice if we didn't have to implicitly rely on list ordering here

I thought this was better than searching the list each time to see whether "" was in it. I couldn't find any namespace name that would cause sorting to place something before "".
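The trade-off in this thread can be sketched with two hypothetical helpers. Indexing `Status.Namespaces[0]` without a length check, as in the diff above, panics on an empty status, whereas the Supports change guards the index with `numNamespaces > 0`. An order-independent scan avoids relying on the sort at O(n) cost over the namespace list; the indexed variant is constant time but depends on the sorted invariant.

```go
package main

// watchesAllNamespaces reports whether the status namespace list includes
// NamespaceAll (""). It does not depend on ordering and is safe on an empty
// or nil slice.
func watchesAllNamespaces(namespaces []string) bool {
	for _, ns := range namespaces {
		if ns == "" {
			return true
		}
	}
	return false
}

// watchesAllNamespacesSorted relies on the sorted invariant ("" first)
// but still guards the index against an empty slice.
func watchesAllNamespacesSorted(namespaces []string) bool {
	return len(namespaces) > 0 && namespaces[0] == ""
}
```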

njhale

Awesome overall. Just a few small things:

pkg/controller/operators/olm/operator.go

Not relevant in the context of the proposed solution.

jpeeler · 2019-01-25T00:10:43Z

I've reworked this to NOT change the namespace status format in the case of watching all namespaces. I've verified that it works in the additional namespace scenario I've been testing with. However, it does not properly remove a CSV if a namespace is removed from an existing operator group (which I assume does not work properly with the current code either). Perhaps I should add these scenarios as e2es.

jpeeler · 2019-01-28T18:00:25Z

/retest

openshift-ci-robot · 2019-01-28T21:56:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpeeler

Needs approval from an approver in each of these files:

~~OWNERS~~ [jpeeler]

jpeeler · 2019-01-30T21:19:03Z

pkg/controller/operators/olm/operatorgroup.go

@@ -282,6 +282,16 @@ func (a *Operator) ensureTenantRBAC(operatorNamespace, targetNamespace string, c
 				return err
 			}
 		}
+		for _, ns := range pruneNamespaces {


These RBAC changes and the ones below seem to be unnecessary. Why would that be?

pkg/controller/operators/olm/operatorgroup.go

ecordell · 2019-01-30T23:49:07Z

pkg/controller/operators/olm/operatorgroup.go

@@ -58,16 +58,34 @@ func (a *Operator) syncOperatorGroups(obj interface{}) error {
 	a.Log.Debug("Cluster roles completed")

 	for _, csv := range a.csvSet(op.Namespace, v1alpha1.CSVPhaseAny) {
-		origCSVannotations := csv.GetAnnotations()
+		origCSVannotations := a.copyOperatorGroupAnnotations(&csv.ObjectMeta)
 		a.addOperatorGroupAnnotations(&csv.ObjectMeta, op, !csv.IsCopied())
 		if reflect.DeepEqual(origCSVannotations, csv.GetAnnotations()) == false {


Will this do the right thing if there are other annotations on the CSV (non-operator-group-related)?

My previous comment made no sense. origCSVannotations is only used for comparing to operator group annotations. So it does do the right thing.

ecordell · 2019-01-30T23:50:47Z

pkg/controller/operators/olm/operatorgroup.go

 		if err != nil {
 			return err
 		}
-		for _, ns := range namespaceObjs {
-			namespaces = append(namespaces, ns.GetName())
+		for _, csv := range fetchedCSVs {


I think we should also verify that the CSV status is “copied” just to be safe
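A minimal sketch of that safeguard during pruning: only CSVs marked as copies are deleted from namespaces the operator group no longer targets. `CSV`, `staleCopiedCSVs`, and the `"Copied"` reason string are assumptions for illustration, loosely modeled on the `IsCopied()` check used earlier in this PR.

```go
package main

// CSV is a hypothetical, trimmed-down ClusterServiceVersion.
type CSV struct {
	Name      string
	Namespace string
	Reason    string // status reason; assumed to be "Copied" on copies
}

// copiedReason is the assumed status reason that marks a copied CSV.
const copiedReason = "Copied"

// staleCopiedCSVs returns copied CSVs living in namespaces the operator
// group no longer targets. Non-copied CSVs are never returned, even if
// their namespace is not targeted.
func staleCopiedCSVs(csvs []CSV, targeted map[string]bool) []CSV {
	stale := make([]CSV, 0)
	for _, c := range csvs {
		if c.Reason == copiedReason && !targeted[c.Namespace] {
			stale = append(stale, c)
		}
	}
	return stale
}
```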

alecmerdler · 2019-01-31T18:30:43Z

@jpeeler What does this PR still need before I give it a review?

jpeeler · 2019-01-31T21:15:17Z

@alecmerdler Sorry, I missed your comment. I just pushed a new revision; it should be good to go at this point.

pkg/controller/operators/olm/operator.go

njhale · 2019-02-01T13:33:01Z

pkg/controller/operators/olm/operatorgroup.go

 		a.addOperatorGroupAnnotations(&csv.ObjectMeta, op, !csv.IsCopied())
 		if reflect.DeepEqual(origCSVannotations, csv.GetAnnotations()) == false {
 			// CRDs don't support strategic merge patching, but in the future if they do this should be updated to patch
 			if _, err := a.client.OperatorsV1alpha1().ClusterServiceVersions(csv.GetNamespace()).Update(csv); err != nil {
 				a.Log.Errorf("Update for existing CSV failed: %v", err)
+			} else {
+				if _, ok := origCSVannotations[v1alpha2.OperatorGroupAnnotationKey]; ok {
+					if err := a.csvQueueSet.Requeue(csv.GetName(), csv.GetNamespace()); err != nil {


Updating ObjectMeta on the CSV will enqueue it. I don't think you need to explicitly requeue.

This specifically handles the case when an operator group related annotation update hasn't occurred, but CSVs still need to be copied to newly created namespaces.

jpeeler · 2019-02-04T19:20:35Z

/retest

jpeeler · 2019-02-05T16:29:54Z

/hold
Even though this is all green, I think there's some test ordering that will still cause test failure.

jpeeler · 2019-02-06T18:21:07Z

I believe the e2e issue I saw earlier has been fixed now (here's hoping my local success is replicated). I'm not removing the hold, though, until I look over #701.

This adds requeuing for both the namespace and operator group sync loops, which fixes syncing issues after initial CSVs have been copied.

This is a change necessary due to newly introduced functionality, but it fixes a problem and ultimately is cleaner anyway. Now that there exists the possibility of "invalid" CSVs being copied over existing valid CSVs, save the copy step until a final CSV state has been reached.
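The "copy only in a final state" rule can be sketched as a simple gate. The phase names and `shouldCopyToTargets` are hypothetical, and treating `Succeeded` as the final state is an assumption; later in the thread the plan changes to validating before copying instead.

```go
package main

// Phase is a hypothetical CSV phase type.
type Phase string

const (
	PhasePending    Phase = "Pending"
	PhaseInstalling Phase = "Installing"
	PhaseSucceeded  Phase = "Succeeded" // assumed final state
	PhaseFailed     Phase = "Failed"
)

// shouldCopyToTargets gates copying on the source CSV having reached the
// assumed final phase, so an intermediate or invalid CSV never overwrites
// a valid copy in a target namespace. Copies themselves are never re-copied.
func shouldCopyToTargets(phase Phase, isCopy bool) bool {
	return !isCopy && phase == PhaseSucceeded
}
```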

jpeeler · 2019-02-25T16:32:18Z

Please focus attention on syncClusterServiceVersion as the CSV copying behavior has been (necessarily) changed due to the rebase changes.

jpeeler · 2019-02-28T16:55:30Z

This needs to be rebased. We also talked about adding validation beforehand instead of only copying CSVs once they enter the final state (as done in the last commit).

openshift-ci-robot · 2019-03-03T09:09:29Z

@jpeeler: PR needs rebase.

sferich888 · 2019-03-04T20:12:15Z

@jpeeler, can we close this in favor of #736?

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 18, 2019

openshift-ci-robot requested review from alecmerdler and njhale January 18, 2019 20:34

openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 18, 2019

ecordell reviewed Jan 21, 2019

njhale previously requested changes Jan 24, 2019

pkg/controller/operators/olm/operator.go Outdated

jpeeler force-pushed the requeue-fixes branch from 242bfb6 to 376ab9d January 25, 2019 00:07

jpeeler force-pushed the requeue-fixes branch from 376ab9d to a000b6e January 28, 2019 21:56

jpeeler changed the title from "fix(olm): change operator group target namespace format" to "fix resource creation/deletion after operator group config change" Jan 30, 2019

jpeeler commented Jan 30, 2019

pkg/controller/operators/olm/operatorgroup.go Outdated

jpeeler commented Jan 30, 2019

pkg/controller/operators/olm/operatorgroup.go Outdated

ecordell reviewed Jan 30, 2019

jpeeler force-pushed the requeue-fixes branch from 819fb7b to c8c53f3 January 31, 2019 21:14

jpeeler force-pushed the requeue-fixes branch from c8c53f3 to e51e10f February 1, 2019 02:07

njhale reviewed Feb 1, 2019

pkg/controller/operators/olm/operator.go Outdated

njhale reviewed Feb 1, 2019

jpeeler force-pushed the requeue-fixes branch from e51e10f to 4f45ae7 February 4, 2019 20:10

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 5, 2019

jpeeler mentioned this pull request Feb 7, 2019

Make e2e more robust #703

Merged

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2019

Jeff Peeler added 3 commits February 25, 2019 11:27

fix(olm): change operator group / CSV requeuing

a8e422c

This adds requeuing for both the namespace and operator group sync loops, which fixes syncing issues after initial CSVs have been copied.

fix(olm): delete stale resources after op group change

54d2783

jpeeler force-pushed the requeue-fixes branch from 352e905 to 9a7fb02 February 25, 2019 16:27

openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 25, 2019

ecordell mentioned this pull request Mar 2, 2019

OperatorGroup expansion/contraction #736

Merged

openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2019

jpeeler closed this Mar 8, 2019