sukhil-suresh/openshift-operator-bug

openshift-operator-bug

Sample app to demo a Red Hat helm-controller bug on OpenShift.

Bug report - operator-framework/operator-sdk#6494

Summary

When an Operand upgrade fails and the bundled helm chart contains at least one PersistentVolumeClaim and at least one ServiceAccount, the number of secrets that the openshift-controller-manager generates for each ServiceAccount grows continuously until the Operand is successfully updated.

This was observed on OpenShift 4.12 with operator-sdk version v1.28.0-ocp.

Observations

  1. The secrets that increase over time are not defined by the helm chart. OpenShift automatically generates two secrets for every service account: an API token secret and a dockercfg secret holding credentials for the cluster's integrated image registry. The openshift-controller-manager creates these secrets and adds a reference to the dockercfg secret in the service account's imagePullSecrets and secrets fields.

  2. Any change made to the Operand is applied as a helm upgrade by the Red Hat helm-controller using the helm chart. When the helm upgrade fails, the helm-controller attempts a helm rollback, which clears the imagePullSecrets and secrets fields of the service account because they are not part of the manifests rendered by the helm chart. Clearing these fields causes the openshift-controller-manager to generate a new set of secrets.

  3. The rollback initiated by the helm-controller fails with an error for the PersistentVolumeClaim, because the manifests used for the rollback do not contain auto-populated fields such as spec.volumeName and spec.storageClassName, and these fields cannot be cleared after creation.

  4. The helm-controller removes any helm revision that does not have a successful deployment status, so that a failed helm upgrade is retried. As a result, every reconciliation loop performs an upgrade followed by a rollback until the upgrade succeeds (when the user corrects the Operand YAML), and each upgrade/rollback attempt creates a fresh pair of token and dockercfg secrets for every service account.
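To quantify the growth described in Observation 4, a small shell helper along these lines could count the generated secrets of one service account. This is a sketch, not part of the original repro: it assumes OpenShift names the generated secrets <sa>-token-<suffix> and <sa>-dockercfg-<suffix>, and that the service account is called sample-sa (inferred from the sample-sa-* secret names, not confirmed from the chart). The function name count_sa_secrets is hypothetical.

```shell
#!/bin/sh
# Sketch: count the OpenShift-generated secrets of one ServiceAccount.
# Assumption: generated secrets are named "<sa>-token-<suffix>" and
# "<sa>-dockercfg-<suffix>".
# Input (stdin): output of `kubectl get secrets -o name`, i.e. lines like
# "secret/<name>".
count_sa_secrets() {
  sa="$1"    # ServiceAccount name, e.g. sample-sa (assumed)
  kind="$2"  # "token" or "dockercfg"
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps the 0
  # without turning the no-match case into a failure.
  grep -c "^secret/${sa}-${kind}-" || true
}
```

For example, kubectl get secrets -n sample -o name | count_sa_secrets sample-sa dockercfg. Before step 6 each count should be 1; after the failed upgrade, both counts would be expected to keep rising with every upgrade/rollback attempt.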

Steps to reproduce

Prerequisite: Access to an OpenShift cluster.

  1. Create the Sample Catalog using the catalogsource.yaml file.
    $ kubectl apply -f catalogsource.yaml
  2. Install the Sample Operator from OperatorHub using the OpenShift cluster console. The Operator installs into the sample namespace by default.
  3. Create the Sample Operand using the Operator and ensure the deployment is successful.
  4. Watch the secrets in the sample namespace:
    $ watch kubectl get secrets -n sample
    There should be only 2 sample-sa-* secrets.
  5. Watch the helm revision history:
    $ watch helm history sample -n sample
    There should be only 1 revision listed, with deployed status and "Install complete" as the description.
  6. Update the Operand from the YAML view. Reduce the pvc.size from the default 2Gi to 1Gi. This upgrade will fail since the persistent volume size cannot be reduced.
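Step 6 can also be performed from the command line instead of the console's YAML view. The sketch below assumes that, as with operator-sdk helm-based operators in general, the Operand's spec is passed to the chart as values, so the chart value pvc.size corresponds to spec.pvc.size in the custom resource. OPERAND_KIND and OPERAND_NAME are hypothetical placeholders: the Operand's kind is not stated in this README, and its name is presumably sample because the helm release is named after it. The function names are also hypothetical.

```shell
#!/bin/sh
# Sketch: build a JSON merge patch that sets spec.pvc.size on the Operand.
# Assumption: the helm-operator maps the CR's .spec to chart values, so the
# chart value pvc.size lives at spec.pvc.size in the custom resource.
pvc_size_patch() {
  size="$1"  # e.g. 1Gi
  printf '{"spec":{"pvc":{"size":"%s"}}}' "$size"
}

# Hypothetical wrapper: OPERAND_KIND and OPERAND_NAME are placeholders that
# must be set to the Operand's real kind and name before calling it.
apply_pvc_size() {
  kubectl patch "$OPERAND_KIND" "$OPERAND_NAME" -n sample \
    --type merge -p "$(pvc_size_patch "$1")"
}
```

For example, set OPERAND_KIND to the Operand's kind (kubectl api-resources lists the available kinds) and OPERAND_NAME=sample, then run apply_pvc_size 1Gi to reproduce the failing upgrade.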

Outcome

Logs for a test run are recorded in the debug folder.

You will observe in the watch windows that the number of sample-sa-* secrets increases rapidly at first and then more slowly as reconciliation backs off. The revision history alternates between listing 1 revision and 3 revisions: the second revision is the upgrade failure and the third revision is the rollback failure.

Revision 1 always remains listed with the description "Install complete".
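As an alternative to watching the table, the alternation between 1 and 3 revisions can be logged. This sketch assumes the JSON output of helm history sample -n sample -o json contains one "revision" key per history entry; count_revisions is a hypothetical helper that counts those keys with grep instead of a real JSON parser.

```shell
#!/bin/sh
# Sketch: count the revisions in `helm history <release> -o json` output.
# Assumption: each history entry carries exactly one "revision" key, so
# counting key occurrences equals counting revisions.
count_revisions() {
  # grep -o prints each match on its own line; wc -l counts them and
  # tr strips the padding some wc implementations add.
  grep -o '"revision":' | wc -l | tr -d ' '
}
```

For example, while sleep 5; do date +%T; helm history sample -n sample -o json | count_revisions; done prints a timestamped revision count every five seconds.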

Error from Revision 2 for upgrade failure (pulled from the sh.helm.release.v1.sample.v2 secret):

Upgrade "sample" failed: cannot patch "sample-pvc" with kind PersistentVolumeClaim: PersistentVolumeClaim "sample-pvc" is invalid: spec.resources.requests.storage: Forbidden: field can not be less than previous value

Error from Revision 3 for the rollback failure (pulled from the sh.helm.release.v1.sample.v3 secret):

Rollback "sample" failed: failed to replace object: PersistentVolumeClaim "sample-pvc" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
  	AccessModes:      {"ReadWriteOnce"},
  	Selector:         nil,
  	Resources:        {Requests: {s"storage": {i: {...}, s: "2Gi", Format: "BinarySI"}}},
- 	VolumeName:       "pvc-a5cf5a2c-ae5b-4e4a-bbe8-a25b75a4b9be",
+ 	VolumeName:       "",
- 	StorageClassName: &"gp3-csi",
+ 	StorageClassName: nil,
  	VolumeMode:       &"Filesystem",
  	DataSource:       nil,
  	DataSourceRef:    nil,
  }
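The revision errors above are pulled from Helm's release secrets. A sketch of how such a secret could be decoded, assuming Helm v3's default Secret storage driver, in which the data.release value is gzipped release JSON that Helm base64-encodes and Kubernetes base64-encodes again as Secret data. The function names decode_release and release_json are hypothetical.

```shell
#!/bin/sh
# Sketch: decode a Helm v3 release secret into its release JSON.
# Assumption: Secret storage driver; data.release = base64(base64(gzip(json))),
# where one base64 layer is Helm's and the other is the Secret's data encoding.
decode_release() {
  # stdin: the raw .data.release value from the Secret
  base64 -d | base64 -d | gunzip -c
}

# $1 = release name, $2 = revision number, $3 = namespace
release_json() {
  kubectl get secret "sh.helm.release.v1.$1.v$2" -n "$3" \
    -o jsonpath='{.data.release}' | decode_release
}
```

For example, release_json sample 2 sample prints the release JSON for revision 2; if jq is installed, piping it to jq -r .info.description shows only the failure message.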
