
Immediate release uninstall - Helm Operator #6244

Closed
dannyt-cx opened this issue Jan 10, 2023 · 8 comments

@dannyt-cx

Type of question

Open question

Question

What did you do?

I have a Helm-based operator (base 1.25.2) and an EKS 1.24 cluster.
I have two scenarios:

The Jenkins Scenario:

I install my Helm-based operator just fine, and it holds 3 CRs in total (I'm aware this is against the documented practice).
I then run `helm upgrade --install my-release --set rabbitmq.enabled=false -f custom-values.yaml`.

My CR is called Platform. It is basically just a few known charts that I use to create global Secrets and ConfigMaps for other services to use (i.e. the MinIO address, the RabbitMQ connection string, etc.).
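For context, here is a rough sketch of that CR. The apiVersion, kind, name, and namespace match the operator logs below; the spec fields are illustrative, not my exact values:

```yaml
# Rough sketch of the Platform CR (spec fields are illustrative).
apiVersion: ndp.com/v1
kind: Platform
metadata:
  name: ndp-platform
  namespace: ndp
spec:
  # With a Helm operator, the contents of spec are passed to the chart as values.
  rabbitmq:
    enabled: false
  minio:
    enabled: true
```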

What happens next puzzles me greatly. I see in the operator logs that it begins to watch: `"logger":"helm.controller","msg":"Watching dependent resource"`.
Those are the Helm objects it detects and watches; it also prints that they are owned by the CR I created.

I get this message as well:
{"level":"info","ts":1673373443.4492545,"logger":"helm.controller","msg":"Upgraded release","namespace":"ndp","name":"ndp-platform","apiVersion":"ndp.com/v1","kind":"Platform","release":"ndp-platform","force":false}

Immediately after that I get these messages, and no pods are created.
{"level":"error","ts":1673373443.870208,"msg":"Reconciler error","controller":"platform-controller","object":{"name":"ndp-platform","namespace":"ndp"},"namespace":"ndp","name":"ndp-platform","reconcileID":"8f2cafe7-33a6-446f-af61-e051d139df3f","error":"Operation cannot be fulfilled on platforms.ndp.com \"ndp-platform\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:234"}

{"level":"info","ts":1673373444.9783742,"logger":"helm.controller","msg":"Uninstalled release","namespace":"ndp","name":"ndp-platform","apiVersion":"ndp.com/v1","kind":"Platform","release":"ndp-platform"} {"level":"info","ts":1673373445.014033,"logger":"helm.controller","msg":"Removing finalizer","namespace":"ndp","name":"ndp-platform","apiVersion":"ndp.com/v1","kind":"Platform","release":"ndp-platform"}

Now this scenario happens ONLY when running an install from Jenkins.

The Manual Scenario

I took the values I generate through Jenkins and applied them manually from my CLI on my laptop.
This time I see all the pods I need, plus all the ConfigMaps and other objects.
Here's where it gets funny: if I run `helm uninstall my-release`, which should remove all those pods and ConfigMaps, they are deleted, but then I see in the operator logs that it installs everything back again!

Without removing the operator pod, there's no way to remove the objects.

What did you expect to see?

  1. Generated resources persist and are not removed.
  2. The chart that utilizes the CR is successfully removed.

What did you see instead? Under which circumstances?

  1. No resources generated.
  2. Unable to remove resources.

Environment

Operator type:

/language helm

Kubernetes cluster type:
EKS 1.24

$ operator-sdk version

$ go version (if language is Go)

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:28:30Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.8-eks-ffeb93d", GitCommit:"abb98ec0631dfe573ec5eae40dc48fd8f2017424", GitTreeState:"clean", BuildDate:"2022-11-29T18:45:03Z", GoVersion:"go1.18.8", Compiler:"gc", Platform:"linux/amd64"}

Additional context

Thank you kindly.

@openshift-ci openshift-ci bot added the language/helm Issue is related to a Helm operator project label Jan 10, 2023
@jberkhahn jberkhahn self-assigned this Jan 23, 2023
@jberkhahn jberkhahn added this to the Backlog milestone Jan 23, 2023
@jberkhahn (Contributor)

So, it seems like you're trying to run Helm commands manually against releases that have been created by your Helm operator. That's never going to work: without any update to the spec of the CR, the Helm operator controller will see that as divergent state and reconcile it back to the way it was.
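For instance, instead of running `helm upgrade` against the release, the supported flow is to change the CR spec and let the controller perform the upgrade. A minimal sketch, reusing the names from your logs (the spec field is illustrative):

```yaml
# Applying a spec change to the CR is what drives the operator's
# "Upgraded release" path; the field below is illustrative.
apiVersion: ndp.com/v1
kind: Platform
metadata:
  name: ndp-platform
  namespace: ndp
spec:
  rabbitmq:
    enabled: false  # flip a value here and `kubectl apply` instead of `helm upgrade --set ...`
```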

Why are you trying to interact with the releases directly rather than by modifying the CR? What's your specific use case?
Also, could you post your controller pod's logs from when you attempt these commands?

@jberkhahn jberkhahn added the triage/support Indicates an issue that is a support question. label Jan 23, 2023
@dannyt-cx (Author)

@jberkhahn Thank you for the reply!

  1. The thing is, I'm not running any command manually. My operator has 3 CRs: Platform, MicroService, and MicroFrontend. The Platform CR installs some third-party charts (DBs, Redis, and others) and also contains 2 CRs of the MicroService type, which basically gives me the entire set of backend services needed for any application-related MicroService CRs to run. This does involve some resource creation, like Jobs and Service modifications, as part of those third-party charts.

Maybe that's why it appears to be "manual". However, this happens solely when using a CD platform like Jenkins.

  2. In the second scenario, where I install the operator and apply my CR with its values, this uninstall doesn't happen.
  3. My actual use case is that I install the Platform CR and a subsequent Helm chart which contains my MicroService CRs (basically a chart of CRs; see the sketch below). I have many such charts due to the nature of the application.
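To illustrate what I mean by a "chart of CRs": the wrapping chart's templates render CR manifests, roughly like this (file name, fields, and values are hypothetical):

```yaml
# templates/microservice.yaml in the wrapping chart (hypothetical sketch):
# every rendered document is a MicroService CR for the operator to reconcile.
apiVersion: ndp.com/v1
kind: MicroService
metadata:
  name: {{ .Values.name }}
  namespace: {{ .Release.Namespace }}
spec:
  image: {{ .Values.image }}
```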

The logs I put in my original message are direct output from the controller pod; that's all it prints.

Thanks again!

@jberkhahn (Contributor)

> The thing is, I'm not running any command manually.

I'm not sure I follow, then. Where is that helm upgrade command coming from? Is Jenkins running it? If not, what is Jenkins doing? Creating instances of your Platform CR?

> In the second scenario, where I install the operator and apply my CR with its values, this uninstall doesn't happen.

So are you creating or deleting a CR? You say you create a CR, but you're expecting resources to be deleted?

@dannyt-cx (Author)

dannyt-cx commented Jan 28, 2023

In the Jenkins scenario, Jenkins is running the `helm upgrade --install` command. The chart that it installs contains my Platform CR.

I am creating a CR, and when I remove the chart that contains it with `helm uninstall`, the CRs are not removed at all. Instead, I see in the operator logs that it says "reconciled release/upgraded release".
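For reference, my expectation is based on the "Removing finalizer" log line above: the operator keeps an uninstall finalizer on the CR, so deleting the CR (which `helm uninstall` of the wrapping chart should do) ought to uninstall the release first. A sketch of the CR as stored in the cluster (the finalizer name is illustrative and may differ by operator-sdk version):

```yaml
# The CR as the operator maintains it; deleting it should trigger the
# release uninstall via this finalizer before the object disappears.
apiVersion: ndp.com/v1
kind: Platform
metadata:
  name: ndp-platform
  namespace: ndp
  finalizers:
    - helm.sdk.operatorframework.io/uninstall-release  # illustrative name
```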

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 29, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci

openshift-ci bot commented Jun 29, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this as completed Jun 29, 2023