My helm operator hang after I upgraded helm-operator from v1.33.0 to v1.34.0 #6690

Closed
Tracked by #6767
kschanrtp opened this issue Mar 4, 2024 · 29 comments · Fixed by #6769
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. language/helm Issue is related to a Helm operator project

@kschanrtp

Bug Report

What did you do?

I upgraded helm-operator from v1.33.0 to v1.34.0.

What did you expect to see?

My Helm operator deploys the Helm chart successfully.

What did you see instead? Under which circumstances?

My Helm operator hangs during a new install.

I did notice a large version jump for helm-operator-plugins. I am not sure whether this is related.

- github.com/operator-framework/helm-operator-plugins v0.0.12-0.20231013185714-215d1f8a3e7d
+ github.com/operator-framework/helm-operator-plugins v0.1.3  

I have anonymized the log output below.

Working helm operator log running v1.33.0

{"level":"info","ts":"2024-02-29T19:16:11Z","logger":"cmd","msg":"Version","Go Version":"go1.21.5","GOOS":"linux","GOARCH":"amd64","helm-operator":"v1.33.0","commit":"542966812906456a8d67cf7284fc6410b104e118"}
...
{"level":"info","ts":"2024-02-29T19:17:01Z","msg":"Starting EventSource","controller":"myhelm-controller","source":"kind source: *unstructured.Unstructured"}
{"level":"info","ts":"2024-02-29T19:17:01Z","msg":"Starting Controller","controller":"myhelm-controller"}
{"level":"info","ts":"2024-02-29T19:17:01Z","msg":"Starting workers","controller":"myhelm-controller","worker count":16}
{"level":"info","ts":"2024-02-29T19:17:12Z","msg":"Starting EventSource","controller":"myhelm-controller","source":"kind source: *unstructured.Unstructured"}
{"level":"info","ts":"2024-02-29T19:17:12Z","logger":"helm.controller","msg":"Watching dependent resource","ownerApiVersion":"my.example.com/v1alpha1","ownerKind":"myKind","apiVersion":"v1","kind":"Service"}
...
myhelm chart is deployed

helm operator log running v1.34.0

{"level":"info","ts":"2024-03-04T20:12:52Z","logger":"cmd","msg":"Version","Go Version":"go1.21.7","GOOS":"linux","GOARCH":"amd64","helm-operator":"v1.34.0","commit":"4e01bcd726aa8b0e092fcd3ab874961e276f3db3"}
...
{"level":"info","ts":"2024-03-04T20:13:43Z","msg":"Starting EventSource","controller":"myhelm-controller","source":"kind source: *unstructured.Unstructured"}
{"level":"info","ts":"2024-03-04T20:13:43Z","msg":"Starting Controller","controller":"myhelm-controller"}
{"level":"info","ts":"2024-03-04T20:13:44Z","msg":"Starting workers","controller":"myhelm-controller","worker count":16}
NO More Output
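
The difference between the two logs above can be checked mechanically. The following is a minimal sketch, not part of operator-sdk: `lines_after_workers` is a hypothetical helper that counts the log lines following the last "Starting workers" message, assuming the operator log was saved to a file in the JSON-lines format shown above. A count of 0 matches the hang; the working v1.33.0 log continues with entries such as "Watching dependent resource".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an operator-sdk command).
# Prints the number of non-empty log lines that follow the LAST
# "Starting workers" message in the given log file.
# Note: a log that never reaches "Starting workers" also prints 0.
lines_after_workers() {
  awk '/"msg":"Starting workers"/ { n = 0; seen = 1; next }
       seen && NF { n++ }
       END { print n + 0 }' "$1"
}
```

For example, after saving the pod log to `operator.log` with `kubectl logs`, `lines_after_workers operator.log` printing 0 several minutes after startup would indicate the same stall.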

Environment

Operator type:

Kubernetes cluster type:

$ operator-sdk version
operator-sdk-v1.12.0+git

$ go version (if language is Go)
go: 1.21.1

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.10+28ed2d7", GitCommit:"c725f2ce5164bf4165b22d6c28dd0ace4b3b7e9b", GitTreeState:"clean", BuildDate:"2024-01-23T03:16:21Z", GoVersion:"go1.20.12 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

Additional context

@acornett21
Contributor

@kschanrtp 1.34.0's release did not complete fully. Can you try updating to 1.34.1 to see if this resolves your issue?

@kschanrtp
Author

@acornett21 Same problem with 1.34.1:

{"level":"info","ts":"2024-03-05T20:04:17Z","logger":"cmd","msg":"Version","Go Version":"go1.21.7","GOOS":"linux","GOARCH":"amd64","helm-operator":"v1.34.1","commit":"edaed1e5057db0349568e0b02df3743051b54e68"}
...
{"level":"info","ts":"2024-03-05T20:05:06Z","msg":"Starting EventSource","controller":"myhelm-controller","source":"kind source: *unstructured.Unstructured"}
{"level":"info","ts":"2024-03-05T20:05:06Z","msg":"Starting Controller","controller":"myhelm-controller"}
{"level":"info","ts":"2024-03-05T20:05:06Z","msg":"Starting workers","controller":"myhelm-controller","worker count":16}
NO MORE OUTPUT

@sudhir-kelkar

@acornett21 Any update on this?
We need this fix to address the security vulnerability GHSA-r53h-jv2g-vpx6.

@acornett21
Contributor

@sudhir-kelkar I have not looked at this. I was just relating all the issues that came in and asking whether this still existed in 1.34.1, since the 1.34.0 release was incomplete. I personally will not have time to look at this for a few weeks. I'm only a contributor to this project, not a dedicated maintainer.

@jberkhahn
Contributor

Could you please share the structure of your CR? The most likely cause of something like this is incorrect RBAC, where the controller doesn't have permission to see all the resources it needs. Could you also post the output of your Subscription (if you're using OLM)?
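
One way to test the RBAC theory is to ask the API server what the operator's service account is allowed to do. The sketch below wraps `kubectl auth can-i` with impersonation. The function name `missing_verbs` and the verb list are assumptions made for illustration, and the namespace, service account, and resource arguments must be replaced with your own values. Whoever runs it needs permission to impersonate service accounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every verb the given service account may NOT
# perform on a resource. Empty output means all listed verbs are allowed.
# Usage: missing_verbs <namespace> <service-account> <resource>
missing_verbs() {
  local ns="$1" sa="$2" resource="$3" verb
  # Verb list is an assumption; adjust it to what your chart needs.
  for verb in get list watch create update patch delete; do
    # `kubectl auth can-i` prints "yes" or "no" for the impersonated subject.
    if [ "$(kubectl auth can-i "$verb" "$resource" \
              --as="system:serviceaccount:${ns}:${sa}" -n "$ns" 2>/dev/null)" != "yes" ]; then
      echo "$verb"
    fi
  done
}
```

Running it once for the custom resource and once for each dependent kind in the chart (for example `services`, which appears in the working log) would show whether a missing `list` or `watch` permission could be involved.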

@jberkhahn jberkhahn added the triage/needs-information Indicates an issue needs more information in order to work on it. label Apr 8, 2024
@jberkhahn jberkhahn added this to the Backlog milestone Apr 8, 2024
@jberkhahn
Contributor

relates #6651

@malli31

malli31 commented Apr 21, 2024

Any update on this? Even with 1.34.1, I am not able to see any pods after deploying the CR.
Switching back to 1.33.0 works perfectly.

@malli31

malli31 commented Apr 22, 2024

@acornett21 Any input on why 1.34.1 is not working? Going back to 1.33.0 works perfectly fine.
Is there a suggested workaround? I am not seeing any logs, events, or anything else at all.

@jberkhahn
Contributor

Something broke when we cut 1.34. We're not sure what exactly but are currently investigating.

@kmcdon83

+1 for this issue, moving from 1.33 to 1.34.1 has stopped any process of reconciliation

@kmcdon83

+1 for this issue, moving from 1.33 to 1.34.1 has stopped any process of reconciliation

I have verified the 1.34.2 has resolved my issue.

@kschanrtp
Author

kschanrtp commented May 16, 2024

I am still having the same problem with 1.34.2. It does not do the reconciliation.

{"level":"info","ts":"2024-05-16T16:43:04Z","logger":"cmd","msg":"Version","Go Version":"go1.21.10","GOOS":"linux","GOARCH":"amd64","helm-operator":"v1.34.2","commit":"81dd3cb24b8744de03d312c1ba23bfc617044005"}
...
{"level":"info","ts":"2024-05-16T16:43:55Z","msg":"Starting EventSource","controller":"manageservice-controller","source":"kind source: *unstructured.Unstructured"}
{"level":"info","ts":"2024-05-16T16:43:55Z","msg":"Starting Controller","controller":"manageservice-controller"}
{"level":"info","ts":"2024-05-16T16:43:55Z","msg":"Starting workers","controller":"manageservice-controller","worker count":16}

@kmcdon83

+1 for this issue, moving from 1.33 to 1.34.1 has stopped any process of reconciliation

I have verified the 1.34.2 has resolved my issue.

Sorry, I spoke too soon. It does appear there is no reconciliation occurring.

@kschanrtp
Author

@jberkhahn Is it possible to create a 1.33.1 release based on 1.33.0 but built on the latest UBI 8 image, to pick up the security fixes in that image?

@acornett21
Contributor

Hi @kschanrtp You're in control of your operator controller image and its updates. If you want or need to update, you can change the Dockerfile in your own operator project to do so. Something like:

USER root

RUN microdnf update && microdnf clean all

# Switch back to whatever user your container uses at runtime.

Or, if you only want to update the libraries with CVEs, you can update those individually.

@kschanrtp
Author

kschanrtp commented May 21, 2024

@acornett21 I thought I had done that and it did not work. I will try again. Maybe the order of my updates is not correct.

@kschanrtp
Author

The CVEs are on the Go module side of helm-operator.

@acornett21 acornett21 added kind/bug Categorizes issue or PR as related to a bug. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. language/helm Issue is related to a Helm operator project and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Jun 5, 2024
@kschanrtp
Author

kschanrtp commented Jun 13, 2024

@joelanford Any idea when we will have the fix?

@kschanrtp
Author

Still not working using helm-operator 1.35.0.
@joelanford Any idea which release of helm-operator will have the fix?

@acornett21
Contributor

@kschanrtp What process are you following to update? Are you just updating the controller image, or are you also updating the version that is used for scaffolding/bundling?

@kschanrtp
Author

@acornett21 I just update the controller image:

FROM quay.io/operator-framework/helm-operator:v1.35.0

@acornett21
Contributor

You probably need to update to version 1.35.0 for scaffolding/bundling as well. Can you try that?

@bluzarraga

We're seeing the same problem with 1.35.0. We also use the FROM line to update the helm operator in the Dockerfile, and it does not seem to work even though the build completes "successfully". Was something changed in 1.35.0 that requires more configuration during the build than just updating the value in the FROM line?

@acornett21
Contributor

As mentioned in the previous message, you probably need to re-scaffold the project. See the release notes below:

https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.35.0/

@malli31

malli31 commented Jun 25, 2024

@acornett21 I didn't understand the re-scaffold part. Usually we just set the version in the FROM line.
Do we need any extra RBAC for this to work?

@acornett21
Contributor

@malli31 An operator-sdk binary was used to create your operator project. I'm talking about making sure that the binary used to scaffold and bundle the project matches the FROM version used for the container image. Having different versions of the scaffolding/bundling binary and the runtime image is likely the issue. As mentioned in the release notes, previous versions did not scaffold Helm projects correctly, so to get everything in sync it's best to re-scaffold the project with the 1.35.0 binary. I hope this is clearer.
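
A rough way to spot the mismatch described above is to compare the tag in the Dockerfile's FROM line with the version reported by the installed operator-sdk binary. The sketch below is an illustration under assumptions: the helper names (`base_image_tag`, `first_semver`, `versions_match`) are hypothetical, the FROM line is assumed to use the `quay.io/operator-framework/helm-operator` image as quoted earlier in this thread, and the installed binary is only a stand-in for whichever binary originally scaffolded the project.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not operator-sdk commands.

# Print the helm-operator tag from the first matching FROM line of a Dockerfile.
base_image_tag() {
  sed -n 's|^FROM quay.io/operator-framework/helm-operator:\(v[0-9][0-9.]*\).*|\1|p' "$1" | head -n 1
}

# Print the first vX.Y.Z string found on stdin.
first_semver() {
  grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeeds only if the Dockerfile tag equals the version found in the given text.
# Usage: versions_match <Dockerfile> "<text printed by the version command>"
versions_match() {
  local dockerfile="$1" sdk_version_text="$2" image cli
  image="$(base_image_tag "$dockerfile")"
  cli="$(printf '%s\n' "$sdk_version_text" | first_semver)"
  echo "base image: ${image:-unknown}, operator-sdk: ${cli:-unknown}"
  [ -n "$image" ] && [ "$image" = "$cli" ]
}
```

It could be called as `versions_match Dockerfile "$(operator-sdk version)"`. A failure only shows that the versions differ; it does not prove that the mismatch is what stops reconciliation.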

@malli31

malli31 commented Jun 27, 2024

@acornett21 I understand how to scaffold, but the documentation says:

Backwards Compatibility when Upgrading Operator-sdk version
When upgrading your version of Operator-sdk, it is intended that post-1.0.0 minor versions (i.e. 1.y) are backwards compatible and strictly additive. Therefore, you only need to re-scaffold your operator with a newer version of Operator-SDK if you wish to take advantage of new features. If you do not wish to use new features, all that should be required is bumping the operator image dependency (if a Helm or Ansible operator) and rebuilding your operator image.

We currently don't want any new features. The documentation says that just bumping the helm-operator version should work out of the box, but it is not working.

The reason we don't want to upgrade this way is that we use the same Helm charts for operator, Helm, and YAML generation. Adding all these extra new files would complicate our build and deployments.

If I take the new RBAC and a few of the generated folders from config, can I deploy my operator that way? Is this recommended?

We also don't want to complicate our deployment. Until now we have not needed operator-sdk to deploy our Helm operator.

The two commands below are sufficient for installation:
kubectl create -f setup -n ns
kubectl create -f operator_cr.yaml -n ns

@acornett21
Contributor

Many versions of operator-sdk were broken for Helm, so if you want it to work, it would be best to re-scaffold. Backwards compatibility is out the window when there is a bug. This isn't a new feature; it's a bug fix.

My recommendation is to re-scaffold if you want to use the latest image at runtime. This is the only path that has been tested.
