fix(db-operator/rbac): Add missing list and watch for deployments for custom resource monitoring #33

Merged
merged 2 commits into from
Jan 19, 2024

@hcnp hcnp commented Dec 29, 2023

The operator needs this to monitor the custom resources.

Without these permissions the operator will fail to monitor custom resources and thus fail to update them.

This was in the operator log before the change:

time="2023-12-28T15:31:39Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db start Reconciling"
time="2023-12-28T15:31:39Z" level=info msg="Secret Update Event detected: secret=tcs-test/tcs-test-db-credentials, database=tcs-test-db"
time="2023-12-28T15:31:39Z" level=info msg="Start processing Database Secret Update Event"
time="2023-12-28T15:31:39Z" level=info msg="processing Database Secret label: name=kinda.rocks/used-by-name, value=tcs-test-db"
time="2023-12-28T15:31:39Z" level=info msg="Secret Update Event detected: secret=tcs-test/tcs-test-db-credentials, database=tcs-test-db"
time="2023-12-28T15:31:39Z" level=info msg="Start processing Database Secret Update Event"
time="2023-12-28T15:31:39Z" level=info msg="processing Database Secret label: name=kinda.rocks/used-by-name, value=tcs-test-db"
2023/12/28 15:31:39 refreshing ephemeral certificate for instance goats-tcs:europe-west3:gsql-tcs-dev-01
2023/12/28 15:31:40 Generated RSA key in 68.591545ms
2023/12/28 15:31:40 Scheduling refresh of ephemeral certificate in 55m0s
time="2023-12-28T15:31:40Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db successfully created"
time="2023-12-28T15:31:40Z" level=info msg="Secret Update Event detected: secret=tcs-test/dbin-gsql-tcs-dev-01-access-secret, database=tcs-test-db"
time="2023-12-28T15:31:40Z" level=info msg="Start processing Database Secret Update Event"
time="2023-12-28T15:31:40Z" level=info msg="processing Database Secret label: name=kinda.rocks/used-by-name, value=tcs-test-db"
time="2023-12-28T15:31:40Z" level=info msg="Secret Update Event detected: secret=tcs-test/dbin-gsql-tcs-dev-01-access-secret, database=tcs-test-db"
time="2023-12-28T15:31:40Z" level=info msg="Start processing Database Secret Update Event"
time="2023-12-28T15:31:40Z" level=info msg="processing Database Secret label: name=kinda.rocks/used-by-name, value=tcs-test-db"
time="2023-12-28T15:31:40Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db instance access secret created"
W1228 15:31:40.914883       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:31:40.915104       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
W1228 15:31:42.125042       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:31:42.125331       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
W1228 15:31:45.177618       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:31:45.177660       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
W1228 15:31:48.639417       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:31:48.639468       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
W1228 15:31:56.560216       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:31:56.560242       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
time="2023-12-28T15:31:59Z" level=info msg="Instance: name=gsql-tcs-dev-01 Running"
W1228 15:32:19.414496       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:32:19.414546       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
time="2023-12-28T15:32:59Z" level=info msg="Instance: name=gsql-tcs-dev-01 Running"
W1228 15:33:01.626094       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:33:01.626135       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
W1228 15:33:37.784914       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:33:37.785214       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
time="2023-12-28T15:33:59Z" level=info msg="Instance: name=gsql-tcs-dev-01 Running"
time="2023-12-28T15:34:35Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db proxy created"
time="2023-12-28T15:34:35Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db database info configmap created"
time="2023-12-28T15:34:35Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db finish Ready"
time="2023-12-28T15:34:35Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db start Reconciling"
time="2023-12-28T15:34:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db successfully created"
time="2023-12-28T15:34:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db instance access secret created"
time="2023-12-28T15:34:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db proxy created"
time="2023-12-28T15:34:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db database info configmap created"
time="2023-12-28T15:34:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db finish Ready"
time="2023-12-28T15:34:59Z" level=info msg="Instance: name=gsql-tcs-dev-01 Running"
time="2023-12-28T15:35:35Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db start Reconciling"
time="2023-12-28T15:35:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db successfully created"
time="2023-12-28T15:35:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db instance access secret created"
time="2023-12-28T15:35:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db proxy created"
time="2023-12-28T15:35:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db database info configmap created"
time="2023-12-28T15:35:36Z" level=info msg="DB: namespace=tcs-test, name=tcs-test-db finish Ready"
time="2023-12-28T15:35:59Z" level=info msg="Instance: name=gsql-tcs-dev-01 Running"

@allanger
Member

allanger commented Jan 2, 2024

I'm a bit confused. It's failing to list deployments, but I don't see where you'd need that. You're using Google instances, not generic ones, right? @hcnp

@hcnp
Author

hcnp commented Jan 2, 2024

Yes. It's on Google Cloud SQL. It's mainly these lines, where I guess the failing operation is the "watch":

W1228 15:31:56.560216       1 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope
E1228 15:31:56.560242       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Failed to watch *v1.Deployment: failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:db-operator:db-operator" cannot list resource "deployments" in API group "apps" at the cluster scope

So I'm actually only changing this file: charts/db-operator/templates/controller/rbac.yaml
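For illustration, here is a minimal sketch of the kind of rule that grants these permissions. It is not copied from rbac.yaml: the ClusterRole name is hypothetical, and the real template may list further resources and verbs. A ClusterRole (rather than a namespaced Role) is assumed because the log reports the denial "at the cluster scope".

```yaml
# Sketch only: the role name is an assumption and the rule is shown in
# isolation, not as it appears in charts/db-operator/templates/controller/rbac.yaml.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: db-operator-example   # hypothetical name
rules:
  - apiGroups: ["apps"]        # API group named in the "forbidden" log lines
    resources: ["deployments"]
    verbs: ["list", "watch"]   # the verbs the reflector needs to list and then watch Deployments
```

The reflector in client-go first lists the resource and then opens a watch, which is why both verbs are needed for the errors above to go away.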

The other changes create a template for updating the chart readme.md file with this tool: https://github.com/norwoodj/helm-docs. I can move that to a separate PR.

@allanger
Member

allanger commented Jan 2, 2024

The other changes create a template for updating the chart readme.md file with this tool: https://github.com/norwoodj/helm-docs. I can move that to a separate PR.

It's already implemented here: #16
But since it's a big PR, it's going to be reviewed a bit later, so I'd suggest waiting a bit for it.

Member

@allanger allanger left a comment

So, since I'm not using GSQL instances and I'm not actually the one who's been developing that part, it seems fine to me to add those permissions, though I don't understand what exactly introduced that bug and how tests are passing. I guess we want to re-work permissions later anyway.

I would ask you to drop the readme-related changes and leave only the RBAC and Chart.yaml files updated.

I'd say that after this one is merged: #32, it makes sense to rebase and bump a patch version.

@allanger
Member

@hcnp Would you be able to apply those fixes and rebase onto main?

@hcnp
Author

hcnp commented Jan 19, 2024

I've rebased and tested this again. Without the "list" permission, the status of the db resource doesn't get updated correctly, in addition to the errors in the log lines above. The dbuser and db still get created correctly, together with the gcloud proxy:

apiVersion: kinda.rocks/v1beta1
kind: Database
metadata:
  finalizers:
  - db.db
  generation: 1
  labels:
    env: dev
    kustomize.toolkit.fluxcd.io/name: infrastructure
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: db
  namespace: tcs-test
spec:
  backup:
    cron: 0 0 * * *
    enable: false
  cleanup: true
  credentials:
    templates:
    - name: CONNECTION_STRING
      secret: true
      template: '{{ .Protocol }}://{{ .Username }}:{{ .Password }}@{{ .Hostname }}:{{
        .Port }}/{{ .Database }}'
  deletionProtected: false
  instance: gsql-tcs-dev-01
  postgres: {}
  secretName: db-credentials
status:
  database: tcs-test-db
  engine: postgres
  phase: ProxyCreating
  proxyStatus:
    serviceName: ""
    sqlPort: 0
    status: false
  status: false
  user: tcs-test-db

@hcnp hcnp requested a review from allanger January 19, 2024 13:51
@allanger
Member

@hcnp We'll unfortunately have to wait for #43 and rebase again. But to get something merged to the charts repo, if you update a chart, you always need to bump the chart version. Even though I see that only one file is changed, I think you still need to do it. It seems like a fix to me, so I would go with a patch bump.
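As a rough illustration of a patch bump, a Chart.yaml change could look like the sketch below. Every value is a placeholder chosen for the example; none of them is taken from this repository.

```yaml
# Chart.yaml sketch: all values are placeholders, not the chart's real metadata.
apiVersion: v2
name: example-chart      # hypothetical chart name
version: 1.2.4           # was 1.2.3; a fix bumps only the patch component
appVersion: "1.0.0"      # left unchanged in this sketch, since only a template is edited
```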

@allanger
Member

Please, rebase one more time :)

@allanger
Member

allanger commented Jan 19, 2024

There is a conflict, and the current dbin version is 2.2.0: https://github.com/db-operator/charts/blob/main/charts/db-instances/Chart.yaml

We released a big change to the test pipeline a couple of days ago, and dbinstances were affected too.

@hcnp hcnp force-pushed the fix/missing-rbac-1.16.1 branch 2 times, most recently from cb85c7b to e9f4915 Compare January 19, 2024 14:53
@hcnp
Author

hcnp commented Jan 19, 2024

Please, rebase one more time :)

Done ;)

Member

@allanger allanger left a comment

LGTM

@allanger allanger merged commit 55d1b24 into db-operator:main Jan 19, 2024
10 checks passed
@hcnp hcnp deleted the fix/missing-rbac-1.16.1 branch January 19, 2024 15:58
@hcnp hcnp restored the fix/missing-rbac-1.16.1 branch January 19, 2024 16:09
@hcnp hcnp deleted the fix/missing-rbac-1.16.1 branch January 19, 2024 16:10