
nodeTaints not detected while using taintReplacementOptions to rotate FDB cluster pods #2091

Closed
kky-fury opened this issue Jul 1, 2024 · 4 comments
Labels: question (Further information is requested)


kky-fury (Contributor) commented Jul 1, 2024

What happened?

We were experimenting with taintReplacementOptions to rotate the pods of our FDB cluster onto new nodes while upgrading our Kubernetes version. However, after we applied the taints to the nodes, the operator did not detect them.

The operator logs showed the following error:

level":"error","msg":"pkg/mod/k8s.io/client-go@v0.26.10/tools/cache/reflector.go:169: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User \"system:serviceaccount:infra:fdb-operator\" cannot list resource \"nodes\" in API group \"\" at the cluster scope\n"
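The message names the missing permission directly: a subject, a verb, a resource, and an API group. The sketch below is for illustration only. It pulls those fields out of a "forbidden" message so the required RBAC rule can be read off mechanically. The regular expression is an assumption fitted to this single log line and is not a stable Kubernetes format. `parse_forbidden` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Assumed shape of the RBAC "forbidden" text, fitted to the log line above.
# Quotes may appear escaped (\") because the message is embedded in a JSON log.
FORBIDDEN_RE = re.compile(
    r'User \\?"(?P<user>[^"\\]+)\\?" cannot (?P<verb>\w+) resource '
    r'\\?"(?P<resource>[^"\\]+)\\?" in API group \\?"(?P<group>[^"\\]*)\\?"'
)


def parse_forbidden(message: str) -> Optional[Dict[str, str]]:
    """Extract user, verb, resource and API group from a forbidden error."""
    match = FORBIDDEN_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    # Service accounts authenticate as system:serviceaccount:<namespace>:<name>.
    parts = info["user"].split(":")
    if len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]:
        info["namespace"], info["service_account"] = parts[2], parts[3]
    return info


SAMPLE_LOG = (
    r'Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: '
    r'User \"system:serviceaccount:infra:fdb-operator\" cannot list resource '
    r'\"nodes\" in API group \"\" at the cluster scope'
)

if __name__ == "__main__":
    print(parse_forbidden(SAMPLE_LOG))
```

For this log, the parsed fields point to the fix: service account `fdb-operator` in namespace `infra` needs `list` on `nodes` in the core ("") API group, at cluster scope.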

What did you expect to happen?

We expected the operator to detect the taints on the nodes and automatically delete and reschedule the coordinator, log, stateless, and storage pods onto new nodes.

How can we reproduce it (as minimally and precisely as possible)?

Taint the nodes running the FDB cluster pods with something like the following:

```python
from kubernetes import client

# The key must match the key configured in taintReplacementOptions below.
taint = client.V1Taint(
    key="foo/bar",
    value="fdbrotation",
    effect="PreferNoSchedule",
)
```
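Our understanding is that a Node's `spec.taints` list has no merge key, so a patch replaces the whole list. Adding the rotation taint should therefore carry over any taints already on the node. The sketch below assumes taints are represented as plain dicts. `with_taint` and `node_taint_patch` are hypothetical helpers, and the existing `dedicated` taint is made up for illustration. Two taints count as the same when their key and effect match, which is how Kubernetes matches taints.

```python
from typing import Dict, List

Taint = Dict[str, str]


def with_taint(existing: List[Taint], new: Taint) -> List[Taint]:
    """Return a taint list containing `new` plus every unrelated existing taint.

    Taints with the same key and effect are treated as one taint, so
    re-applying the rotation taint updates its value instead of duplicating it.
    """
    kept = [
        t for t in existing
        if (t.get("key"), t.get("effect")) != (new["key"], new["effect"])
    ]
    return kept + [dict(new)]


def node_taint_patch(existing: List[Taint], new: Taint) -> Dict:
    # Assumption: spec.taints is replaced as a whole, so send the full list.
    return {"spec": {"taints": with_taint(existing, new)}}


rotation = {"key": "foo/bar", "value": "fdbrotation", "effect": "PreferNoSchedule"}
current = [{"key": "dedicated", "value": "fdb", "effect": "NoSchedule"}]  # hypothetical

if __name__ == "__main__":
    print(node_taint_patch(current, rotation))
```

The resulting body could be sent with the official Python client, for example `client.CoreV1Api().patch_node(name, body)`. From the command line, `kubectl taint nodes <node> foo/bar=fdbrotation:PreferNoSchedule` does the equivalent.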

Then patch the FDB cluster spec with something like the following:

```python
# Patch body for the FoundationDBCluster resource
patch = {
    "spec": {
        "automationOptions": {
            "replacements": {
                "taintReplacementOptions": [
                    {
                        "key": "foo/bar",
                        "durationInSeconds": 300,
                    }
                ],
                "taintReplacementTimeSeconds": 60,
                "enabled": True,
            }
        }
    }
}
```
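The following is a simplified model of how we read `durationInSeconds`. It is not the operator's code. A pod on a tainted node becomes a replacement candidate only after a taint whose key matches a `taintReplacementOptions` entry has been observed for at least that many seconds. Several things here are assumptions: timing is measured from when the taint was first observed, a key of `"*"` matches any taint, and the first matching entry wins. `taintReplacementTimeSeconds` is not modeled. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def matching_option(taint_key: str, options: List[dict]) -> Optional[dict]:
    """Return the first taintReplacementOptions entry that matches the key.

    Assumption: an entry with key "*" matches any taint key.
    """
    for option in options:
        if option["key"] in (taint_key, "*"):
            return option
    return None


def should_replace(taint_key: str, first_seen: datetime, now: datetime,
                   options: List[dict]) -> bool:
    """Simplified model: replace once a matching taint has been seen long enough."""
    option = matching_option(taint_key, options)
    if option is None:
        return False  # taints without a matching option are ignored
    return now - first_seen >= timedelta(seconds=option["durationInSeconds"])


# Same option as in the patch above; the timestamp is arbitrary.
options = [{"key": "foo/bar", "durationInSeconds": 300}]
seen = datetime(2024, 7, 1, 12, 0, tzinfo=timezone.utc)

if __name__ == "__main__":
    print(should_replace("foo/bar", seen, seen + timedelta(seconds=120), options))
    print(should_replace("foo/bar", seen, seen + timedelta(seconds=300), options))
```

Under this model, the first call prints `False` because only 120 s of the 300 s have elapsed, and the second prints `True`. None of this can happen, however, unless the operator can read the Node objects in the first place.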

Anything else we need to know?

We added the required permissions for the nodes resource to the operator's RBAC role, and that fixed the issue.
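The sketch below builds a fix of this kind as plain Kubernetes manifests: a ClusterRole that can read nodes, bound to the operator's service account. The namespace `infra` and the service account `fdb-operator` come from the error message above. The object name `fdb-operator-node-reader` is hypothetical. The node informer lists and then watches, so `list` and `watch` are required; `get` is included as well.

```python
import json
from typing import Dict, Tuple

RBAC_API = "rbac.authorization.k8s.io/v1"


def node_reader_rbac(namespace: str, service_account: str,
                     name: str = "fdb-operator-node-reader") -> Tuple[Dict, Dict]:
    """Build a ClusterRole that can read nodes and bind it to a service account."""
    cluster_role = {
        "apiVersion": RBAC_API,
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [{
            "apiGroups": [""],  # core group, the 'API group ""' in the error
            "resources": ["nodes"],
            # list + watch are what the informer uses; get is included too.
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": RBAC_API,
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "ClusterRole", "name": name},
        "subjects": [{"kind": "ServiceAccount",
                      "name": service_account, "namespace": namespace}],
    }
    return cluster_role, binding


if __name__ == "__main__":
    role, binding = node_reader_rbac("infra", "fdb-operator")
    # kubectl apply -f also accepts JSON; a v1 List wraps both objects.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [role, binding]}, indent=2))
```

ClusterRole and ClusterRoleBinding are used here, rather than Role and RoleBinding, because Nodes are cluster-scoped objects ("at the cluster scope" in the error).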

Changes

We would like to merge these changes into main if they are acceptable.

FDB Kubernetes operator

FDB-operator version: 1.33.0

Kubernetes version

K8s version: 1.27.12

Cloud provider

AWS, EKS

kky-fury added the bug label (Something isn't working) on Jul 1, 2024
johscheuer (Member) commented

Hello 👋

Could you please verify whether your operator deployment has a ClusterRole similar to this one: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/samples/deployment.yaml#L6-L19 The error that you copied says that the operator is not allowed to list nodes, so it cannot check their taints. If the ClusterRole exists, make sure that there is also a ClusterRoleBinding for your service account, similar to: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/config/samples/deployment.yaml#L141-L152
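On a live cluster, `kubectl auth can-i list nodes --as=system:serviceaccount:infra:fdb-operator` answers this question directly. For manifests that have not been applied yet, the sketch below is a rough offline approximation. It checks that a binding references the role, names the service account as a subject, and that the role's rules cover `list` and `watch` on `nodes`. It ignores `resourceNames`, aggregated roles, and group subjects. The function names and the object name `fdb-operator-clusterrole` are hypothetical.

```python
from typing import Dict, Iterable


def rule_allows(rule: Dict, group: str, resource: str, verb: str) -> bool:
    """True if one RBAC rule covers (group, resource, verb); "*" is a wildcard."""
    return (
        any(g in (group, "*") for g in rule.get("apiGroups", []))
        and any(r in (resource, "*") for r in rule.get("resources", []))
        and any(v in (verb, "*") for v in rule.get("verbs", []))
    )


def binding_grants(role: Dict, binding: Dict, namespace: str, service_account: str,
                   group: str = "", resource: str = "nodes",
                   verbs: Iterable[str] = ("list", "watch")) -> bool:
    """Simplified check that `binding` gives the service account `verbs` via `role`."""
    ref = binding.get("roleRef", {})
    if ref.get("kind") != "ClusterRole" or ref.get("name") != role["metadata"]["name"]:
        return False  # binding points at a different role
    wanted = {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
    if not any(all(s.get(k) == v for k, v in wanted.items())
               for s in binding.get("subjects", [])):
        return False  # service account is not a subject of the binding
    return all(any(rule_allows(r, group, resource, verb) for r in role.get("rules", []))
               for verb in verbs)


# Hypothetical objects shaped like a node-reading ClusterRole and its binding.
role = {"metadata": {"name": "fdb-operator-clusterrole"},
        "rules": [{"apiGroups": [""], "resources": ["nodes"],
                   "verbs": ["get", "watch", "list"]}]}
binding = {"roleRef": {"kind": "ClusterRole", "name": "fdb-operator-clusterrole"},
           "subjects": [{"kind": "ServiceAccount", "name": "fdb-operator",
                         "namespace": "infra"}]}

if __name__ == "__main__":
    print(binding_grants(role, binding, "infra", "fdb-operator"))
```

Both halves matter. A correct ClusterRole with no binding, or a binding whose subject uses another namespace, still produces the "forbidden" error.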

johscheuer added the question label (Further information is requested) and removed the bug label (Something isn't working) on Jul 1, 2024
kky-fury (Contributor, Author) commented Jul 2, 2024

Hello,

Thank you for your reply. Yes, we did not have that before, but we added it and that made it work.

Is there any plan to add it to the official Helm chart?

johscheuer (Member) commented

We don't actively maintain the Helm charts, as they were contributed by the community (see: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/README.md#using-helm). If you have the time to add it to the Helm charts and open a PR, that would be appreciated :)

kky-fury (Contributor, Author) commented Jul 3, 2024

I created one; please take a look: #2093.

johscheuer pushed a commit that referenced this issue Jul 15, 2024