
[BUG] node lifecycle controller in yurt-manager can not update status of node #1934

Closed
crazytaxii opened this issue Jan 26, 2024 · 5 comments · Fixed by #1936
crazytaxii commented Jan 26, 2024

What happened:
The node always stays in Ready status after the kubelet on it is stopped, even after the node itself is shut down.
Because of this bug, Pods cannot be migrated to other nodes.

What you expected to happen:
The abnormal node should be updated to NotReady status.

How to reproduce it (as minimally and precisely as possible):
Stop the kubelet on a node.

Anything else we need to know?:
Error log in yurt-manager's node lifecycle controller:

```
E0126 07:43:15.444074       1 node_lifecycle_controller.go:975] "Error updating node" err="nodes \"edge\" is forbidden: User \"system:serviceaccount:kube-system:yurt-manager\" cannot update resource \"nodes/status\" in API group \"\" at the cluster scope" node="edge"
E0126 07:43:15.452574       1 node_lifecycle_controller.go:715] "Update health of Node from Controller error, Skipping - no pods will be evicted" err="timed out waiting for the condition" node="edge"
```

nodes/status is a subresource; it needs to be granted in the ClusterRole of yurt-manager as well.
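For context, subresource permissions are authorized separately from the parent resource, so having `update` on `nodes` does not imply `update` on `nodes/status`. A rule like the following (a sketch of what would go into yurt-manager's ClusterRole, not necessarily the exact shape of the fix) is what the controller needs:

```yaml
# Sketch: grant the node lifecycle controller write access to the
# nodes/status subresource, which is authorized independently of
# the nodes resource itself.
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
```

Whether the service account actually has the permission can be checked with `kubectl auth can-i update nodes/status --as=system:serviceaccount:kube-system:yurt-manager`.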

Environment:

  • OpenYurt version: v1.4
  • Kubernetes version (use kubectl version): v1.27.2

/kind bug

@crazytaxii added the kind/bug label on Jan 26, 2024
@rambohe-ch (Member)

@crazytaxii Thanks for raising this issue.
It seems that the RBAC settings for the nodelifecycle controller were missed. Would you like to make a pull request to fix it?

@crazytaxii (Contributor, Author)

/assign @crazytaxii

@crazytaxii (Contributor, Author)

It has been fixed in #1884.

@crazytaxii (Contributor, Author)

The entire system:controller:node-controller ClusterRole for kube-controller-manager in a Kubernetes v1.27.2 cluster is:

```yaml
# ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - delete
  - get
  - list
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - delete
  - list
- apiGroups:
  - networking.k8s.io
  resources:
  - clustercidrs
  verbs:
  - create
  - get
  - list
  - update
- apiGroups:
  - ""
  - events.k8s.io
  resources:
  - events
  verbs:
  - create
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
```

Compare it to the ClusterRole of yurt-manager (v1.4):

```yaml
# ...
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
#  - delete # missing one
  - get
  - list
  - patch
  - update
  - watch # extra one
# - apiGroups: # missing one
#  - ""
#  resources:
#  - nodes/status
#  verbs:
#  - patch
#  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create # extra one
  - delete
  - get
  - list
  - patch # extra one
  - update # extra one
  - watch # extra one
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
#  - patch # missing one
  - update
# - apiGroups: # missing one
#  - networking.k8s.io
#  resources:
#  - clustercidrs
#  verbs:
#  - create
#  - get
#  - list
#  - update
# - apiGroups: # missing one
#  - ""
#  - events.k8s.io
#  resources:
#  - events
#  verbs:
#  - create
#  - patch
#  - update
# ...
```

That said, the node lifecycle controller in yurt-manager certainly differs a lot from the one in kube-controller-manager v1.27.2.

@rambohe-ch (Member)

> clustercidrs

@crazytaxii Except for the networking.k8s.io/clustercidrs resource, the other missing RBAC settings should be added to yurt-manager,
because networking.k8s.io/clustercidrs is used by the node IPAM controller in kube-controller-manager and is not needed by the nodelifecycle controller.
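Putting the comments together, a sketch of the rules to add or extend in yurt-manager's ClusterRole would cover everything marked missing in the comparison above except clustercidrs (this is an outline of the discussion, not necessarily the exact patch merged for this issue):

```yaml
# Sketch of the amended rules; clustercidrs intentionally left out.
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - delete        # added
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:      # added rule
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  - patch         # added
  - update
- apiGroups:      # added rule
  - ""
  - events.k8s.io
  resources:
  - events
  verbs:
  - create
  - patch
  - update
```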
