NodeGroupForNode() gets incorrect *apiv1.Node information #6886

Open
adyanth opened this issue Jun 4, 2024 · 2 comments
Labels
area/cluster-autoscaler kind/bug

Comments


adyanth commented Jun 4, 2024

Which component are you using?: cluster-autoscaler

What version of the component are you using?: master as of 6/1/23 e08681b

Component version:

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.5+k3s1

What environment is this in?: Bare metal on Proxmox VE, K3s

What did you expect to happen?: Expected NodeGroupForNode() to receive the correct node information for each node.

What happened instead?: Some calls receive a node with an incorrect node.Spec.ProviderID and with its labels missing.

How to reproduce it (as minimally and precisely as possible):
1. Build a custom cluster autoscaler with a custom cloud provider; the cluster already has existing nodes that belong to the node group.
2. Start the cluster autoscaler. On startup it looks up the node group for every node.
3. Log the node information inside the call to NodeGroupForNode() (see the sketch below).
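For context, the logging happens inside my provider's NodeGroupForNode() implementation. Below is a minimal sketch of what that looks like; the proxmoxCloudProvider type and the prefix-matching logic are illustrative stand-ins for my unpublished provider, while the cloudprovider interfaces and klog are the real ones from the autoscaler codebase.

package proxmox

import (
	"strings"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
	klog "k8s.io/klog/v2"
)

// proxmoxCloudProvider is a stand-in for the custom provider; only the method
// relevant to this issue is shown.
type proxmoxCloudProvider struct {
	nodeGroups []cloudprovider.NodeGroup
}

// NodeGroupForNode logs whatever node object the autoscaler core passes in,
// then matches it to a group by provider ID (expected form: proxmox://<group>/<vmid>/<n>).
func (p *proxmoxCloudProvider) NodeGroupForNode(node *apiv1.Node) (cloudprovider.NodeGroup, error) {
	// This is the log line visible in the output below (note the missing space after "node").
	klog.Infof("Getting nodegroup for node%s with spec.providerId: %s", node.Name, node.Spec.ProviderID)

	for _, ng := range p.nodeGroups {
		if strings.HasPrefix(node.Spec.ProviderID, "proxmox://"+ng.Id()+"/") {
			return ng, nil
		}
	}
	// Returning nil tells the core that the node is not managed by this provider.
	return nil, nil
}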

Anything else we need to know?: As the logs below show, the function is called (and logged) many times on startup. Most calls receive the correct spec.providerId, but some do not, so the autoscaler concludes that those nodes do not belong to any node group. k8s-worker-ca-1 and k8s-worker-ca-2 are part of the Autoscaling node group and should have provider IDs proxmox://Autoscaling/600/1 and proxmox://Autoscaling/600/2 respectively; kratos-master is the single-node master.

❯ ./run.sh
rm -f cluster-autoscaler-arm64
CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build -o cluster-autoscaler-arm64 --ldflags "-s" 
I0603 21:49:56.514488    9892 leaderelection.go:250] attempting to acquire leader lease kube-system/cluster-autoscaler...
I0603 21:49:56.526225    9892 leaderelection.go:260] successfully acquired lease kube-system/cluster-autoscaler
I0603 21:49:56.547589    9892 log.go:245] Getting first node
I0603 21:49:56.580962    9892 log.go:245] Getting node object for kratos-proxmox
I0603 21:49:56.587036    9892 log.go:245] Geting reference container object for id 600
I0603 21:49:56.602574    9892 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0603 21:49:56.607812    9892 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 5.21925ms
I0603 21:49:57.729440    9892 request.go:697] Waited for 1.126855416s due to client-side throttling, not priority and fairness, request: GET:https://kratos-master.k8s.adyanth.lan:6443/apis/policy/v1/poddisruptionbudgets?limit=500&resourceVersion=0


I0603 21:50:06.603981    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.604081    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.604098    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master
I0603 21:50:06.613016    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.614106    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.614125    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master

W0603 21:50:06.614333    9892 clusterstate.go:477] AcceptableRanges have not been populated yet. Skip checking

I0603 21:50:06.665045    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.665099    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.665104    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master

## Incorrect node info here!!!
I0603 21:50:06.665108    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: k8s-worker-ca-1
W0603 21:50:06.665119    9892 clusterstate.go:648] Nodegroup is nil for k8s-worker-ca-1
I0603 21:50:06.665125    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: k8s-worker-ca-2
W0603 21:50:06.665128    9892 clusterstate.go:648] Nodegroup is nil for k8s-worker-ca-2

## Incorrect node info here!!!
I0603 21:50:06.665336    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: k8s-worker-ca-1
W0603 21:50:06.665413    9892 static_autoscaler.go:760] No node group for node k8s-worker-ca-1, skipping
I0603 21:50:06.665418    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: k8s-worker-ca-2
W0603 21:50:06.665421    9892 static_autoscaler.go:760] No node group for node k8s-worker-ca-2, skipping

I0603 21:50:06.666055    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.666065    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.666070    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master
I0603 21:50:06.666179    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.666186    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.666189    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master
I0603 21:50:06.666237    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.666266    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.666369    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.666374    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.666378    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master
I0603 21:50:06.666415    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.666418    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
I0603 21:50:06.666421    9892 log.go:245] Getting nodegroup for nodekratos-master with spec.providerId: k3s://kratos-master
I0603 21:50:06.677276    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-1 with spec.providerId: proxmox://Autoscaling/600/1
I0603 21:50:06.677291    9892 log.go:245] Getting nodegroup for nodek8s-worker-ca-2 with spec.providerId: proxmox://Autoscaling/600/2
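
As a temporary mitigation on the provider side (not a fix for whatever constructs these node objects), the prefix match can fall back to the provider's own records when the incoming node carries only the bare node name in Spec.ProviderID, as in the calls flagged above. A sketch, reusing the imports from the snippet earlier; the providerIDByName map is an assumed provider-side cache, not something the autoscaler supplies:

// nodeGroupForNodeWithFallback wraps the prefix match from the sketch above.
// providerIDByName maps node name -> proxmox://<group>/<vmid>/<n> and is built
// from the provider's own inventory.
func nodeGroupForNodeWithFallback(node *apiv1.Node, groups []cloudprovider.NodeGroup, providerIDByName map[string]string) (cloudprovider.NodeGroup, error) {
	providerID := node.Spec.ProviderID
	if !strings.HasPrefix(providerID, "proxmox://") {
		// The core handed us a node whose ProviderID is just the node name
		// (e.g. "k8s-worker-ca-1"); recover the real ID from our own records.
		if id, ok := providerIDByName[node.Name]; ok {
			providerID = id
		}
	}
	for _, ng := range groups {
		if strings.HasPrefix(providerID, "proxmox://"+ng.Id()+"/") {
			return ng, nil
		}
	}
	return nil, nil // not managed by this provider
}

Even with a workaround like that in place, the question remains why some calls arrive with ProviderID set to the node name and the labels missing, per the clusterstate.go:648 and static_autoscaler.go:760 call sites in the log above.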
adyanth added the kind/bug label Jun 4, 2024
@adrianmoisey
Contributor

/area cluster-autoscaler


adyanth commented Jun 12, 2024

Any update on this? It is blocking me from implementing a custom cluster autoscaler for Proxmox.
