fix and enhancement during my first try, hope it helps #306
Conversation
yurt-controller-manager failed to be scheduled to the cloud node on Kubernetes 1.18, with the following message: FailedScheduling ... node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. If there are multiple nodes in the cluster, it is always assigned to an edge node, which is not as expected. Fix it by assigning the correct "key" and "effect" in the pod tolerations.
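As a minimal sketch (assuming the usual Deployment pod template layout in the repo's manifests), the toleration this change introduces looks like:

tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"   # match the master taint regardless of its value
  effect: ""           # an empty effect matches every effect for this key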
Set the default image pull policy to IfNotPresent to ease local development. Otherwise Kubernetes always pulls the image from the Docker registry, which means you need to set up a private registry and push the image every time you make a change, and then pass a long parameter to yurtctl or build new YAML files to deploy; this is time-consuming and error-prone. Currently we use the "latest" tag for each image, which is not meant for production, so the change does not introduce a new security concern.
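A hedged sketch of where the pull policy lands in the container spec (the container name and image tag here are illustrative, not necessarily the exact values in the repo):

containers:
- name: yurt-controller-manager
  image: openyurt/yurt-controller-manager:latest
  imagePullPolicy: IfNotPresent   # reuse a locally built image if it is already present on the node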
A cluster without a cloud node will not work, so fail early.
* build all the container images
* create the output folder: openyurt-release/
* tag the images and save them to .tar.gz files
* generate a script for loading these images
* save yurtctl as well

You can package the output folder and deliver it; to deploy, just extract the package, cd into it, then run "./load && yurtctl -h".
/kind enhancement
/assign @rambohe-ch
/assign @Peeknut
- operator: "Exists" | ||
- key: "node-role.kubernetes.io/master" | ||
effect: "" | ||
operator: "Exists" |
I think the original operator: "Exists" can make yurt-controller-manager tolerate all taints. Would you comment on the reason for adding key: "node-role.kubernetes.io/master"?
Without an explicit blank assignment to effect, it can't be scheduled to the master node, as I verified. Maybe the original intention was to tolerate any taint; in that case we could assign key as "", but in my opinion the strict and explicit way is better.
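To illustrate the two variants being discussed, here is a sketch based on standard Kubernetes toleration semantics (fragments, not the exact manifest):

# blanket form: no key given, so every taint on the node is tolerated
tolerations:
- operator: "Exists"

# explicit form added in this PR: only the master key, but any effect
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: ""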
In my cluster, using operator: "Exists", yurt-controller-manager could be scheduled to the master node.
[root@n80 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 124d v1.16.0
n80 Ready <none> 124d v1.16.0
[root@n80 ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-58cc8c89f4-c2z88 1/1 Running 2 84d
kube-system coredns-58cc8c89f4-m6v2b 1/1 Running 2 84d
kube-system etcd-master 1/1 Running 2 124d
kube-system kube-apiserver-master 1/1 Running 2 124d
kube-system kube-controller-manager-master 1/1 Running 3 124d
kube-system kube-flannel-ds-79ckt 1/1 Running 2 124d
kube-system kube-flannel-ds-q886f 1/1 Running 0 3d9h
kube-system kube-proxy-44cfx 1/1 Running 0 3d9h
kube-system kube-proxy-rk49h 1/1 Running 2 124d
kube-system kube-scheduler-master 1/1 Running 2 124d
kube-system yurt-controller-manager-5b67549d9b-k6lhb 1/1 Running 0 8s
kube-system yurt-hub-n80 1/1 Running 0 5s
kube-system yurt-tunnel-server-d84666f6c-nvrb8 1/1 Running 0 7s
kube-system yurtctl-servant-convert-n80-4r6vg 1/1 Running 0 7s
[root@n80 ~]# kubectl describe pod -n kube-system yurt-controller-manager-5b67549d9b-k6lhb
Name: yurt-controller-manager-5b67549d9b-k6lhb
Namespace: kube-system
Priority: 0
Node: master/10.10.102.78
Start Time: Tue, 25 May 2021 20:19:43 +0800
Labels: app=yurt-controller-manager
pod-template-hash=5b67549d9b
Annotations: <none>
Status: Running
IP: 10.10.102.78
IPs:
IP: 10.10.102.78
Controlled By: ReplicaSet/yurt-controller-manager-5b67549d9b
Containers:
yurt-controller-manager:
Container ID: docker://5a0c2e724d5998d73cb077468351d250252e2432851e0971a80e0d82b87c62a6
Image: registry.cn-hangzhou.aliyuncs.com/openyurttest/yurt-controller-manager:v0.4.0-amd64
Image ID: docker-pullable://registry.cn-hangzhou.aliyuncs.com/openyurttest/yurt-controller-manager@sha256:c13082cb9171b82a698ccc96dd710235f14227e47bec5a050c6258a8e269f80b
Port: <none>
Host Port: <none>
Command:
yurt-controller-manager
State: Running
Started: Tue, 25 May 2021 20:19:45 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from yurt-controller-manager-token-2ntpx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
yurt-controller-manager-token-2ntpx:
Type: Secret (a volume populated by a Secret)
SecretName: yurt-controller-manager-token-2ntpx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/yurt-controller-manager-5b67549d9b-k6lhb to master
Normal Pulled 87s kubelet, master Container image "registry.cn-hangzhou.aliyuncs.com/openyurttest/yurt-controller-manager:v0.4.0-amd64" already present on machine
Normal Created 87s kubelet, master Created container yurt-controller-manager
Normal Started 87s kubelet, master Started container yurt-controller-manager
[root@n80 ~]# kubectl describe node master
Name: master
Roles: master
Labels: alibabacloud.com/is-edge-worker=false
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=master
kubernetes.io/os=linux
node-role.kubernetes.io/master=
openyurt.io/is-edge-worker=false
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"96:e7:51:e3:0e:59"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.10.102.78
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 21 Jan 2021 12:34:52 +0800
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 07 May 2021 16:33:05 +0800 Fri, 07 May 2021 16:33:05 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Tue, 25 May 2021 20:20:06 +0800 Fri, 22 Jan 2021 22:36:24 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 25 May 2021 20:20:06 +0800 Tue, 09 Mar 2021 20:34:17 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 25 May 2021 20:20:06 +0800 Fri, 22 Jan 2021 22:36:24 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 25 May 2021 20:20:06 +0800 Tue, 25 May 2021 18:45:23 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.10.102.78
Hostname: master
Capacity:
cpu: 2
ephemeral-storage: 17394Mi
hugepages-2Mi: 0
memory: 3882072Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 16415037823
hugepages-2Mi: 0
memory: 3779672Ki
pods: 110
System Info:
Machine ID: a60cd88e65b74f27920dc57fc8b1f9da
System UUID: 4237B2C9-A1A7-1A01-8A0A-6B468BE3B652
Boot ID: df0415ce-5ff6-4f1f-a5b8-27df5135fa62
Kernel Version: 3.10.0-693.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.8
Kubelet Version: v1.16.0
Kube-Proxy Version: v1.16.0
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-58cc8c89f4-c2z88 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%) 84d
kube-system coredns-58cc8c89f4-m6v2b 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%) 84d
kube-system etcd-master 0 (0%) 0 (0%) 0 (0%) 0 (0%) 124d
kube-system kube-apiserver-master 250m (12%) 0 (0%) 0 (0%) 0 (0%) 124d
kube-system kube-controller-manager-master 200m (10%) 0 (0%) 0 (0%) 0 (0%) 124d
kube-system kube-flannel-ds-79ckt 100m (5%) 100m (5%) 50Mi (1%) 50Mi (1%) 124d
kube-system kube-proxy-rk49h 0 (0%) 0 (0%) 0 (0%) 0 (0%) 124d
kube-system kube-scheduler-master 100m (5%) 0 (0%) 0 (0%) 0 (0%) 124d
kube-system yurt-controller-manager-5b67549d9b-k6lhb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2m15s
kube-system yurt-tunnel-server-d84666f6c-nvrb8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2m14s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 850m (42%) 100m (5%)
memory 190Mi (5%) 390Mi (10%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
Here are my previous logs; after applying the change, the issue is fixed.
box@joez-work-op-vm-2:~/share/repo/openyurt/openyurt-images$ kubectl get po -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-7ff77c879f-fsgjp 1/1 Running 0 20m 10.244.0.2 joez-work-op-vm-2 <none> <none>
kube-system coredns-7ff77c879f-t54gh 1/1 Running 0 20m 10.244.0.3 joez-work-op-vm-2 <none> <none>
kube-system etcd-joez-work-op-vm-2 1/1 Running 0 20m 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-apiserver-joez-work-op-vm-2 1/1 Running 0 20m 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-controller-manager-joez-work-op-vm-2 1/1 Running 0 20m 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-flannel-ds-d9r8c 1/1 Running 0 20m 10.67.103.191 joez-work-op-vm-3 <none> <none>
kube-system kube-flannel-ds-kzpkp 1/1 Running 0 20m 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-proxy-v2r5z 1/1 Running 0 20m 10.67.103.191 joez-work-op-vm-3 <none> <none>
kube-system kube-proxy-zqnbl 1/1 Running 0 20m 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-scheduler-joez-work-op-vm-2 1/1 Running 0 20m 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system yurt-controller-manager-5d4b5ffb89-q2tbl 1/1 Running 0 4m5s 10.67.103.191 joez-work-op-vm-3 <none> <none>
kube-system yurt-hub-joez-work-op-vm-3 1/1 Running 0 4m5s 10.67.103.191 joez-work-op-vm-3 <none> <none>
kube-system yurtctl-servant-convert-joez-work-op-vm-3-4rhfs 1/1 Running 0 4m5s 10.67.103.191 joez-work-op-vm-3 <none> <none>
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
But with the latest code, there is no such issue:
box@joez-work-op-vm-2:~/share/repo/openyurt/openyurt-release$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
joez-work-op-vm-2 Ready master 2m6s v1.18.0 10.67.103.75 <none> Ubuntu 18.04.5 LTS 4.15.0-143-generic docker://20.10.2
joez-work-op-vm-3 Ready <none> 95s v1.18.0 10.67.103.191 <none> Ubuntu 18.04.5 LTS 4.15.0-112-generic docker://20.10.2
box@joez-work-op-vm-2:~/share/repo/openyurt/openyurt-release$ ./yurtctl convert -c joez-work-op-vm-2 -p kubeadm
I0526 10:48:50.203562 17897 convert.go:273] mark joez-work-op-vm-2 as the cloud-node
I0526 10:48:50.230939 17897 convert.go:466] kube-public/cluster-info configmap already exists, skip to prepare it
I0526 10:48:50.230959 17897 convert.go:353] deploying the yurt-hub and resetting the kubelet service...
I0526 10:49:20.256159 17897 util.go:320] servant job(yurtctl-servant-convert-joez-work-op-vm-3) has succeeded
I0526 10:49:20.256194 17897 convert.go:377] the yurt-hub is deployed
box@joez-work-op-vm-2:~/share/repo/openyurt/openyurt-release$ kubectl get po -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-7ff77c879f-7qm58 1/1 Running 0 7m28s 10.244.0.3 joez-work-op-vm-2 <none> <none>
kube-system coredns-7ff77c879f-nd6t9 1/1 Running 0 7m28s 10.244.0.2 joez-work-op-vm-2 <none> <none>
kube-system etcd-joez-work-op-vm-2 1/1 Running 0 7m37s 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-apiserver-joez-work-op-vm-2 1/1 Running 0 7m37s 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-controller-manager-joez-work-op-vm-2 1/1 Running 0 7m37s 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-proxy-dfrcj 1/1 Running 0 7m15s 10.67.103.191 joez-work-op-vm-3 <none> <none>
kube-system kube-proxy-w8z7t 1/1 Running 0 7m28s 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system kube-scheduler-joez-work-op-vm-2 1/1 Running 0 7m37s 10.67.103.75 joez-work-op-vm-2 <none> <none>
kube-system yurt-controller-manager-9d749b975-kvk6q 1/1 Running 0 5m23s 10.67.103.75 joez-work-op-vm-2 <none> <none>
box@joez-work-op-vm-2:~/share/repo/openyurt/openyurt-release$ kubectl get -n kube-system -o yaml Deployment/yurt-controller-manager | grep -A4 ' tolerations'
tolerations:
- operator: Exists
I think the operator: "Exists" toleration may have been added recently, so the above error is fixed. And operator: "Exists" can tolerate all taints, so it's more suitable for yurt-controller-manager.
- operator: "Exists" | ||
- key: "node-role.kubernetes.io/master" | ||
effect: "" | ||
operator: "Exists" |
Please comment on the reason for adding key: "node-role.kubernetes.io/master"?
Same as the previous one.
- operator: "Exists" | ||
- key: "node-role.kubernetes.io/master" | ||
effect: "" | ||
operator: "Exists" |
Please comment on the reason for adding key: "node-role.kubernetes.io/master"?
Same as the previous one.
@@ -0,0 +1,62 @@
#!/usr/bin/env bash
You can execute the make release command under the openyurt repository to generate the images of the OpenYurt components. It looks like this file is a local test script; how about removing it from this PR?
OK, I will remove it from this PR. It is kind of an enhancement, especially for local development: you can build and deliver, then load and deploy with one command.
klog.Errorf("At least one cloud node should be provided!") | ||
return | ||
} | ||
|
Maybe checking co.CloudNodes in func Complete() is better?
Yes, you are right. My first attempt was to find a master node when none is provided; yes, keep it simple and explicit, and fail early.
Close it first to turn the PR into separate ones.
Ⅰ. Describe what this PR does
Ⅱ. Does this pull request fix one issue?
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews