
[Question] How to install openyurt on multiple control-plane and worker nodes? #1127

Closed
batthebee opened this issue Jan 8, 2023 · 8 comments · Fixed by #1158
Labels: kind/question

Comments

@batthebee
Contributor

What happened:

I am installing a kubeadm-based OpenYurt cluster; thanks for the great docs.

In the documentation it says under Join Node that "You should only install node components of OpenYurt on nodes that have already been joined in the Kubernetes cluster."

What does that mean exactly?

I want my setup to be HA; for this I need three control-plane nodes.
Should the OpenYurt components described under Manually Setup run on them? Currently these are scheduled on already-joined worker nodes.

How do I handle worker nodes running general things like argocd/fleet/monitoring etc.? Can I just leave them as they are, or do they also need to be modified as described under Join Node?

Thanks a lot

Environment:

OpenYurt version: 1.1.0
Kubernetes version (use kubectl version): 1.22.17

/kind question

batthebee added the kind/question label on Jan 8, 2023
@rambohe-ch
Member

@batthebee Thank you for raising this issue.

OpenYurt has two kinds of components: control-plane components (like yurt-controller-manager, yurt-app-manager, yurt-tunnel-server, etc.) and node components (like yurthub and yurt-tunnel-agent). We recommend installing the control-plane components on cloud nodes that can connect to the master nodes (which host the K8s control-plane components) over an intranet network. Moreover, you can use the command yurtadm join xxxx --working-mode=cloud xxx to join cloud nodes into the cluster.

For edge nodes that connect to the master nodes over a public network (like the internet), we need to install the node components (like yurthub and yurt-tunnel-agent) on them manually. Alternatively, you can use yurtadm join xxx --working-mode=edge xxx to join edge nodes into the cluster from scratch.
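
For example, roughly like this (the apiserver address 1.2.3.4:6443 and the token are placeholders; please check yurtadm join --help for the exact flags of your version):

# join a cloud node that reaches the masters over the intranet
yurtadm join 1.2.3.4:6443 --token=07401b.f395accd246ae52d --working-mode=cloud
# join an edge node that reaches the masters over the public network
yurtadm join 1.2.3.4:6443 --token=07401b.f395accd246ae52d --working-mode=edge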

If there are 3 instances of kube-apiserver that yurthub needs to connect to, you can configure multiple addresses for yurthub as follows (we assume the addresses are 1.2.3.3:5678, 1.2.4.4:5678 and 1.2.5.5:5678):

$ cat config/setup/yurthub.yaml |
sed 's|__kubernetes_master_address__|1.2.3.3:5678,1.2.4.4:5678,1.2.5.5:5678|;
s|__bootstrap_token__|07401b.f395accd246ae52d|' > /tmp/yurthub-ack.yaml &&
scp -i <your-ssh-identity-file> /tmp/yurthub-ack.yaml root@us-west-1.192.168.0.88:/etc/kubernetes/manifests

By default, yurthub will access these instances in round-robin mode; you can configure the load-balancing mode with the --lb-mode parameter of yurthub:

  • rr: round-robin mode
  • priority: the first address is tried first; if it is not accessible, the subsequent addresses are used.
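
For example, to switch to priority mode you could add the parameter to the yurthub manifest before copying it to /etc/kubernetes/manifests (a rough sketch, assuming the template lists the yurthub arguments one per line and already contains a --server-addr argument):

# GNU sed: append --lb-mode=priority right after the --server-addr argument
# (assumes the container args are indented with 4 spaces)
sed -i 's|- --server-addr=.*|&\n    - --lb-mode=priority|' /tmp/yurthub-ack.yaml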

If you install node components (like yurthub) on edge nodes manually, you need to recreate all pods (like argocd/fleet/monitoring) on those nodes with the kubectl delete pod xxx command.
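
For example (a sketch; the namespaces below are placeholders for wherever your workloads run):

# force the existing pods on the converted nodes to be recreated after yurthub is installed
kubectl -n argocd delete pod --all
kubectl -n monitoring delete pod --all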

@batthebee
Contributor Author

batthebee commented Jan 9, 2023

Hi @rambohe-ch, thanks for the detailed answer. I am still a bit confused, though.

The description of the manual setup assumes an existing node.

This node is labeled with "openyurt.io/is-edge-worker=false" and called a "cloud node". However, it is then also added to a node pool named "master". Furthermore, yurt-tunnel-dns, yurt-app-manager and openyurt are deployed to this node.

Under 4 there is this note:
"The above operation is only for the Master node, if there are other nodes in the cluster, additional adjustment is needed, the operation method can be referred to Install OpenYurt Node on Existing K8s Nodes."

There is no master node in the Architecture diagram.

What confuses me about the instructions is that I don't quite understand how to proceed. Here, all the control-plane components are deployed on the "master" node first?!

According to your description, I would deploy and modify the control plane first, but not install yurt-tunnel etc. yet.
After that I would join a cloud worker via yurtadm, on which I then roll out the control-plane components. After that I could join an edge worker. Is that right?

And another question: What is the purpose of cloud nodes? Are they just normal "worker nodes"? Why do they also have to be connected via YurtHub?

For context, here is why I don't just use yurtadm: I want to implement this in a second step on k3s/rke2, and for that I need to understand what exactly needs to be adjusted where.

@rambohe-ch
Member


@batthebee Thanks for your kind response.

  1. The architecture image should be updated, and master nodes should be specified explicitly. We plan to update the architecture diagram when raven and yurt-tunnel are merged completely.

  2. Yes, we should install the K8s control plane on the master nodes first, then join cloud nodes with the yurtadm join command (of course, you can use master nodes as cloud nodes by labeling them; see the sketch below) and install the control-plane components of OpenYurt. In the end, you can join edge nodes with the yurtadm join command.

  3. yurthub takes responsibility for providing the service topology capability, so we recommend installing the yurthub component on cloud nodes as well.
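
For example, to use a master node as a cloud node (a sketch; <master-node> is a placeholder for your node name):

kubectl label node <master-node> openyurt.io/is-edge-worker=false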

@batthebee
Contributor Author

batthebee commented Jan 12, 2023

@rambohe-ch I still have a few problems with the manual setup. After a restart of the control plane, kube-proxy can no longer start on the control-plane node.

Currently I have one control plane and one worker. The worker runs fine so far and can also reach the kube API; the control plane no longer can. I'm using Ubuntu 20.04.

This is what the logs look like:

running pods:

k get po -A -o wide
NAMESPACE      NAME                                          READY   STATUS              RESTARTS        AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-47hzg                         1/1     Running             1 (15m ago)     8h      143.42.17.169   kubeadm-openyurt-w1   <none>           <none>
kube-flannel   kube-flannel-ds-zmnpd                         0/1     CrashLoopBackOff    49 (117s ago)   4h13m   143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    coredns-jzhtk                                 1/1     Running             1 (15m ago)     8h      10.244.1.11     kubeadm-openyurt-w1   <none>           <none>
kube-system    coredns-kgtm2                                 0/1     ContainerCreating   0               4h12m   <none>          kubeadm-openyurt-cp   <none>           <none>
kube-system    etcd-kubeadm-openyurt-cp                      1/1     Running             1 (8h ago)      8h      143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    kube-apiserver-kubeadm-openyurt-cp            1/1     Running             2 (8h ago)      8h      143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    kube-controller-manager-kubeadm-openyurt-cp   1/1     Running             2 (8h ago)      8h      143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    kube-proxy-cqbrp                              1/1     Running             1 (10m ago)     8h      143.42.17.169   kubeadm-openyurt-w1   <none>           <none>
kube-system    kube-proxy-ktj6v                              1/1     Running             0               6m59s   143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    kube-scheduler-kubeadm-openyurt-cp            1/1     Running             2 (8h ago)      8h      143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    yurt-app-manager-846cd4d98b-s2hr5             1/1     Running             7 (15m ago)     8h      10.244.1.9      kubeadm-openyurt-w1   <none>           <none>
kube-system    yurt-controller-manager-7f9fbdf99c-58fd4      1/1     Running             1 (8h ago)      8h      143.42.17.168   kubeadm-openyurt-cp   <none>           <none>
kube-system    yurt-hub-kubeadm-openyurt-w1                  1/1     Running             1 (15m ago)     8h      143.42.17.169   kubeadm-openyurt-w1   <none>           <none>
kube-system    yurt-tunnel-dns-9cbd69765-pcrfg               1/1     Running             1 (15m ago)     4h16m   10.244.1.10     kubeadm-openyurt-w1   <none>           <none>
kube-system    yurt-tunnel-server-5b9955c8c8-75rtm           0/1     CrashLoopBackOff    49 (45s ago)    4h15m   143.42.17.168   kubeadm-openyurt-cp   <none>           <none>

For the worker:

k logs -n kube-system kube-proxy-cqbrp
I0112 15:44:40.299147       1 server.go:553] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
I0112 15:44:42.021345       1 node.go:172] Successfully retrieved node IP: 143.42.17.169
I0112 15:44:42.021381       1 server_others.go:140] Detected node IP 143.42.17.169
W0112 15:44:42.021400       1 server_others.go:565] Unknown proxy mode "", assuming iptables proxy
I0112 15:44:42.085770       1 server_others.go:206] kube-proxy running in dual-stack mode, IPv4-primary
I0112 15:44:42.085819       1 server_others.go:212] Using iptables Proxier.
I0112 15:44:42.085831       1 server_others.go:219] creating dualStackProxier for iptables.
W0112 15:44:42.085847       1 server_others.go:495] detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6
I0112 15:44:42.089646       1 server.go:649] Version: v1.22.17
I0112 15:44:42.090451       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0112 15:44:42.090466       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0112 15:44:42.090658       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0112 15:44:42.090860       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0112 15:44:42.090895       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0112 15:44:42.095681       1 config.go:315] Starting service config controller
I0112 15:44:42.095701       1 shared_informer.go:240] Waiting for caches to sync for service config
I0112 15:44:42.095721       1 config.go:224] Starting endpoint slice config controller
I0112 15:44:42.095724       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0112 15:44:42.196455       1 shared_informer.go:247] Caches are synced for endpoint slice config 
I0112 15:44:42.196504       1 shared_informer.go:247] Caches are synced for service config 

For the control plane:

k logs -n kube-system kube-proxy-wl2jq
I0112 15:56:16.140858       1 server.go:553] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
E0112 15:56:46.178236       1 node.go:161] Failed to retrieve node info: Get "https://10.96.0.1:443/api/v1/nodes/kubeadm-openyurt-cp": dial tcp 10.96.0.1:443: i/o timeout

Do you have an idea why my control plane cannot reach the API server anymore?

I think this is because the removed kubeconfig specified the API server by its public IP, which is therefore reachable. But how should I deal with this? Two DaemonSets for kube-proxy, one for the control planes and one for the rest?
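
(For reference, this is roughly how I checked it; 10.96.0.1 is the kubernetes service VIP from my cluster's default service CIDR:)

# the service VIP that InClusterConfig clients connect to
kubectl get svc kubernetes -n default
# on the control-plane node: the NAT rules for that VIP are normally created by kube-proxy
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1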

@rambohe-ch
Member

@batthebee Thank you for your kind response. It looks like kube-proxy/coredns/flannel cannot work on the master node with InClusterConfig, so how about taking the kube-proxy/coredns/flannel pods off the master node?

@batthebee
Contributor Author

@rambohe-ch I'm not really deep into the topic, but I think the situation is the following:

flannel and coredns are not running because of the missing iptables rules, which are normally set up by kube-proxy.

As soon as I add the kubeconfig again and restart kube-proxy, iptables is populated. After that I can switch back to InClusterConfig, and everything runs as usual after pod restarts, until I reboot the master node.

So for the setup to work, there must be a way to reach 10.96.0.1 on the control plane/master independently of the in-cluster kube-proxy, e.g. with another kube-proxy. How would you solve this?

What I do not quite understand is why the problem does not occur on the other nodes. Do you have any idea?
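
(For reference, this is roughly what I currently do to get the control-plane node working again; re-adding the kubeconfig in the configmap is the manual part:)

# put the kubeconfig back into the kube-proxy config, then restart the daemonset
kubectl -n kube-system edit cm kube-proxy
kubectl -n kube-system rollout restart daemonset kube-proxy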

@rambohe-ch
Member


@batthebee Thank you for your patient reply.

> What I do not quite understand is why the problem does not occur on the other nodes. Do you have any idea?

Because the masterservice filter in yurthub mutates the service address of default/kubernetes to the yurthub address, kube-proxy goes through yurthub to access the cloud kube-apiserver. You can dive into the details of the masterservice filter if you are interested.

> So for the setup to work, there must be a way to reach 10.96.0.1 on the control plane/master independently of the in-cluster kube-proxy, e.g. with another kube-proxy. How would you solve this?

Because no yurthub component is deployed on the master node, kube-proxy there cannot access the kube-apiserver via InClusterConfig, so we need to keep kube-proxy using its kubeconfig unchanged; then kube-proxy on the master node can access the kube-apiserver through that kubeconfig.

But for kube-proxy on nodes with yurthub, we want kube-proxy to use InClusterConfig so that it accesses the kube-apiserver through yurthub and shares yurthub's capabilities like the local cache, service topology, etc.

The solution is: we will add a new filter in yurthub that comments out the kubeconfig configuration in the kube-system/kube-proxy configmap. This means kube-proxy on nodes with yurthub will use InClusterConfig to access the kube-apiserver, while kube-proxy on nodes without yurthub (like the master node) will keep using the kubeconfig.
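
A rough sketch of the effect (the layout below follows a standard kubeadm-generated kube-proxy configmap; the exact output of the new filter may differ):

kubectl -n kube-system get cm kube-proxy -o yaml
# data "config.conf" as stored, i.e. as seen without yurthub:
#   clientConnection:
#     kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
# as served through yurthub after filtering, the kubeconfig line is commented out,
# so kube-proxy on that node falls back to InClusterConfig:
#   clientConnection:
#     #kubeconfig: /var/lib/kube-proxy/kubeconfig.conf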

@rambohe-ch
Member

@batthebee I have added an inclusterconfig filter to handle this case in pull request #1158. If you are interested, please take a look.
