Make kubeadm deploy HA kubernetes cluster #328

Closed
cookeem opened this issue Jul 1, 2017 · 45 comments
Labels
area/HA documentation/better-examples priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

@cookeem

cookeem commented Jul 1, 2017

/kind feature

kubeadm does not currently support HA, so we cannot use it to set up a production Kubernetes cluster. Creating an HA cluster from scratch is too complicated, and when I googled "kubeadm HA", only a few articles and rough drafts explained how to do it.

So I tried many ways to adapt "kubeadm init", and finally got a kubeadm cluster to support HA. I hope this approach will help "kubeadm init" support creating an HA production cluster.

Detailed operational guidelines are here: https://github.com/cookeem/kubeadm-ha

Summary

  • Linux version: CentOS 7.3.1611

  • docker version: 1.12.6

  • kubeadm version: v1.6.4

  • kubelet version: v1.6.4

  • kubernetes version: v1.6.4

  • Hosts list

| HostName | IPAddress | Notes | Components |
| --- | --- | --- | --- |
| k8s-master1 | 192.168.60.71 | master node 1 | keepalived, nginx, etcd, kubelet, kube-apiserver, kube-scheduler, kube-proxy, kube-dashboard, heapster |
| k8s-master2 | 192.168.60.72 | master node 2 | keepalived, nginx, etcd, kubelet, kube-apiserver, kube-scheduler, kube-proxy, kube-dashboard, heapster |
| k8s-master3 | 192.168.60.73 | master node 3 | keepalived, nginx, etcd, kubelet, kube-apiserver, kube-scheduler, kube-proxy, kube-dashboard, heapster |
| N/A | 192.168.60.80 | keepalived virtual IP | N/A |
| k8s-node1 ~ 8 | 192.168.60.81 ~ 88 | 8 worker nodes | kubelet, kube-proxy |
  • Detailed deployment architecture

[Figure: k8s HA deployment architecture]

Critical steps

  • 1. Deploy an independent etcd TLS cluster on all master nodes

  • 2. On k8s-master1: use kubeadm init to create a master that connects to the independent etcd TLS cluster

$ cat /root/kubeadm-ha/kubeadm-init.yaml 
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.6.4
networking:
  podSubnet: 10.244.0.0/16
etcd:
  endpoints:
  - http://192.168.60.71:2379
  - http://192.168.60.72:2379
  - http://192.168.60.73:2379

$ kubeadm init --config=/root/kubeadm-ha/kubeadm-init.yaml
  • 3. Copy the /etc/kubernetes directory from k8s-master1 to k8s-master2 and k8s-master3

  • 4. Use ca.key and ca.crt to re-create the apiserver.key and apiserver.crt certificates on every master node

Set the X509v3 Subject Alternative Name DNS and IP entries in apiserver.crt to the current hostname and IP address, and add the keepalived virtual IP address.

  • 5. On every master node, edit admin.conf, controller-manager.conf and scheduler.conf so that the server entry points to the node's own IP address

  • 6. Set up keepalived and create a virtual IP that redirects to the master nodes

  • 7. Set up nginx as the load balancer for the apiservers on all master nodes

  • 8. Update configmap/kube-proxy so that the server entry points to the apiserver load balancer at the virtual IP
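
Step 7 could look roughly like the sketch below. It is not taken from the linked guide: gen_nginx_lb_conf is a hypothetical helper name, the listen port 8443 and apiserver port 6443 are assumptions borrowed from ports that show up later in this thread, and it assumes an nginx build that includes the stream (TCP proxy) module.

```shell
#!/bin/sh
# Sketch only: print an nginx "stream" (TCP) load-balancer config that
# forwards one listen port to the kube-apiserver on every master.
# gen_nginx_lb_conf is a hypothetical helper, not part of kubeadm or nginx.
# Usage: gen_nginx_lb_conf LISTEN_PORT APISERVER_PORT MASTER_IP...
gen_nginx_lb_conf() {
    listen_port=$1
    apiserver_port=$2
    shift 2
    echo "events {}"
    echo "stream {"
    echo "    upstream apiserver {"
    for ip in "$@"; do
        echo "        server ${ip}:${apiserver_port};"
    done
    echo "    }"
    echo "    server {"
    echo "        listen ${listen_port};"
    echo "        proxy_pass apiserver;"
    echo "    }"
    echo "}"
}

# Example with the master IPs from the hosts list; the ports are assumptions.
gen_nginx_lb_conf 8443 6443 192.168.60.71 192.168.60.72 192.168.60.73
```

Because this proxies raw TCP, TLS still terminates at the apiservers, which is why step 4 adds the virtual IP to the certificate's Subject Alternative Names.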

How to make kubeadm support HA

  • 1. Suppose the kubeadm init config file looks like this:
$ cat /root/kubeadm-ha/kubeadm-init.yaml 
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.6.4
networking:
  podSubnet: 10.244.0.0/16
ha:
  # this setting is the current node's IP address
  ip: 192.168.60.71
  # this setting is the keepalived virtual IP address
  vip: 192.168.60.80
  # this setting is the list of master nodes' IP addresses.
  # 1. kubeadm init uses it to create the apiserver.crt and apiserver.key files.
  # 2. It is also used to create the etcd TLS cluster pods.
  # 3. It is also used to create the nginx load balancer pods.
  # 4. It is also used to create the keepalived virtual IP.
  masters:
  - 192.168.60.71
  - 192.168.60.72
  - 192.168.60.73
  • 2. On k8s-master1, run kubeadm init --config=/root/kubeadm-ha/kubeadm-init.yaml to create a master node

kubeadm will create the etcd/nginx/keepalived pods and all certificates and *.conf files.

  • 3. On k8s-master1, copy the /etc/kubernetes/pki directory to k8s-master2 and k8s-master3

  • 4. On k8s-master2 and k8s-master3, change the ha.ip setting in kubeadm-init.yaml to the node's own IP address

  • 5. On k8s-master2 and k8s-master3, run kubeadm init --config=/root/kubeadm-ha/kubeadm-init.yaml to create the other 2 master nodes

kubeadm will create the etcd/nginx/keepalived pods and all certificates and *.conf files, and k8s-master2 and k8s-master3 will then join the HA cluster automatically.
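
Step 4 of this proposal could be scripted along the following lines. This is only a sketch: the ha: section is the proposed, not yet existing, config format shown above, and set_ha_ip is an illustrative helper name rather than a kubeadm command.

```shell
#!/bin/sh
# Sketch only: rewrite the proposed (hypothetical) ha.ip setting for one master.
# Reads a kubeadm-init.yaml on stdin and writes the adjusted copy to stdout.
set_ha_ip() {
    new_ip=$1
    # Only a two-space-indented "ip:" key is rewritten; "vip:" and the
    # "masters:" list entries do not match this pattern. Any other
    # two-space-indented "ip:" key in the file would be rewritten too.
    sed "s/^  ip:.*/  ip: ${new_ip}/"
}

# Example: derive the setting for k8s-master2 from the one used on k8s-master1.
printf 'ha:\n  ip: 192.168.60.71\n  vip: 192.168.60.80\n' | set_ha_ip 192.168.60.72
```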

@luxas
Member

luxas commented Jul 5, 2017

@cookeem Hi! Are you interested in attending our meetings?
It would be great to discuss this more there, and you could also demo this project if you want...

@luxas luxas added area/HA documentation/better-examples kind/enhancement priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jul 5, 2017
@cookeem
Author

cookeem commented Jul 6, 2017

It looks like the meeting has already finished :(

@luxas
Member

luxas commented Jul 6, 2017

@cookeem They are organized every week. Come and join us the next time on Tuesday https://docs.google.com/document/d/1deJYPIF4LmhGjDVaqrswErIrV7mtwJgovtLnPCDxP7U/edit#heading=h.3lizw9e2c8mi :)

@cookeem
Author

cookeem commented Jul 6, 2017

@luxas got it, thx :)

@jamiehannaford
Contributor

@cookeem This is awesome, thanks for submitting this issue! We're making HA a priority for 1.9 and we have #261 to track that work. If it's okay with you, can we close this issue in favour of that instead? Feel free to add any missing context.

@cookeem cookeem closed this as completed Oct 18, 2017
@kumarganesh2814

@cookeem
Hi

I am also struggling to create an HA cluster on CentOS.

I set up my cluster as shown in the file attached below. Can you please advise how I can modify the current cluster now?

I want 2 masters and 1 worker, just to showcase it to management, get approval, and have Kubernetes adopted as the standard to replace Docker.

Please help me

Best Regards
Ganesh

@kumarganesh2814

@cookeem
Author

cookeem commented Nov 17, 2017

@kumarganesh2814 you can follow my instruction:
https://github.com/cookeem/kubeadm-ha

But I don't think 2 master nodes is a good idea: the number of master nodes should be odd and greater than 1, for example 3.
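
The reasoning behind an odd number can be sketched with quorum arithmetic. This assumes the masters also host the etcd members, as in the layout at the top of this issue, and that etcd needs a majority of members (floor(n/2) + 1) to stay available; the helper names are illustrative.

```shell
#!/bin/sh
# Majority needed for an n-member etcd cluster: floor(n/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Members that can fail while a majority is still reachable.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "masters=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under these assumptions 2 masters tolerate no failures, the same as 1, and 4 masters tolerate no more failures than 3 do.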

@kumarganesh2814

@cookeem
Sure, let me try this. Thanks for the guidance.

@kumarganesh2814

@cookeem
Hi
I am facing many issues while setting up this HA cluster using the instructions you provided at
https://github.com/cookeem/kubeadm-ha

Earlier I somehow managed to bring the HA cluster up, but I got errors when scaling or deploying any app.

Issues such as:


Nov 20 22:54:35 kuber-poc-app2 kubelet[55089]: E1120 22:54:35.271761   55089 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.127.38.18:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkuber-poc-app2&resourceVersion=0: dial tcp 10.127.38.18:6443: getsockopt: connection refused

@kumarganesh2814

I started this setup yet again, and after many hours the node still has not joined.

NAME             STATUS    AGE       VERSION
kuber-poc-app1   Ready     48m       v1.8.4

```
dockerd: time="2017-11-21T08:58:35.544063400-08:00" level=info msg="memberlist: Suspect 48ba0858e468 has failed, no acks received"
Nov 21 08:58:35 kuber-poc-app2 dockerd: time="2017-11-21T08:58:35.709865691-08:00" level=info msg="Node join event for f0c95ccc5a52/
```

```
Nov 21 08:38:29 kuber-poc-app2 kubelet[9321]: I1121 08:38:29.124901    9321 controller.go:118] kubelet config controller: validating combination of defaults and flags
Nov 21 08:38:29 kuber-poc-app2 kubelet[9321]: I1121 08:38:29.136009    9321 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Nov 21 08:38:29 kuber-poc-app2 kubelet[9321]: I1121 08:38:29.136043    9321 client.go:95] Start docker client with request timeout=2m0s
Nov 21 08:38:29 kuber-poc-app2 kubelet[9321]: W1121 08:38:29.138029    9321 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
```

@kumarganesh2814

@cookeem
Any idea about this error?

I1121 19:06:50.244938       1 controllermanager.go:109] Version: v1.8.4
I1121 19:06:50.249200       1 leaderelection.go:174] attempting to acquire leader lease...
E1121 19:06:50.249837       1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.127.38.20:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.127.38.20:6443: getsockopt: connection refused
E1121 19:06:53.701734       1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get https://10.127.38.20:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 10.127.38.20:6443: getsockopt: connection refused

@cookeem
Author

cookeem commented Nov 22, 2017

@kumarganesh2814 It seems there is something wrong with the certificates. Did you tear down your cluster before setting it up again?

@kumarganesh2814

@cookeem

Hi

Yes, I removed everything and reset using kubeadm reset.

On all 3 masters:
rm -rf /var/lib/cni
rm -rf /run/flannel
rm -rf /etc/cni
ifconfig cni0 down
brctl delbr cni0

After rebooting the VMs and cleaning up some files, I was able to get the cluster back. Thank you very much for your help so far. I am still having a few issues with the VIP.

```
kind: MasterConfiguration
kubernetesVersion: v1.8.4
networking:
  podSubnet: 10.244.0.0/16

apiServerCertSANs:
- k8s-master1
- k8s-master2
- k8s-master3
- 10.127.xxxx
- 10.127.xxxx
- 10.127.xxxx
- 10.127.xxxx
etcd:
  endpoints:
  - http://10.127.xxxx:2379
  - http://10.127.xxxx:2379
  - http://10.127.xxxx:2379
```

Somehow this VIP is not working for load balancing: when I try to access the app through the VIP URL, it doesn't work :(

Checking for the issue.

Best Regards
Ganesh

@kumarganesh2814

curl: (7) Failed connect to VIP_IP:30000; Connection refused

# curl -L MASTER-1-IP:30000
 <!doctype html> <html ng-app="kubernetesDashboard"> <head> <meta charset="utf-8"> <title ng-controller="kdTitle as $ctrl" ng-bind="$ctrl.title()"></title> <link rel="icon" type="image/png" href="assets/images/kubernetes-logo.png"> <meta name="viewport" content="width=device-width"> <link rel="stylesheet" href="static/vendor.803608cb.css"> <link rel="stylesheet" href="static/app.336a76b4.css"> </head> <body> <!--[if lt IE 10]>
      <p class="browsehappy">You are using an <strong>outdated</strong> browser.
      Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your
      experience.</p>
    <![endif]--> <kd-chrome layout="column" layout-fill> </kd-chrome> <script src="static/vendor.31531c85.js"></script> <script src="api/appConfig.json"></script> <script src="static/app.f69f96ab.js"></script> </body> </html> 
	
	# curl -L MASTER-2-IP:30000
 <!doctype html> <html ng-app="kubernetesDashboard"> <head> <meta charset="utf-8"> <title ng-controller="kdTitle as $ctrl" ng-bind="$ctrl.title()"></title> <link rel="icon" type="image/png" href="assets/images/kubernetes-logo.png"> <meta name="viewport" content="width=device-width"> <link rel="stylesheet" href="static/vendor.803608cb.css"> <link rel="stylesheet" href="static/app.336a76b4.css"> </head> <body> <!--[if lt IE 10]>
      <p class="browsehappy">You are using an <strong>outdated</strong> browser.
      Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your
      experience.</p>
    <![endif]--> <kd-chrome layout="column" layout-fill> </kd-chrome> <script src="static/vendor.31531c85.js"></script> <script src="api/appConfig.json"></script> <script src="static/app.f69f96ab.js"></script> </body> </html> 
	
	# curl -L MASTER-3-IP:30000
 <!doctype html> <html ng-app="kubernetesDashboard"> <head> <meta charset="utf-8"> <title ng-controller="kdTitle as $ctrl" ng-bind="$ctrl.title()"></title> <link rel="icon" type="image/png" href="assets/images/kubernetes-logo.png"> <meta name="viewport" content="width=device-width"> <link rel="stylesheet" href="static/vendor.803608cb.css"> <link rel="stylesheet" href="static/app.336a76b4.css"> </head> <body> <!--[if lt IE 10]>
      <p class="browsehappy">You are using an <strong>outdated</strong> browser.
      Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your
      experience.</p>
    <![endif]--> <kd-chrome layout="column" layout-fill> </kd-chrome> <script src="static/vendor.31531c85.js"></script> <script src="api/appConfig.json"></script> <script src="static/app.f69f96ab.js"></script> </body> </html>

@kumarganesh2814

@cookeem
Can you please suggest how this VIP needs to be created? I asked our load balancer team to create a VIP pointing to the 3 master IPs, but they say it's passing the health check. What IP and port can I give them?
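
For a load balancer health check, one option is to probe the apiserver's /healthz endpoint on each master. The sketch below is not something confirmed in this thread: it assumes the apiservers listen on port 6443 (the port seen in the kubelet log above) and that /healthz is reachable without client credentials in this setup; the helper names and the MASTER-n-IP placeholders are hypothetical.

```shell
#!/bin/sh
# Print the health-check URL for one master (port defaults to 6443).
healthz_url() {
    echo "https://$1:${2:-6443}/healthz"
}

# Probe one master. -k skips certificate verification (quick check only),
# --max-time bounds the wait. A healthy apiserver normally answers "ok".
check_master() {
    curl -k -s --max-time 5 "$(healthz_url "$1" "$2")"
}

# Example usage (replace the placeholders with real master IPs):
# for ip in MASTER-1-IP MASTER-2-IP MASTER-3-IP; do
#     echo "$ip: $(check_master "$ip")"
# done
healthz_url MASTER-1-IP
```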

Best Regards
Ganesh

@kumarganesh2814

[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.11.0-ce. Max validated version: 17.03
[validation] WARNING: using token-based discovery without DiscoveryTokenCACertHashes can be unsafe (see https://kubernetes.io/docs/admin/kubeadm/#kubeadm-join).
[validation] WARNING: Pass --discovery-token-unsafe-skip-ca-verification to disable this warning. This warning will become an error in Kubernetes 1.9.
[discovery] Trying to connect to API Server "VIP_IP:8443"
[discovery] Created cluster-info discovery client, requesting info from "https://VIP_IP:8443"
[discovery] Failed to request cluster info, will try again: [Get https://VIP_IP:8443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp VIP_IP:8443: getsockopt: connection refused]

@kumarganesh2814

I see the nginx container running on all 3 master nodes.

@kumarganesh2814

a97be84aac4e nginx "nginx -g 'daemon of…" 24 hours ago Up 15 hours 80/tcp, 0.0.0.0:8443->8443/tcp nginx-lb

@cookeem
Author

cookeem commented Nov 23, 2017

@kumarganesh2814 Sorry for the late reply. So your problem is how to use keepalived to create a VIP, right? If your cluster is exposed to the internet, I assume each host has two IP addresses (one public address and one internal Ethernet address for cluster communication). In that case you should first allocate a new internal IP address for the keepalived VIP and make sure every node can reach it.

@kumarganesh2814

@cookeem
Thanks a lot, Sir.
My cluster is internal to the company (bare-metal CentOS VMs).

I will try to follow your advice. You have been kind enough to answer all my queries; I really appreciate this.

Best Regards
Ganesh Kumar

@kumarganesh2814

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:01:30:2d brd ff:ff:ff:ff:ff:ff
    inet 10.127.68.68/24 brd 10.127.68.255 scope global ens192
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe01:302d/64 scope link
       valid_lft forever preferred_lft forever
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:46:0e:20:72 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:46ff:fe0e:2072/64 scope link
       valid_lft forever preferred_lft forever
28: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 6e:6c:99:b4:97:f8 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::6c6c:99ff:feb4:97f8/64 scope link
       valid_lft forever preferred_lft forever
29: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP qlen 1000
    link/ether 0a:58:0a:f4:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::f018:8cff:fe7e:11a8/64 scope link
       valid_lft forever preferred_lft forever

@kumarganesh2814

kumarganesh2814 commented Nov 23, 2017

@cookeem

Can you please also advise which ports I should ask my network team to configure for this VIP?

Is the certificate offloading for port 8443 done on the F5 (VIP) or on the master nodes?

Best Regards
Ganesh

@kumarganesh2814

@cookeem
Hi

I changed the port for the nginx container from 8443 to 8080 and then joined the 3 nodes. That worked fine, and now I see the app is accessible from all 3 master IPs.

Thanks for your support cook, you rock!!!

Best Regards
Ganesh Kumar

@cookeem
Author

cookeem commented Nov 24, 2017

@kumarganesh2814 Great, you are welcome 😁

@kumarganesh2814

kumarganesh2814 commented Nov 24, 2017

@cookeem

Sorry to trouble you again.

In the cluster I have deployed I see one strange thing: I can access my app on all master and worker nodes.

However, I haven't set hostPort. Is this a bug/issue or a feature?

My YAML for the sample app:

```
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  labels:
    k8s-app: tomcat
  name: tomcat
  namespace: tomcat
spec:
  replicas: 3
#  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: tomcat
  template:
    metadata:
      labels:
        k8s-app: tomcat
    spec:
      containers:
      - name: tomcat
        image: ganeshdevops10/tomcatserver-gse:version1
        imagePullPolicy: Never
        ports:
        - containerPort: 8080
          protocol: TCP
        livenessProbe:
          httpGet:
            scheme: HTTP
            path: /
            port: 8080
          initialDelaySeconds: 30
          timeoutSeconds: 30
```


```
# cat tomcat-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: tomcat
  name: tomcat-svc
  namespace: tomcat
spec:
  ports:
  - nodePort: 30006
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    k8s-app: tomcat
  sessionAffinity: None
  type: NodePort
```

@kumarganesh2814

So now, after running kubectl apply on the YAML files, I can access the app via 6 different URLs:
3 URLs of the form MASTER-IP:30006/APP
3 URLs of the form Worker-IP:30006/APP

As far as I know, we should only be able to access the app via the master node IPs, so I am not sure why the workers also serve it.

I see this process on a worker node:

netstat -plant|grep 30006

tcp6 0 0 :::30006 :::* LISTEN 1972/kube-proxy

kubectl get nodes output:

```
NAME             STATUS                     AGE       VERSION
kuber-poc-app1   Ready,SchedulingDisabled   2d        v1.8.4
kuber-poc-app2   Ready,SchedulingDisabled   2d        v1.8.4
kuber-poc-app3   Ready,SchedulingDisabled   2d        v1.8.4
kuber-poc-app4   Ready                      23h       v1.8.4
kuber-poc-app5   Ready                      23h       v1.8.4
kuber-poc-app6   Ready                      23h       v1.8.4
```

@cookeem
Author

cookeem commented Nov 27, 2017

This is how kube-proxy and NodePort work: a proxy process starts on every node, which makes the same NodePort reachable on all nodes.

This is the document about NodePort:
https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
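
A quick way to observe this is to request the same NodePort on every node. The sketch below assumes the node addresses are passed in by hand and uses port 30006 from the Service above only as an example; nodeport_url and probe_nodeport are illustrative names.

```shell
#!/bin/sh
# Build the URL for one node and NodePort.
nodeport_url() {
    echo "http://$1:$2/"
}

# Print one line per node with the HTTP status returned on the NodePort.
# curl reports "000" when no HTTP response was received.
probe_nodeport() {
    port=$1
    shift
    for node in "$@"; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$(nodeport_url "$node" "$port")")
        echo "$node:$port -> $code"
    done
}

# Example (placeholders, not real addresses):
# probe_nodeport 30006 MASTER-1-IP WORKER-1-IP
```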

@kumarganesh2814

@cookeem
Thanks, that is clear now. One more question: if all our masters go down, will we still be able to access the pods using the same NodePort?

Or, if I set hostNetwork to true, can I still access the service on the specified port, and will the pods keep running if I shut down all 3 masters?

Best Regards
Ganesh

@cookeem
Author

cookeem commented Nov 28, 2017

@kumarganesh2814 NodePort is provided by the Kubernetes proxy component. If all masters are down, the Docker containers keep running, but every NodePort becomes unavailable and networking is lost, so you cannot access any service from your hosts.

@kumarganesh2814

@cookeem

Thanks, man.

I appreciate your support so far.

Best Regards
Ganesh

@vishalraghu

vishalraghu commented Dec 8, 2017

Hi @cookeem

Thanks for your solution.
I have a few questions regarding the setup.

1. After setting up the cluster in HA mode, how can it be upgraded? Say this setup was done with Kubernetes 1.8.x and 1.9 was released yesterday: what would be a simple way to upgrade?

2. If one of the masters goes down, will there be any impact on the pods deployed on the nodes?

3. Can this setup be used in production?

Regards,
Vishal

@cookeem
Author

cookeem commented Dec 9, 2017

@vishalraghu

  1. I have never tried upgrading myself, but here is the document for upgrading from 1.7 to 1.8: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm-upgrade-1-8/
    You can try it, and feel free to tell me how it works with an HA k8s cluster.

  2. No, everything will keep working.

  3. I currently run this deployment in production, and it works fine.

@sv01a

sv01a commented Dec 9, 2017

@cookeem I wrapped your solution in an Ansible playbook (https://github.com/sv01a/ansible-kubeadm-ha-cluster)

@kcao3

kcao3 commented Dec 9, 2017

@cookeem, @sv01a Have you ever tried to add new masters to your existing cluster? If yes, were you able to do so with your approach?

@sv01a

sv01a commented Dec 9, 2017

@kcao3 Do you mean converting an existing non-HA kubeadm cluster to an HA cluster? If so, I haven't tried that.

@kcao3

kcao3 commented Dec 9, 2017

@sv01a No. Once you have deployed an HA cluster using your Ansible playbooks, how do you expand it, i.e. add one or more masters to your current HA cluster? Do you use the same certs on each master?

@sv01a

sv01a commented Dec 10, 2017

With the current playbook implementation it's not possible, because the playbook will re-init the cluster on a second run. Honestly, I hadn't thought about this case.

But you can add a master by hand: simply repeat the steps for a master from the instructions.

@kumarganesh2814

@cookeem

Hi Man,

Sorry to keep bugging you on a closed issue, but I guess this will help others too.

The new issue is that a properly set up master node, which was working absolutely fine, no longer works after a reboot.

Services that were accessible via Master1 (the rebooted VM) are no longer accessible. I tried recreating the DNS and flannel pods, but the result is the same.

The only message I see is:

Dec 12 09:35:59 [localhost] kubelet: W1212 09:35:59.621337   29750 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available

But that does not give much information.

# kubectl get po -n kube-system -o wide|grep kuber-poc-app1|grep kube
heapster-5b88547996-8zgv5                1/1       Running   4          21d       10.244.0.26    kuber-poc-app1
kube-apiserver-kuber-poc-app1            1/1       Running   4          21d       10.127.38.18   kuber-poc-app1
kube-controller-manager-kuber-poc-app1   1/1       Running   6          21d       10.127.38.18   kuber-poc-app1
kube-dns-545bc4bfd4-5f4bk                3/3       Running   0          20m       10.244.0.27    kuber-poc-app1
kube-flannel-ds-4qm5k                    1/1       Running   0          27m       10.127.38.18   kuber-poc-app1
kube-proxy-fxdtc                         1/1       Running   5          21d       10.127.38.18   kuber-poc-app1
kube-scheduler-kuber-poc-app1            1/1       Running   6          21d       10.127.38.18   kuber-poc-app1

Does this happen in your environment too? On another cluster I also see that when the kubeadm/kubectl version is upgraded, nodes become NotReady and pods go to the Unknown state.

So how do we address these 2 issues in an HA setup?

  1. After rebooting a master VM, services are not accessible.
  2. How do we update kubeadm/kubectl/kubelet with no downtime for pods?

Best Regards
Ganesh

@cookeem
Author

cookeem commented Dec 14, 2017

@kumarganesh2814

  1. Check the logs of keepalived and kube-proxy.
  2. kubeadm is no longer needed once the HA cluster is set up. kubectl is just a client, so you can upgrade it whenever you want. I don't think the HA cluster will still work after upgrading kubelet. According to the official documentation you can use kubeadm upgrade to upgrade your cluster, but I don't think it will work for an HA cluster.

@kumarganesh2814

Hi @cookeem

I found the issue: it was a change in an iptables rule that had been introduced as a workaround during installation.

For reference, so that others can benefit in such a case:

I compared the iptables rules on a VM that runs fine with those on the one where connections fail.

Diff
< :INPUT ACCEPT [2:156]
< :FORWARD ACCEPT [0:0]
< :OUTPUT ACCEPT [2:156]

:INPUT ACCEPT [127:124440]
:FORWARD DROP [6:492]
:OUTPUT ACCEPT [124:137662]

So I executed two commands:
iptables -P FORWARD ACCEPT
sysctl net.bridge.bridge-nf-call-iptables=1 (not necessary, but it had been done earlier)

Then save the iptables rules by executing the command below:
iptables-save

Now, checking the service with the Master1 IP, all looks fine :-)
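
Note that iptables-save on its own only prints the current rules to stdout. A minimal sketch for checking the FORWARD policy and keeping the settings across reboots follows; it assumes a CentOS 7 host with the iptables-services package (which restores /etc/sysconfig/iptables at boot), and the helper name and the sysctl.d file name are illustrative.

```shell
#!/bin/sh
# Extract the FORWARD chain policy from `iptables -S` style output on stdin.
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3 }'
}

# Example with sample text; on a real host use: iptables -S | forward_policy
printf '%s\n' '-P INPUT ACCEPT' '-P FORWARD DROP' '-P OUTPUT ACCEPT' | forward_policy

# To persist the fix (run as root; assumptions as stated above):
# iptables -P FORWARD ACCEPT
# iptables-save > /etc/sysconfig/iptables
# echo 'net.bridge.bridge-nf-call-iptables = 1' > /etc/sysctl.d/99-k8s-bridge.conf
# sysctl --system
```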

# kubectl logs kube-proxy-lmttc -n kube-system
W1214 07:17:10.714675       1 server.go:191] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
W1214 07:17:10.725837       1 server_others.go:268] Flag proxy-mode="" unknown, assuming iptables proxy
I1214 07:17:10.727409       1 server_others.go:122] Using iptables Proxier.
I1214 07:17:10.738582       1 server_others.go:157] Tearing down inactive rules.
E1214 07:17:10.790432       1 proxier.go:699] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)
I1214 07:17:10.802732       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1214 07:17:10.802813       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1214 07:17:10.802930       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1214 07:17:10.802969       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1214 07:17:10.810824       1 config.go:202] Starting service config controller
I1214 07:17:10.810964       1 controller_utils.go:1041] Waiting for caches to sync for service config controller
I1214 07:17:10.811024       1 config.go:102] Starting endpoints config controller
I1214 07:17:10.811032       1 controller_utils.go:1041] Waiting for caches to sync for endpoints config controller
I1214 07:17:10.911160       1 controller_utils.go:1048] Caches are synced for endpoints config controller
I1214 07:17:10.911273       1 controller_utils.go:1048] Caches are synced for service config controller

I still see the above error, but I am not sure whether it has any impact, as I am able to access the service.

Best Regards
Ganesh

@nelsonfassis

@kumarganesh2814 According to the official documentation, Kubernetes does not work well with the firewall on CentOS, so you should stop and disable firewalld:

https://kubernetes.io/docs/getting-started-guides/centos/centos_manual_config/

That will most likely fix your problem.

@cookeem
Author

cookeem commented Jan 13, 2018

@nelsonfassis

I am having trouble joining master02 and master03 after creating master01.
After running kubeadm init --config=config.yaml, pointing to an external etcd cluster with its required certificates, I copied /etc/kubernetes from master01 to master02 and master03.
I edited the .conf files and manifests to use the respective host IP, but whenever I check the nodes, only master01 is there.

What am I missing?

I'm using 1.8.

@nelsonfassis

Never mind, I had missed this detail:
"on k8s-master1: edit kube-apiserver.yaml file's admission-control settings, v1.7.0 use NodeRestriction admission control will prevent other master join the cluster, please reset it to v1.6.x recommended config."

My masters are now joining the cluster. Their role still shows as NONE, but I will keep working on it.
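
For readers hitting the same problem, the edit described in that note amounts to taking NodeRestriction out of the apiserver's --admission-control list. The sketch below is hedged: it only removes that one plugin rather than restoring the full v1.6.x list, the manifest path is the usual kubeadm location and is an assumption here, and drop_node_restriction is an illustrative name.

```shell
#!/bin/sh
# Remove NodeRestriction from a comma-separated --admission-control flag.
# Reads manifest text on stdin and writes the result to stdout.
drop_node_restriction() {
    sed -e '/--admission-control=/s/,NodeRestriction//' \
        -e '/--admission-control=/s/=NodeRestriction,/=/'
}

# Example on a sample line; the plugin list here is illustrative only.
echo '    - --admission-control=NamespaceLifecycle,NodeRestriction,ResourceQuota' | drop_node_restriction

# On a master one might apply it as follows (paths are assumptions). The backup
# is kept outside the manifests directory so the kubelet does not pick it up:
# f=/etc/kubernetes/manifests/kube-apiserver.yaml
# cp "$f" /root/kube-apiserver.yaml.bak
# drop_node_restriction < /root/kube-apiserver.yaml.bak > "$f"
```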
