
Docker downgrade fails #28

Closed
peykens opened this issue Sep 28, 2017 · 30 comments
@peykens

peykens commented Sep 28, 2017

Hi,

I'm trying to run these great Ansible scripts, but the Docker downgrade always fails.

RUNNING HANDLER [kubernetes : restart docker] ***************************************************************************************************************************************
fatal: [192.168.1.200]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service docker: Job for docker.service failed. See 'systemctl status docker.service' and 'journalctl -xn' for details.\n"}

Is anyone else facing this issue?

@rhuss
Collaborator

rhuss commented Oct 2, 2017

It could be that something upstream has changed. I'm going to try upgrading to Kubernetes 1.8 soon, so I will verify this, too.

@kerko

kerko commented Oct 2, 2017

I just encountered the same problem. It seems to be related to a wrong systemd configuration. In roles/kubernetes/tasks/docker.yml, the line:
template: src=docker-1.12.service dest=/etc/systemd/system/multi-user.target.wants/docker.service
needs to be
template: src=docker-1.12.service dest=/etc/systemd/system/docker.service
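
For illustration, the corrected task could look roughly like this (only a sketch, not the repository's exact content; the notify handler is the "restart docker" handler visible in the error above, and a systemd daemon-reload may also be needed after replacing the unit file):

# Rough sketch of the corrected task; the handler wiring is an assumption.
- name: Install docker systemd unit
  template:
    src: docker-1.12.service
    dest: /etc/systemd/system/docker.service
  notify: restart docker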

@rhuss
Collaborator

rhuss commented Oct 4, 2017

@kerko Cool, thanks for the pointer! The weekend is Raspberry Pi time, so I will fix it then.

@peykens
Author

peykens commented Oct 7, 2017

In the meantime I have tested with Kubernetes v1.8.0 and Docker 17.05.0-ce.

You have to update the iptables rules for cni0, and then it works:
$ sudo iptables -A FORWARD -i cni0 -j ACCEPT
$ sudo iptables -A FORWARD -o cni0 -j ACCEPT

BUT: I'm now hitting an issue where the server doesn't store the JWS key. So after 24 hours (setting the TTL to 0 doesn't help) you lose the ability to join. When a worker node reboots, it's lost; when the master reboots, everything is gone.
Log message: "there is no JWS signed token in the cluster-info ConfigMap"

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:46:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/arm"}

$ docker version
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:30:54 2017
OS/Arch: linux/arm

Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:30:54 2017
OS/Arch: linux/arm
Experimental: false

@rhuss
Collaborator

rhuss commented Oct 9, 2017

I'm about to update to 1.8.0 and just checked the minimal Docker version to use:

From https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md#external-dependencies:

Continuous integration builds use Docker versions 1.11.2, 1.12.6, 1.13.1, and 17.03.2. These versions were validated on Kubernetes 1.8.

So I will go with 17.03.2 if available for Hypriot and use that version for the next update.
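
For what it's worth, pinning the version could look roughly like the following task; the package name and version string are assumptions and would need checking against what apt actually offers on Hypriot (e.g. with apt-cache madison docker-ce):

# Hypothetical sketch only: pin Docker via apt so a newer preinstalled
# package gets downgraded. Package name and version string are assumptions.
- name: Install pinned Docker version
  apt:
    name: "docker-ce={{ docker_version }}"   # e.g. a 17.03.2 build for the distro
    state: present
    force: yes    # allow apt to perform the downgrade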

@peykens
Author

peykens commented Oct 9, 2017

Super, I'm looking forward to your results. I really want to get it working.

@rhuss
Collaborator

rhuss commented Oct 10, 2017

I'm just about to push, but it turns out that regardless of what I do, the token has an expiry of 24h. I opened a Kubernetes issue here --> kubernetes/kubernetes#53637.

However, when we create a token after the bootstrap with

kubeadm token create --ttl 0

then it creates a proper token (check with kubeadm token list), so I'm going to use this one for joining the cluster.
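
A rough sketch of how that could be wired into a playbook (the group names, variables, and join flags are assumptions, not the actual implementation):

# Hypothetical sketch: create a non-expiring token on the master and reuse
# it when joining workers. Group and variable names are assumptions.
- name: Create a non-expiring bootstrap token
  command: kubeadm token create --ttl 0
  register: join_token
  run_once: true
  delegate_to: "{{ groups['master'][0] }}"

- name: Join the worker nodes
  # the discovery flag below depends on the kubeadm version in use
  command: >
    kubeadm join --token {{ join_token.stdout }} {{ master_ip }}:6443
    --discovery-token-unsafe-skip-ca-verification
  when: inventory_hostname not in groups['master']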

@rhuss
Collaborator

rhuss commented Oct 10, 2017

@peykens @kerko I updated the playbooks, and also the base system to Hypriot 1.6.0. If you have the possibility, I'd recommend starting from scratch (I just did it twice; it took me ~15 minutes each time).

The problem with the expiring tokens should be fixed, but I'll only know for sure tomorrow ;-)

Please let me know whether this update works for you.

@peykens
Author

peykens commented Oct 10, 2017

Hi,

I'm missing a file: "Could not find or access 'docker.service'" in the task TASK [kubernetes : Update docker service startup].

The former docker-1.12.service

The task:

- name: Update docker service startup
  template: src=docker.service dest=/etc/systemd/system/docker.service

@rhuss
Collaborator

rhuss commented Oct 11, 2017

Sorry, I forgot to check it in (I renamed it to remove the version number). It should be back now ...

@peykens
Author

peykens commented Oct 11, 2017

Hi @rhuss ,

First of all, thanks for your effort.
I have tested the latest scripts and it's going well. I do have to report a few issues:

  • With this Docker version I have to update the iptables rules to allow traffic on cni0 (on all nodes):
    sudo iptables -A FORWARD -i cni0 -j ACCEPT
    sudo iptables -A FORWARD -o cni0 -j ACCEPT
  • I use flannel, but I have to do the install manually because the flannel version used in these scripts doesn't work; I run into the known CrashLoopBackOff.
  • The main problem I face is that when I reboot a worker node, Docker is stuck after the reboot:
    $ docker ps
    Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

@rhuss
Collaborator

rhuss commented Oct 11, 2017

Thanks for the feedback. To be honest, I use Weave at the moment (that's what I tested) and don't run into the issues you mentioned. I guess the flannel integration needs some love again (however, I'm happy that one network implementation works smoothly).

I haven't tested a proper reboot yet, but will do so as soon as possible. It looks like there is still an issue with the downgrade.

@rhuss
Collaborator

rhuss commented Oct 11, 2017

I just found out that it took 12 minutes, but Docker eventually came up on the node. That's surely not the proper behaviour; I'm really curious what the cause is:

journalctl -u docker
-- Logs begin at Wed 2017-10-11 22:17:01 CEST, end at Wed 2017-10-11 23:22:07 CEST. --
Oct 11 23:02:37 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:02:39 n3 dockerd[479]: time="2017-10-11T23:02:39.344410017+02:00" level=warning msg="[!] DON'T BIN
Oct 11 23:02:39 n3 dockerd[479]: listen tcp 192.168.23.203:2375: bind: cannot assign requested address
Oct 11 23:02:39 n3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 11 23:02:39 n3 systemd[1]: Failed to start Docker Application Container Engine.
Oct 11 23:02:39 n3 systemd[1]: Unit docker.service entered failed state.
Oct 11 23:02:41 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:02:41 n3 dockerd[593]: time="2017-10-11T23:02:41.355412517+02:00" level=warning msg="[!] DON'T BIN
Oct 11 23:02:41 n3 dockerd[593]: listen tcp 192.168.23.203:2375: bind: cannot assign requested address
Oct 11 23:02:41 n3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 11 23:02:41 n3 systemd[1]: Failed to start Docker Application Container Engine.
Oct 11 23:02:41 n3 systemd[1]: Unit docker.service entered failed state.
Oct 11 23:02:41 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:02:41 n3 dockerd[617]: time="2017-10-11T23:02:41.743601839+02:00" level=warning msg="[!] DON'T BIN
Oct 11 23:02:41 n3 dockerd[617]: listen tcp 192.168.23.203:2375: bind: cannot assign requested address
Oct 11 23:02:41 n3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 11 23:02:41 n3 systemd[1]: Failed to start Docker Application Container Engine.
Oct 11 23:02:41 n3 systemd[1]: Unit docker.service entered failed state.
Oct 11 23:02:41 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:02:42 n3 dockerd[626]: time="2017-10-11T23:02:42.017502725+02:00" level=warning msg="[!] DON'T BIN
Oct 11 23:02:42 n3 dockerd[626]: listen tcp 192.168.23.203:2375: bind: cannot assign requested address
Oct 11 23:02:42 n3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 11 23:02:42 n3 systemd[1]: Failed to start Docker Application Container Engine.
Oct 11 23:02:42 n3 systemd[1]: Unit docker.service entered failed state.
Oct 11 23:02:42 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:02:42 n3 dockerd[634]: time="2017-10-11T23:02:42.296097100+02:00" level=warning msg="[!] DON'T BIN
Oct 11 23:02:42 n3 dockerd[634]: listen tcp 192.168.23.203:2375: bind: cannot assign requested address
Oct 11 23:02:42 n3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Oct 11 23:02:42 n3 systemd[1]: Failed to start Docker Application Container Engine.
Oct 11 23:02:42 n3 systemd[1]: Unit docker.service entered failed state.
Oct 11 23:02:42 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:02:42 n3 systemd[1]: docker.service start request repeated too quickly, refusing to start.
Oct 11 23:02:42 n3 systemd[1]: Failed to start Docker Application Container Engine.
Oct 11 23:14:50 n3 systemd[1]: Starting Docker Application Container Engine...
Oct 11 23:14:51 n3 dockerd[1213]: time="2017-10-11T23:14:51.130574437+02:00" level=warning msg="[!] DON'T BI
Oct 11 23:14:51 n3 dockerd[1213]: time="2017-10-11T23:14:51.145701193+02:00" level=info msg="libcontainerd:
Oct 11 23:14:52 n3 dockerd[1213]: time="2017-10-11T23:14:52.445588104+02:00" level=warning msg="devmapper: U
Oct 11 23:14:52 n3 dockerd[1213]: time="2017-10-11T23:14:52.586647916+02:00" level=warning msg="devmapper: B
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.121963448+02:00" level=info msg="Graph migration
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.128923901+02:00" level=info msg="Loading contain
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.214439833+02:00" level=warning msg="libcontainer
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.219201942+02:00" level=warning msg="libcontainer
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.237033998+02:00" level=error msg="devmapper: Err
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.237415871+02:00" level=error msg="Error unmounti
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.294828914+02:00" level=warning msg="libcontainer
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.297766376+02:00" level=warning msg="libcontainer
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.553869972+02:00" level=error msg="devmapper: Err
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.556292630+02:00" level=error msg="Error unmounti
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.601649610+02:00" level=warning msg="libcontainer
Oct 11 23:14:53 n3 dockerd[1213]: time="2017-10-11T23:14:53.604438207+02:00" level=warning msg="libcontainer
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.003152430+02:00" level=error msg="devmapper: Err
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.005384464+02:00" level=error msg="Error unmounti
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.046481520+02:00" level=warning msg="libcontainer
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.049681790+02:00" level=warning msg="libcontainer
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.050758787+02:00" level=warning msg="failed to cl
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.401812508+02:00" level=error msg="devmapper: Err
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.402042653+02:00" level=error msg="Error unmounti
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.442660704+02:00" level=warning msg="libcontainer
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.445227293+02:00" level=warning msg="libcontainer
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.445933975+02:00" level=warning msg="failed to cl
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.804666792+02:00" level=error msg="devmapper: Err
Oct 11 23:14:54 n3 dockerd[1213]: time="2017-10-11T23:14:54.807245566+02:00" level=error msg="Error unmounti
Oct 11 23:14:55 n3 dockerd[1213]: time="2017-10-11T23:14:55.510399369+02:00" level=info msg="Firewalld runni
Oct 11 23:14:56 n3 dockerd[1213]: time="2017-10-11T23:14:56.360791769+02:00" level=info msg="Removing stale
Oct 11 23:14:56 n3 dockerd[1213]: time="2017-10-11T23:14:56.946330103+02:00" level=info msg="Removing stale
Oct 11 23:14:57 n3 dockerd[1213]: time="2017-10-11T23:14:57.279952319+02:00" level=info msg="Default bridge
Oct 11 23:14:57 n3 dockerd[1213]: time="2017-10-11T23:14:57.630283024+02:00" level=info msg="Loading contain
Oct 11 23:14:58 n3 dockerd[1213]: time="2017-10-11T23:14:58.024344260+02:00" level=info msg="Daemon has comp
Oct 11 23:14:58 n3 dockerd[1213]: time="2017-10-11T23:14:58.024687090+02:00" level=info msg="Docker daemon"
Oct 11 23:14:58 n3 dockerd[1213]: time="2017-10-11T23:14:58.125131327+02:00" level=info msg="API listen on 1
Oct 11 23:14:58 n3 systemd[1]: Started Docker Application Container Engine.
Oct 11 23:14:58 n3 dockerd[1213]: time="2017-10-11T23:14:58.125377410+02:00" level=info msg="API listen on /
Oct 11 23:15:04 n3 dockerd[1213]: time="2017-10-11T23:15:04.732712370+02:00" level=error msg="Handler for PO
Oct 11 23:15:04 n3 dockerd[1213]: time="2017-10-11T23:15:04.748713557+02:00" level=error msg="Handler for PO
....

@rhuss
Collaborator

rhuss commented Oct 11, 2017

Ah, got it. There are two service files, and I copied one to the wrong location. Let me fix this.

@rhuss
Collaborator

rhuss commented Oct 12, 2017

  • I indeed copied the wrong docker.service file (sorry, I missed adding @kerko's fix ;-(). It should be fixed now. Actually it should go to /lib, as that is the service file installed by apt-get.
  • The flannel issue is documented here: "IPTables rules missing from Flannel/CNI on Kubernetes installation" flannel-io/flannel#799. I now add the missing iptables rules when flannel is used as the CNI plugin. The role is not really ready yet (I didn't have enough time to test), but the remaining task is to save the iptables rules; a sketch follows below.

Feel free to pitch in on the flannel fix; I'm happy about any PR ;-)
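
Something along these lines could do it (only a sketch; it assumes iptables-persistent or an equivalent mechanism loads the dump on boot, and the task layout is made up):

# Hypothetical sketch: add the missing cni0 FORWARD rules and persist them.
- name: Allow inbound forwarding on cni0
  iptables:
    chain: FORWARD
    in_interface: cni0
    jump: ACCEPT

- name: Allow outbound forwarding on cni0
  iptables:
    chain: FORWARD
    out_interface: cni0
    jump: ACCEPT

- name: Save iptables rules so they survive a reboot
  shell: iptables-save > /etc/iptables/rules.v4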

@rhuss
Collaborator

rhuss commented Oct 12, 2017

Hold on, there are still issues with restarts. I think it's much better to write an /etc/docker/daemon.json instead of overwriting the service file. I will try that as soon as I'm back from travelling.
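
For example, roughly like this (the option shown is only an illustrative assumption, not the final configuration; note that a "hosts" entry in daemon.json conflicts with any -H flag still present in the unit's ExecStart line):

# Hypothetical sketch: write /etc/docker/daemon.json instead of templating
# the unit file. The contents are an assumption for illustration only.
- name: Write Docker daemon configuration
  copy:
    dest: /etc/docker/daemon.json
    content: |
      {
        "hosts": ["unix:///var/run/docker.sock"]
      }
  notify: restart docker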

@rhuss
Collaborator

rhuss commented Oct 14, 2017

It took a bit, but there was an issue with Hypriot 1.6.0, too. I just added a fix for this, so it should work now.

@peykens Any chance that you can test the updated playbooks?

@peykens
Author

peykens commented Oct 14, 2017

I will flash all my Pis with the Hypriot 1.6.0 image again and start from scratch.

@peykens
Author

peykens commented Oct 14, 2017

Hi @rhuss ,

I flashed my 4 Pis and started from scratch. I just skipped the network overlay since I use flannel.
The scripts are running fine, and everything is installed.
Afterwards I can deploy a new service and reach all nodes. Rebooting both the master node and the worker nodes works fine!

So thanks a lot for the work.

Unfortunately, the ingress controller is no longer working on these versions :-(
The pods are not created anymore:
$ kubectl get deployments -n kube-system
NAME                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-dns                     1         1         1            0           1h
traefik-ingress-controller   3         0         0            0           2m

Do you also use an ingress controller?

@peykens
Author

peykens commented Oct 14, 2017

In the meantime I dropped flannel and restarted with Weave.
The install is fine, but the ingress controller hits the same problem: the pod is not created.

@peykens
Author

peykens commented Oct 14, 2017

OK, I finally got it working. I needed to create the ServiceAccount, ClusterRole and ClusterRoleBinding.
See also: hypriot/blog#51
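
For reference, the objects were roughly along these lines (only a sketch based on the common Traefik RBAC example; the exact names and rules in the linked post may differ):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik-ingress-controller
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: kube-system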

SUPER, now let's wait and see if it keeps on working after 24h (initial token expiry).

Next step, the Kubernetes dashboard. If you have any working links to that, it would be great.

@rhuss
Collaborator

rhuss commented Oct 15, 2017

@peykens Traefik integration is pending (I have an initial POC). For the dashboard, including Heapster and InfluxDB, you can just call:

ansible-playbook -i hosts management.yml

You then have a kubernetes-dashboard service which you can expose either via an ingress or via a NodePort for testing.
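
For testing, a throwaway NodePort service could look roughly like this (the selector, ports and node port are assumptions; check the actual kubernetes-dashboard service first):

apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard-nodeport   # hypothetical name
  namespace: kube-system
spec:
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard       # assumed label on the dashboard pods
  ports:
    - port: 80
      targetPort: 9090                  # the dashboard container port
      nodePort: 30080                   # any free port in the NodePort range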

@peykens
Author

peykens commented Oct 15, 2017

OK, I missed that one.
The dashboard is running fine; I can access it from every node via the service.
However, my ingress is not working yet: 404 Not Found. My other services work fine over the ingress. Probably something with the namespace, I guess.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubernetes-dashboard
spec:
  rules:
  - http:
      paths:
      - path: /dashboard	
        backend:
          serviceName: kubernetes-dashboard
          servicePort: 80

An SSH tunnel works fine to get access to the dashboard.

@rhuss
Collaborator

rhuss commented Oct 15, 2017

Yeah, kubernetes-dashboard is running in the kube-system namespace (like the other infra services). I suggest that you install the ingress object into this namespace as well.
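
That would be roughly the same manifest with the namespace added (a sketch; the path is kept as above):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubernetes-dashboard
  namespace: kube-system    # same namespace as the dashboard service
spec:
  rules:
  - http:
      paths:
      - path: /dashboard
        backend:
          serviceName: kubernetes-dashboard
          servicePort: 80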

@rhuss
Collaborator

rhuss commented Oct 15, 2017

SUPER, now let's wait and see if it keeps on working after 24h (initial token expiry).

This works for sure, as I created an extra token which never expires (you can check with kubeadm token list). See also the upstream issue I opened for why the TTL couldn't be provided initially --> kubernetes/kubernetes#53637.

@peykens
Author

peykens commented Oct 15, 2017

I created the ingress in namespace kube-system, but it doesn't help.

$ kubectl describe ing/kubernetes-dashboard -n kube-system
Name:             kubernetes-dashboard
Namespace:        kube-system
Address:          
Default backend:  default-http-backend:80 (<none>)
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     
        /dashboard   kubernetes-dashboard:80 (10.44.0.2:9090)
Annotations:
Events:  <none>

@rhuss
Collaborator

rhuss commented Oct 15, 2017

Last guess: replace /dashboard with /; at least for me it only worked when using the root context.

Otherwise, I will continue working on the Traefik ingress controller soon (and also on Rook as a distributed filesystem), and will adapt the dashboard setup accordingly.

@peykens
Author

peykens commented Oct 15, 2017

I had another app running behind /, which is why I used a different path.
I have now switched both, and guess what ... the dashboard is working fine but the other one is no longer accessible :-)
It was working before, so I'll try to figure it out.

Thanks a lot for your help; I don't know how to thank you enough.
I'm new to k8s and ansible, so this project has helped me a lot.
I will definitely use it to introduce my colleagues to k8s.

@rhuss
Collaborator

rhuss commented Oct 15, 2017

You are welcome ;-) I'm going to close this issue now; feel free to open a new one if you hit any other issues.

rhuss closed this as completed on Oct 15, 2017
@vhosakot

Per kubernetes/kubeadm#335 (comment), kubeadm init --token-ttl 0 resolves the "there is no JWS signed token in the cluster-info ConfigMap" error for me.
