This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

RancherOS not persisting user volume #2188

Closed
zeppelinux opened this issue Dec 22, 2017 · 9 comments

Comments

@zeppelinux

zeppelinux commented Dec 22, 2017

RancherOS Version: (ros os version)
1.1.1
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
Baremetal

I reported the issue initially for rke:

rancher/rke#178

But it looks like the issue is that RancherOS is not persisting the user volumes, e.g.:

$ sudo ros config export

rancher:
  environment:
    EXTRA_CMDLINE: /init
  services:
    user-volumes:
      volumes:
      - /home:/home
      - /opt:/opt
      - /var/lib/kubelet:/var/lib/kubelet
      - /etc/kubernetes:/etc/kubernetes

$ ls -la /etc/kubernetes/
total 12
drwxr-xr-x 3 root root 4096 Dec 21 23:54 .
drwxr-xr-x 1 root root 4096 Dec 21 23:52 ..
drwxr-xr-x 2 root root 4096 Dec 21 23:54 ssl
[rancher@kub2 ~]$

$ sudo reboot

After the reboot, /etc/kubernetes is empty.

@niusmallnan
Contributor

niusmallnan commented Dec 25, 2017

@zeppelinux Can you give more detailed steps?

I tried to do some tests on AWS.

Boot a VM with v1.1.1, then upgrade to v1.1.1: /etc/kubernetes is not persisted.

ros os upgrade -i rancher/os:v1.1.1 --append "rancher.services.user-volumes.volumes=[/home:/home,/opt:/opt,/var/lib/kubelet:/var/lib/kubelet,/etc/kubernetes:/etc/kubernetes]"

Boot a VM with v1.1.1, then upgrade to v1.1.2-rc2: /etc/kubernetes is persisted.

ros os upgrade -i rancher/os:v1.1.2-rc2 --append "rancher.services.user-volumes.volumes=[/home:/home,/opt:/opt,/var/lib/kubelet:/var/lib/kubelet,/etc/kubernetes:/etc/kubernetes]"
  1. For a persistent directory to take effect, the OS version being upgraded to must differ from the current one. This may be a bug.
  2. We should make it easier for users to define user-volumes.
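
One way the volume list could be made easier to define is sketched below. The function build_user_volumes is a hypothetical helper, not part of ros; it assumes each directory is mapped to the same path, as in the commands above. The script only prints the resulting command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper (not part of ros): build the value for
# rancher.services.user-volumes.volumes from a list of directories,
# mapping each directory to the same path.
build_user_volumes() {
    result=""
    for dir in "$@"; do
        if [ -z "$result" ]; then
            result="$dir:$dir"
        else
            result="$result,$dir:$dir"
        fi
    done
    printf '[%s]\n' "$result"
}

# Print the command instead of running it, so it can be checked first.
volumes=$(build_user_volumes /home /opt /var/lib/kubelet /etc/kubernetes)
echo "ros config set rancher.services.user-volumes.volumes \"$volumes\""
```

The printed line can then be run with sudo on the RancherOS host.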

@niusmallnan niusmallnan self-assigned this Dec 25, 2017
@zeppelinux
Author

zeppelinux commented Dec 26, 2017

@niusmallnan, here are the steps I followed initially:

  1. Install 1.1.1 on bare metal
  2. add the IP to cluster.yml
  3. run ./rke up & observe the host is added to the cluster
  4. reboot host & observe host is not in the cluster anymore (/etc/kubernetes is empty)

I added this between 3 & 4:
ros os upgrade -i rancher/os:v1.1.1 --append "rancher.services.user-volumes.volumes=[/home:/home,/opt:/opt,/var/lib/kubelet:/var/lib/kubelet,/etc/kubernetes:/etc/kubernetes]"

Step 4 produced the same result.

Just tried this:

  1. Install 1.1.1 on bare metal
  2. add the IP to cluster.yml
  3. run ./rke up & observe the host is added to the cluster
  4. ros os upgrade -i rancher/os:v1.1.2-rc2 --append "rancher.services.user-volumes.volumes=[/home:/home,/opt:/opt,/var/lib/kubelet:/var/lib/kubelet,/etc/kubernetes:/etc/kubernetes]"
  5. reboot host & observe host is not in the cluster anymore (/etc/kubernetes is empty)

That is, /etc/kubernetes is still empty. However, /var/lib/kubelet is persisted; in fact, it also survives a reboot on v1.1.1. So it looks like the problem is specific to /etc/kubernetes.
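
To compare which of the configured directories survive a reboot, a small check like the following could be run once before and once after rebooting. This is a minimal sketch; dir_state and report_dirs are names made up for this example, and the directory list is the one from the configuration above.

```shell
#!/bin/sh
# Report whether a directory is missing, empty, or has content.
# Note: a directory the current user cannot read is reported as empty,
# so run this with sudo if in doubt.
dir_state() {
    if [ ! -d "$1" ]; then
        echo "missing"
    elif [ -z "$(ls -A "$1" 2>/dev/null)" ]; then
        echo "empty"
    else
        echo "non-empty"
    fi
}

# Print one "path: state" line per directory.
report_dirs() {
    for dir in "$@"; do
        printf '%s: %s\n' "$dir" "$(dir_state "$dir")"
    done
}

report_dirs /home /opt /var/lib/kubelet /etc/kubernetes
```

A directory that is non-empty before the reboot and empty afterwards was not persisted.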

I tried to run rke against v1.1.2-rc2, but it fails with:
FATA[0001] Failed to set up SSH tunneling for Worker host [192.168.2.220]: Unsupported Docker version found [17.06.2-ce], supported versions are [1.12.6 1.13.1 17.03.2]

Let me know if you need more info.

@niusmallnan
Contributor

@zeppelinux Thanks for your feedback.
BTW, you can use ros console switch <other-console>. All consoles except the default (busybox) console are persistent.

FATA[0001] Failed to set up SSH tunneling for Worker host [192.168.2.220]: Unsupported Docker version found [17.06.2-ce], supported versions are [1.12.6 1.13.1 17.03.2]

Here, you can use ros engine switch <engine> to switch to the appropriate engine.
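
Before switching engines, the running Docker version can be compared against the list in the rke error message. The sketch below assumes prefix matching, so that "17.03.2-ce" counts as "17.03.2"; rke's actual check may differ. The function name is_supported_docker is made up for this example.

```shell
#!/bin/sh
# Supported versions copied from the rke error message above.
SUPPORTED="1.12.6 1.13.1 17.03.2"

# Succeeds if version $1 starts with one of the supported versions.
# Prefix matching is an assumption made for this sketch.
is_supported_docker() {
    for v in $SUPPORTED; do
        case "$1" in
            "$v"*) return 0 ;;
        esac
    done
    return 1
}

# Only query Docker when it is available on this machine.
if command -v docker >/dev/null 2>&1; then
    current=$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)
    if is_supported_docker "$current"; then
        echo "Docker $current is supported"
    else
        echo "Docker ${current:-unknown} is not supported; see 'ros engine list'"
    fi
fi
```

If the version is not supported, ros engine list shows the engines that can be passed to ros engine switch.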

I'll try the steps you mentioned.
I hope that when using the default console, the custom persistent directory also takes effect.

@zeppelinux
Author

@niusmallnan I was not able to use rke with the 'centos' console:
FATA[0001] Failed to set up SSH tunneling for Worker host [192.168.2.220]: Can't retrieve Docker Info: error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: Failed to dial to Docker socket: ssh: rejected: administratively prohibited (open failed)

But it worked with the 'ubuntu' console, and the host stayed in the cluster after reboot. Thanks for the workaround!

@niusmallnan
Contributor

niusmallnan commented Dec 28, 2017

@zeppelinux
Regarding the centos console, I know of some related issues.
You can check these:
rancher/rke#93
rancher/rke#136

@zeppelinux
Author

@niusmallnan thanks for the info.

The bugs you linked look similar to what I'm experiencing with the centos console, but I'm not using the 'root' user. This is RancherOS, so only the 'rancher' user is available, and that is the one I'm using.

I'm having some weird problems with DNS resolution inside containers running on the RancherOS host with the 'ubuntu' console out of the box. Nothing external resolves; there are some very long bug report threads related to this issue (for example: kubernetes/kubernetes#23474).
When I revert the same host to the 'default' console, DNS works as expected. So having a default console that can persist the /etc/kubernetes volume is a much better option than patching the Kubernetes DNS configuration on ubuntu :)

@niusmallnan
Contributor

niusmallnan commented Dec 29, 2017

@zeppelinux
I think I have found the root cause. You can use your custom persistent directory on the default console.

Here are my steps:

ros config set rancher.services.user-volumes.volumes "[/home:/home,/opt:/opt,/var/lib/kubelet:/var/lib/kubelet,/etc/kubernetes:/etc/kubernetes]"

system-docker rm all-volumes

reboot

After that, the /etc/kubernetes volume is persisted. On my side, I need to make sure all-volumes is rebuilt whenever the user changes user-volumes.

I have confirmed this on AWS; you can check it on your bare metal host.
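
The three steps above can be sketched as one script. DRY_RUN, run, and apply_user_volumes are names introduced for this example, and the use of sudo assumes the commands are run as the 'rancher' user. By default the script only prints the commands; nothing is changed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN is a variable introduced for this sketch: it defaults to 1,
# so the commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
VOLUMES="[/home:/home,/opt:/opt,/var/lib/kubelet:/var/lib/kubelet,/etc/kubernetes:/etc/kubernetes]"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The workaround: set user-volumes, remove the all-volumes container so
# that it is recreated with the new volume list, then reboot.
apply_user_volumes() {
    run sudo ros config set rancher.services.user-volumes.volumes "$VOLUMES"
    run sudo system-docker rm all-volumes
    run sudo reboot
}

apply_user_volumes
```

Running it with DRY_RUN=0 executes the commands and reboots the host, so review the printed output first.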

@zeppelinux
Author

@niusmallnan it works! I'm glad you nailed it. Cheers!

@kingsd041
Contributor

Tested with RancherOS v1.2.0-rc2
After setting the persistent directory with ros config set, RancherOS does not keep the directory's data across the first restart. From the second restart on, the data is kept.
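
This behaviour across restarts can be checked with a marker file. The sketch below is only one possible way to test it; MARKER_NAME, write_marker, and has_marker are names made up for this example.

```shell
#!/bin/sh
# MARKER_NAME is a file name made up for this example.
MARKER_NAME=".persist-test"

# Write a marker file containing the current date into directory $1.
write_marker() {
    date > "$1/$MARKER_NAME"
}

# Succeeds if the marker file is still present in directory $1.
has_marker() {
    [ -f "$1/$MARKER_NAME" ]
}
```

Call write_marker /etc/kubernetes as root before a restart and has_marker /etc/kubernetes after it; if the check fails, the directory's data was not kept across that restart.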
