This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

User-docker sucks with docker-17.12.1+ #2300

Closed
kingsd041 opened this issue Mar 21, 2018 · 18 comments

Comments

@kingsd041
Contributor

RancherOS Version: (ros os version)
v.1.3.0-rc1
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
AWS

After switching from docker-17.12.1-ce to docker-17.09.1-ce, Docker falls back to the vfs storage driver.

root@ip-172-31-35-98:/var/log# docker info | grep Storage
Storage Driver: vfs
root@ip-172-31-35-98:/var/log# docker -v
Docker version 17.09.1-ce, build 19e2cf6

I found the following error log in docker.log

time="2018-03-21T03:35:05.234106479Z" level=error msg="'overlay2' is not supported over overlayfs"
time="2018-03-21T03:35:05.236377569Z" level=error msg="'overlay' is not supported over overlayfs"
time="2018-03-21T03:35:05.237043516Z" level=error msg="devmapper: Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a dynamic binary to use devicemapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options"
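
To check for this fallback on a host, something like the following sketch could be used. It only parses the text that `docker info` prints; the function names `storage_driver` and `check_storage_driver` are made up for this example and are not part of Docker or RancherOS.

```shell
# Print the storage driver named in `docker info` text read from stdin.
# (Hypothetical helper, not a Docker or RancherOS command.)
storage_driver() {
    awk -F': ' '/^[ \t]*Storage Driver:/ { print $2; exit }'
}

# Warn and return non-zero when the daemon has fallen back to vfs.
check_storage_driver() {
    driver=$(storage_driver)
    if [ "$driver" = "vfs" ]; then
        echo "WARNING: storage driver is vfs"
        return 1
    fi
    echo "storage driver: $driver"
}
```

Usage would be `docker info | check_storage_driver`, run in the console where user-docker is available.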
@niusmallnan
Contributor

niusmallnan commented Mar 23, 2018

In my testing, the problem seems to be specific to docker-17.12.1-ce. Switching between other Docker versions works without problems, for example:
docker-17.09.1-ce ---> docker-xxx-ce (not 17.12.1) ---> docker-17.09.1-ce

Once the problem has occurred, user-docker no longer works. The workaround is to rebuild the console, which you can do with ros console switch xxx.

@niusmallnan niusmallnan added this to the v1.3.0 milestone Mar 28, 2018
@niusmallnan niusmallnan modified the milestones: v1.3.0, v1.4.0 Apr 9, 2018
@Jason-ZW

Jason-ZW commented Apr 12, 2018

Another sequence can cause the same problem:
docker-17.09.1-ce ---> docker-17.12.1-ce ---> system-docker restart docker

docker-17.12.1 behaves differently from docker-17.09:

  • when the Docker daemon stops, it unmounts the directory /var/lib/docker from the mount table

This unmount logic causes the overlay problem in ROS: after the unmount, the filesystem type of /var/lib/docker changes from ext4 to overlay, because the console container's default filesystem type is overlay.

The following backing filesystems are supported by Overlay2 & Overlay driver:

  • ext4
  • xfs

So the errors occur as below:

time="2018-03-21T03:35:05.234106479Z" level=error msg="'overlay2' is not supported over overlayfs"
time="2018-03-21T03:35:05.236377569Z" level=error msg="'overlay' is not supported over overlayfs"
time="2018-03-21T03:35:05.237043516Z" level=error msg="devmapper: Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a dynamic binary to use devicemapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options"

Maybe my understanding is wrong; please let me know if anyone has an idea or a solution.
A related comment: moby/moby#36833 (comment)
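
The filesystem-type change described above can be observed from mountinfo. The sketch below is a made-up helper (`backing_fstype` is not an existing tool): it reads mountinfo text on stdin, picks the longest mount point that covers the given path, and prints that mount's filesystem type. It assumes mount points contain no escaped characters.

```shell
# Print the filesystem type backing path $1, based on mountinfo text on stdin.
# The longest mount point covering the path wins; on ties, the later line wins.
backing_fstype() {
    awk -v path="$1" '
    {
        mp = $5
        covers = (mp == "/") || (path == mp) || (index(path, mp "/") == 1)
        if (covers && length(mp) >= best_len) {
            # The filesystem type is the field right after the "-" separator.
            for (i = 7; i <= NF; i++) {
                if ($i == "-") { best = $(i + 1); best_len = length(mp); break }
            }
        }
    }
    END { if (best != "") print best }'
}
```

Usage would be `backing_fstype /var/lib/docker < /proc/self/mountinfo`. While the ext4 mount is in place this should print ext4; once it has been unmounted, only the console's root mount covers the path, so the result would be overlay.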

@niusmallnan niusmallnan changed the title Switch docker-17.12.1-ce to docker-17.09.1-ce, the docker driver uses vfs by default User-docker sucks with docker-17.12.1+ Apr 16, 2018
@niusmallnan
Contributor

niusmallnan commented Apr 17, 2018

Docker unmounts the data root directory because of moby/moby#36107.
That PR was merged into 17.12.1-ce.
The daemon logs show this when the daemon is stopping:

.... mountpoint=/var/lib/docker, unmounting daemon root

According to the following code, Docker unmounts the data root in these scenarios:
https://github.com/docker/docker-ce/blob/17.12/components/engine/daemon/daemon_linux.go#L116-L127

Check the mount info in RancherOS. For reference, the fields of /proc/self/mountinfo are:

36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)

(1) mount ID:  unique identifier of the mount (may be reused after umount)
(2) parent ID:  ID of parent (or of self for the top of the mount tree)
(3) major:minor:  value of st_dev for files on filesystem
(4) root:  root of the mount within the filesystem
(5) mount point:  mount point relative to the process's root
(6) mount options:  per mount options
(7) optional fields:  zero or more fields of the form "tag[:value]"
(8) separator:  marks the end of the optional fields
(9) filesystem type:  name of filesystem of the form "type[.subtype]"
(10) mount source:  filesystem specific information or "none"
(11) super options:  per super block options

$ cat /proc/self/mountinfo | grep /var/lib/docker
516 467 202:1 /var/lib/docker /var/lib/docker rw,relatime shared:57 - ext4 /dev/xvda1 rw,data=ordered
....

The root and mount point are the same, so Docker can umount /var/lib/docker.
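
The comparison of field (4) and field (5) can be scripted. This is a minimal sketch with a made-up function name (`root_matches_mountpoint`); it only inspects mountinfo text and does not reproduce Docker's own check.

```shell
# Read mountinfo text on stdin and report whether, for mount point $1,
# field 4 (root) equals field 5 (mount point).
# Prints "same" or "different"; prints nothing if $1 is not a mount point.
root_matches_mountpoint() {
    awk -v mp="$1" '
        $5 == mp { result = ($4 == $5) ? "same" : "different" }
        END { if (result != "") print result }'
}
```

Usage would be `root_matches_mountpoint /var/lib/docker < /proc/self/mountinfo`. For the entry shown above, the result is "same".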

In general, we have three ways to solve this problem:

  1. Reboot or rebuild the console so that /var/lib/docker is mounted again. Users can do this by rebooting the host or switching the console.

  2. Make the mount root and the mount point different. We can update the volumes of container-data-volumes in os-config.yml, perhaps to /var/lib/user-docker:/var/lib/docker, which gives a different mountinfo entry:

516 467 202:1 /var/lib/user-docker /var/lib/docker rw,relatime shared:57 - ext4 /dev/xvda1 rw,data=ordered
# With this entry, Docker should not unmount the data root (`/var/lib/docker`).

  3. Do something before the Docker daemon starts, such as re-mounting /var/lib/docker:

mkdir /tmp/tmpmount
mount /dev/xvda1 /tmp/tmpmount
mount -o bind /tmp/tmpmount/var/lib/docker /var/lib/docker
umount /tmp/tmpmount

# Then you can check mount info
$ cat /proc/self/mountinfo | grep /var/lib/docker
... /var/lib/docker /var/lib/docker rw,relatime ...

@niusmallnan
Contributor

niusmallnan commented May 20, 2018

To fix this issue, I decided to change container-data-volumes in os-config.yml so that the user-docker data is saved to /var/lib/user-docker.
However, this makes existing user-docker data unavailable to user-docker, especially when users upgrade to v1.4.0 using ros os upgrade.
To restore the user-docker data, you can use the following steps:

$ system-docker stop docker

$ system-docker run --rm -it -v /:/host alpine
/ # rm -rf /host/var/lib/user-docker/*
/ # cp -a /host/var/lib/docker/* /host/var/lib/user-docker/

$ system-docker start docker

@thaJeztah

This situation should also be addressed by moby/moby#36879

@niusmallnan
Contributor

niusmallnan commented May 21, 2018

@thaJeztah Cool, we will test it in the next docker-ce stable release.
Expect this PR to be merged. docker-archive/docker-ce#522

@kingsd041
Contributor Author

Fixed in rancheros v1.4.0-rc2

@niranjan94

@niusmallnan I was able to restore my user-docker data by following the steps mentioned by you. Thank you for that. :)

But now I realise that a lot of space is being used, since the same data is present in both locations, /host/var/lib/docker/ and /host/var/lib/user-docker/. Is there a straightforward way to prune or remove the user-docker data from /host/var/lib/docker/?

@jlelse

jlelse commented May 31, 2018

Is it safe to delete the old folder after copying it to user-docker?

@niusmallnan
Contributor

Yes, you can delete the old folder.

@pioto

pioto commented Jun 1, 2018

@niusmallnan, I think you should also use cp -a instead of cp -rf in the above comment ( #2300 (comment) ), so that permissions are preserved properly.

For example, without this, I found that I could not start up rancher/server again.
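
The difference between the two copy modes can be seen in a throwaway directory. This is a small sketch, assuming a Linux userland where `stat -c` and `mktemp -d` are available; the function name `compare_cp_modes` is made up for the example. It copies a file with mode 666 under a umask of 022 and prints the mode of each copy.

```shell
# Copy one file with `cp -rf` and with `cp -a`, then print the resulting modes.
compare_cp_modes() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src"
    echo data > "$work/src/file"
    chmod 666 "$work/src/file"
    (
        # Restrictive umask, applied only inside this subshell.
        umask 022
        cp -rf "$work/src" "$work/plain"    # new files get source mode minus umask
        cp -a "$work/src" "$work/archive"   # mode, ownership and timestamps kept
    )
    echo "cp -rf: $(stat -c %a "$work/plain/file")"
    echo "cp -a: $(stat -c %a "$work/archive/file")"
    rm -rf "$work"
}
```

With GNU coreutils, the `cp -rf` copy should come out as 644 and the `cp -a` copy as 666. Ownership is preserved in the same way when `cp -a` runs as root, which is what matters for container data.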

@prologic

Is this still the case?

@stuckj

stuckj commented Aug 8, 2018

@prologic, yes, this just happened to me upgrading from 1.0.4 to 1.4.0. @niusmallnan's steps to restore containers / volumes worked for me as well.

@efrecon

efrecon commented Aug 15, 2018

I had to restart the machine with the following command to ensure (user-)docker actually finds the old containers, volumes and networks again. I did this instead of the last $ system-docker start docker in the instructions above.

system-docker shutdown -r now

@djmaze

djmaze commented Aug 15, 2018 via email

@spikespaz

Where is os-config.yml so I can change container-data-volumes as suggested in the fix above?

Also, I updated before reading this and I am applying the fix after the fact. Am I screwed, or can I recover the data this way?

@wywywywy

Would a symlink work as well? Just in case I need to roll back.

I'm on 1.3.0 wanting to upgrade to the latest.

@niusmallnan
Contributor

@wywywywy Please try this comment #2300 (comment)
