This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

zfs datasets won't get mounted after reboot #2256

Closed
shawly opened this issue Feb 14, 2018 · 5 comments
shawly commented Feb 14, 2018

RancherOS Version (ros os version): 1.2.0

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.): baremetal

cloud-config.yml

rancher:
  docker:
    engine: docker-17.12.0-ce
    graph: /mnt/space/docker
    storage_driver: zfs
  services_include:
    kernel-headers-system-docker: true
    rancher-server-stable: true
    zfs: true

So I've been trying to set my user-docker's root to a ZFS dataset, as described in the RancherOS docs.
The problem is that after a reboot, docker seems to get started before the zfs service, so it creates files under /mnt/space/docker before the ZFS datasets are mounted, which results in a "wrong filesystem" error. And since docker has already created files under /mnt/space/docker, I can't mount the datasets unless I first remove the existing files from the mountpoint.

Does anyone know a fix for this issue? I looked at the service config of the zfs service (https://github.com/rancher/os-services/blob/master/z/zfs.yml) and the label io.rancher.os.before: "docker" is set correctly, so why does the docker container still create files before all ZFS datasets are mounted?

Edit: After several reboots I've found that it actually works, more or less, if I just wait for docker to finish starting. But if I execute docker info immediately after rebooting, it unmounts all ZFS datasets and still tries to start docker under /mnt/space/docker.
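A quick way to tell whether the graph directory is actually backed by ZFS at any given moment is to check the filesystem type of the mountpoint. This is only a hypothetical diagnostic sketch (the helper name is made up, and /mnt/space/docker is just the path from this issue):

```shell
# Print the filesystem type backing a directory: "zfs" when the dataset is
# mounted, something like "ext2/ext3" when docker has written into the bare
# directory underneath instead.
graph_fs_type() {
  stat -f -c %T "$1"
}

# Example usage:
#   graph_fs_type /mnt/space/docker
```

If this prints anything other than zfs right after boot, docker started before the datasets were mounted.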

@niusmallnan (Contributor) commented:
Currently, user-docker does not wait for zfs to finish mounting. I will consider enhancing the logic here: if user-docker uses the zfs storage driver, we can check the ZFS mount status before user-docker starts, make sure the datasets are available, and only then start user-docker.
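The check described above could be sketched as a small shell helper that polls until the graph mountpoint reports as a mounted dataset. This is only an illustration of the idea, not RancherOS code: the function names are invented, and the parsing assumes the tab-separated output of zfs list -H.

```shell
# Hypothetical sketch: succeed once the docker graph directory is backed by a
# mounted ZFS dataset. Reads `zfs list -H -o mountpoint,mounted` style output
# on stdin, so the matching logic can be exercised without a real pool.
zfs_graph_ready() {
  graph="$1"
  while read -r mountpoint mounted; do
    # Require an exact mountpoint match with MOUNTED == yes.
    if [ "$mountpoint" = "$graph" ] && [ "$mounted" = "yes" ]; then
      return 0
    fi
  done
  return 1
}

# Example: block until the dataset backing the graph dir is mounted.
wait_for_graph() {
  until zfs list -H -o mountpoint,mounted | zfs_graph_ready "$1"; do
    sleep 1
  done
}
```

With something like this, user-docker would only be launched after wait_for_graph /mnt/space/docker returns.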

@niusmallnan (Contributor) commented:
@Jason-ZW I didn't find a simple way to reproduce it, but there is a complicated one. You can try ros os upgrade, which can make zfs unavailable. Docker will then automatically create files in the graph directory, but they are not on a ZFS filesystem.

When you manually enable zfs again, you will see the docker logs:

$ tail -f /var/log/docker.log
time="2019-02-15T07:23:51.195350597Z" level=error msg="No zfs dataset found for root" backingFS=extfs root=/mnt/zpool1/docker storage-driver=zfs
Error starting daemon: error initializing graphdriver: prerequisites for driver not satisfied (wrong filesystem?)

This is because the ZFS mounts have been lost:

[root@ip-172-31-3-240 rancher]# zfs list -o name,mountpoint,mounted
NAME           MOUNTPOINT          MOUNTED
zpool1         /mnt/zpool1              no
zpool1/docker  /mnt/zpool1/docker       no

And zfs mount does not work:

[root@ip-172-31-3-240 rancher]# zfs mount -a
cannot mount '/mnt/zpool1': directory is not empty
cannot mount '/mnt/zpool1/docker': directory is not empty

This can be fixed by doing the following:

[root@ip-172-31-3-240 rancher]# rm -rf /mnt/zpool1/  # zfs pool dir
[root@ip-172-31-3-240 rancher]# zfs mount -a

[root@ip-172-31-3-240 rancher]# zfs list -o name,mountpoint,mounted
NAME           MOUNTPOINT          MOUNTED
zpool1         /mnt/zpool1             yes
zpool1/docker  /mnt/zpool1/docker      yes

Then docker can work again.

Jason-ZW commented Feb 28, 2019

I cannot reproduce this issue on either 1.5.1 or 1.2.0. I tried almost 20 times and everything was fine. I think this could be a low-probability event.

cloud-config.yml

[root@rancher rancher]# ros c export
EXTRA_CMDLINE: /init
rancher:
  docker:
    graph: /mnt/zpool1/docker
    storage_driver: zfs
  environment:
    EXTRA_CMDLINE: /init
  password: rancher
  services_include:
    kernel-headers-system-docker: true
    zfs: true
  state:
    dev: LABEL=RANCHER_STATE
    wait: true
ssh_authorized_keys: []

After reboot

[root@rancher rancher]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@rancher rancher]# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.09.1-ce
Storage Driver: zfs
 Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
 Zpool Health: not available
 Parent Dataset: zpool1/docker
 Space Used By Parent: 144896
 Space Available: 8256233472
 Parent Quota: no
 Compression: off
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.78-rancher2
Operating System: RancherOS v1.2.0
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.955GiB
Name: rancher
ID: SRBL:SJLL:NFGY:JXEA:GVVA:LTAF:HKBG:BPJL:DJDS:FA5O:WCSE:AFJD
Docker Root Dir: /mnt/zpool1/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

@nathanweeks commented:
@Jason-ZW : I've seen this issue consistently during Rancher OS upgrades (see #2256 (comment))

rootwuj commented May 27, 2019

Tested with rancher/os:v1.5.2-rc1 from May 27.
Verified fixed.

I reproduced this issue on versions 1.5.1 and 1.2.0.
Steps:

  1. Use the zfs storage driver as described in the docs.
  2. Disable zfs.
  3. Upgrade RancherOS.
  4. Manually enable and start zfs again.

I see the same error message as in the comment above.

Test:
Following the reproduction steps above, tested on 1.5.2-rc1.

Result:
The ZFS filesystem is mounted normally, and there is no error log in /var/log/docker.log.
You can see the following in the docker service logs:

# system-docker logs -f docker
time="2019-05-28T02:09:35Z" level=fatal msg="BackingFS: /mnt/zpool1/docker not match storage-driver: zfs"
