Docker volume /lost+found folder is incompatible with popular images that are potentially hard to reconfigure #5777

Closed
corrieb opened this issue Jul 20, 2017 · 16 comments
Labels
component/portlayer/storage kind/defect Behavior that is inconsistent with what's intended priority/p2

Comments

@corrieb
Contributor

corrieb commented Jul 20, 2017

User Statement:

As a container developer, I expect that popular images such as postgres, mysql, redis and many others will work with VIC volumes without modification.

Details:

We've had a bug open for this issue before. It was #2929. The assertion in the bug was that this is likely only a problem for postgres and that there was a reasonable workaround. As such, I think at the time it seemed expedient to continue the incompatibility.

What's been clear to me though working with scenario-based testing recently is that this problem is far more widespread than just postgres. Take mysql as an example. It took me an hour trawling through documentation to figure out how to configure mysql to write its data to a subdirectory of the mounted volume. And this was just to work around a VIC incompatibility.

Here's an example. If you want a named volume with mysql with docker, you use:

docker run -v myvol:/var/lib/mysql mysql:5.7

This doesn't work with VIC. Instead, you have to specify:

docker run -v myvol:/var/lib/mysql mysql:5.7 --datadir=/var/lib/mysql/data

There is no environment variable to achieve this.

It's one thing to say to a customer, "you just have to specify a subdirectory", but if they then have to spend time figuring out how to reconfigure the container they're running... that's a very bad UX.

To be fair, this will be the case with any ext4 disk-based volume implementation. I'm quite sure it's true of the vSphere Docker Volume Service. I'm going to make the case though that we need to do better.

Acceptance Criteria:

The alternative is that we should take the top N most popular docker images and document how to work around this problem for each of them. I don't want that job :)

I think we can pull off this subdirectory trick ourselves by automatically creating a /data dir in every mounted volume and creating a symbolic link to it. Deleting /lost+found is not the right answer.

@corrieb corrieb added component/portlayer/storage kind/defect Behavior that is inconsistent with what's intended priority/p0 labels Jul 20, 2017
@corrieb
Contributor Author

corrieb commented Jul 20, 2017

It's arguable whether this is a "must have" for 1.2, but it's definitely worth striving for IMHO

@corrieb
Contributor Author

corrieb commented Jul 21, 2017

I configured a mysql container with the volume mounted to /.tether/volumes/<id> with a /data subdirectory. Then created a symbolic link from /var/lib/mysql -> /.tether/volumes/<id>/data. This worked fine. If we were to do this when we mount the volume, we could get the desired transparency.
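
The experiment described above could be sketched roughly as follows. This is only an illustration, not the tether's actual code: the function name `link_volume_data` is hypothetical, and the sketch assumes the path the image expects does not exist yet (if it already exists as a directory, `ln -s` would create the link inside it instead).

```shell
# Hypothetical sketch, not VIC's tether code.
# Expose a clean "data" subdirectory of a mounted volume at the path the
# image expects, so the application never sees lost+found.
#   $1: directory where the volume is actually mounted
#   $2: path the image expects to own (assumed not to exist yet)
link_volume_data() {
    vol_mount=$1
    target=$2

    # Subdirectory that will hold the application's data.
    mkdir -p "$vol_mount/data" || return 1
    # The parent of the expected path must exist before linking.
    mkdir -p "$(dirname "$target")" || return 1
    # Point the expected path at the subdirectory.
    ln -s "$vol_mount/data" "$target"
}
```

For the mysql case above, the call would look like `link_volume_data /.tether/volumes/<id> /var/lib/mysql`.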

@corrieb
Contributor Author

corrieb commented Jul 21, 2017

Spoke to @hickeng about this solution. He was broadly supportive of this as the least bad solution to the problem.

"No real downside for vmdks beyond confusion/inconsistency if using the volume via non-vic means. It's basically enshrining a work-around a common database misconfiguration. I'm not sure if it would cause issues for nfs/smb/others - but definitely not treating them the same way would be confusing."

"We will need to add a version of some kind to the volume metadata so we know if we're mounting a subdir or the raw. We'll also need to support RDM/raw extent style operations where we don't have that control"

@jialin-li jialin-li self-assigned this Aug 2, 2017
@jialin-li jialin-li added this to the Sprint 14 milestone Aug 2, 2017
@sflxn
Contributor

sflxn commented Aug 2, 2017

We've known about this for quite a while. I wrote up a bug on this last year, but Faiyez believed there was nothing we could do about it and closed the issue. I believe this is one of the issues that prevent many containers from running on VIC straight from Docker Hub. Either that, or everyone downloads the same popular containers to do an initial test of VIC and notices they don't run. Faiyez said the OS was adding the lost+found folder.

We found the workaround for mysql and postgres last year. I thought there was an environment variable that would fix it. If we go the route of workarounds, we should address how to get the information to customers. If it's hidden in some docs, in our repo, or in GitHub issues, it won't help customers. For info on the workaround for mysql, mariadb, and postgres, you can search through our closed GitHub issues. I don't know where else this information resides.

@hickeng
Member

hickeng commented Aug 7, 2017

https://github.com/hickeng/vic/tree/5777 prototypes a possible fix for this, but it would need rebasing on top of https://github.com/hickeng/vic/tree/untangle for basic hygiene purposes with shared constants.

@ghost

ghost commented Aug 14, 2017

@hickeng Note: keep in mind that the Tether's overridden archive functions for Online CP do a regular mount if copying from a volume. If we're adding a data directory to the volumes, then Online Copy needs to consider that during its mount operation. We need to either:

  • A) use a common function for Online Copy and volume attaching on startup. Perhaps ops_linux.go:MountLabel(), but that's currently bind mounting.
  • B) Adjust the Online CP mount function MountDiskLabel() to look for a data folder in the volume. If it exists, return that as the root for the copy operation instead of the disk root.
  • C) Avoid doing a separate mount for Online CP and use the path that already exists on the container VM. Not sure how we would do this or we would've done it that way from the beginning.
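
As a rough illustration of option B, the check could look something like the sketch below. It is written in shell only to show the logic; `copy_root` is a hypothetical name and is not the actual MountDiskLabel() code. Note that detecting the layout purely by the presence of a `data` folder would misidentify an older volume in which a user happened to create a directory with that name, which is one reason a version marker in the volume metadata was suggested earlier in this thread.

```shell
# Hypothetical helper, not VIC code.
# Given the directory where a volume disk is mounted, print the directory
# that a copy operation should treat as the root of the volume.
copy_root() {
    mount_point=$1
    if [ -d "$mount_point/data" ]; then
        # Volume uses the data-subdirectory layout.
        printf '%s\n' "$mount_point/data"
    else
        # No data folder: fall back to the disk root.
        printf '%s\n' "$mount_point"
    fi
}
```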

@jialin-li

^ I am working on making the online copy code use ops_linux.go:MountLabel() instead. The bind mount shouldn't be an issue, since the label is mounted before the bind mount happens.

@jialin-li jialin-li reopened this Aug 14, 2017
@jialin-li

A problem with MountLabel() is that once a container starts, the bindTarget is created and mounted, so another call to MountLabel() will cause the mount to fail because bindTarget is already a mount point. The current thought is to try removing the bindTarget first. If the removal succeeds, or fails because the path does not exist, we can assume that it's not a mount point and mount it. Otherwise, we assume bindTarget is already mounted and just use it without remounting.
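
The remove-first check described above could be sketched like this. It is a shell illustration of the idea only, not the MountLabel() implementation, and `needs_mount` is a hypothetical name. It relies on the fact that an active mount point cannot be removed; a directory that is simply non-empty also fails to be removed, so this sketch (like the idea itself) would treat that case as "already mounted".

```shell
# Hypothetical helper, not VIC code.
# Returns 0 if bind_target can be mounted fresh,
# returns 1 if it should be treated as already mounted.
needs_mount() {
    bind_target=$1

    # Path does not exist, so nothing can be mounted there.
    [ -e "$bind_target" ] || return 0

    # rmdir fails on an active mount point (and on any non-empty directory).
    if rmdir "$bind_target" 2>/dev/null; then
        # Removal worked, so it was a plain empty directory.
        # Recreate it so the caller can mount onto it.
        mkdir -p "$bind_target"
        return 0
    fi

    # Removal failed: assume the target is already mounted and reuse it.
    return 1
}
```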

@jialin-li

Updated @hickeng's prototype at https://github.com/jialin-li/vic/tree/5777 to address the problems mentioned above. The latest commit passes the online cp tests, and George's prototype supports running all of the following images:

docker -H $HOST --tls run -d -v v1:/var/lib/mysql --env="MYSQL_ROOT_PASSWORD=mypassword" mysql

docker -H $HOST --tls run -d -e MYSQL_ROOT_PASSWORD=mypass mariadb

docker -H $HOST --tls run -d postgres

docker -H $HOST --tls run -d redis

docker -H $HOST --tls run -d mongo

I haven't verified that operations on those containers work, but all the containers are up and running.

@jialin-li

jialin-li commented Aug 16, 2017

Update: I've tested mysql, mariadb, and postgres and they all work fine, so the containers that create volumes and expect an empty directory seem to function properly.

Things that users should be aware of when exposing the container:
Our current bridge network doesn't support publishing ports, so to connect to the container we need to create it with a container network and publish the port. We also don't support port forwarding.

@mdubya66
Contributor

@corrieb @pdaigle I'm pulling this issue from 1.2. The change required is too large to take at this time. We should release note the workarounds for the various databases. Adding kind/note. @jialin-li will be adding those workarounds.

@mdubya66 mdubya66 added the impact/doc/note Requires creation of or changes to an official release note label Aug 17, 2017
@mdubya66 mdubya66 removed this from the Sprint 15 milestone Aug 17, 2017
@jialin-li

Workarounds

Postgres:
Set environment variable PGDATA to an empty directory inside a volume.
The empty directory doesn't need to exist.
docker run -d -e PGDATA=/var/lib/postgresql/data/data postgres

MySQL:
Set datadir to a directory inside the volume. The data directory doesn't need to exist.
docker run -d -v myvol:/var/lib/mysql mysql --datadir=/var/lib/mysql/data

Note:
MariaDB, redis and mongo work fine with named volumes.

@anchal-agrawal
Contributor

@jialin-li we've got some workarounds in #3857 that are already doc'd.

@corrieb
Contributor Author

corrieb commented Aug 23, 2017

@jialin-li I don't understand your comment about port forwarding above. VIC doesn't support exposing ports from a Dockerfile using -P, but it does support port forwarding on bridge networks using -p. These images should not need a container network. Can you please clarify?

@mdubya66 This seems to have been merged after all. Was there a change of heart?

@jialin-li

@corrieb When I tested the containers running those images, I published certain ports to receive client requests. I thought our current bridge network doesn't support port forwarding or exposing ports (@willsherwood can add to this), so I had to create container networks in order to test.
