feat: get host uid:gid and use in docker #576

focusaurus · 2019-09-06T19:56:00Z

Impact: major
Type: feature|bugfix

Issue

On linux, the docker container runs as user node (uid 1000), which in many cases is not the same as the developer's host OS uid. This can cause mismatches in ownership of files that are shared between the host and the container via docker volume mounts. Ultimately this surfaces as filesystem errors (EACCES, permission denied, etc) at various times including when trying to run yarn, do builds, etc.

Why doesn't this bug affect all linux users?

On many linux distributions, the first non-system user is assigned uid 1000. That's why the node user in our docker container has uid 1000; it's just a default. That's also why many developers have uid 1000 as well: it's the default for the first non-system user on their distribution, and their user is the first one they add when doing their OS install. Because of that default and coincidence, many users avoid this issue: their uid just happens to match the one we use. (This was the case for me personally, which is why it is harder for me to encounter/reproduce these issues.)
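A quick way to tell which camp a given host user falls into is to compare their uid with the container's. This is only an illustrative sketch: the variable names are made up here, and the value 1000 is taken from the description above rather than read from the image.

```shell
#!/bin/sh
# uid of the node user inside the container, per the description above
# (assumption: not queried from the image, just hard-coded for illustration).
CONTAINER_UID=1000

# uid of the developer's account on the host.
host_uid="$(id -u)"

if [ "$host_uid" = "$CONTAINER_UID" ]; then
  status="match"      # volume-mounted files will have the expected owner
else
  status="mismatch"   # this user is likely to hit the permission errors
fi

echo "host uid $host_uid vs container uid $CONTAINER_UID: $status"
```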

Why doesn't this affect macOS users?

Docker for Mac has automatic ownership mapping that largely avoids this issue.

How does this apply to reaction core? Other repos?

If testing proves successful in storefront, I believe this pattern would apply to reaction core and essentially every docker-based service we have that makes use of volume mounts, which is probably most but not all of them. I started with core initially as a fix there would have the widest impact, but the docker setup there is significantly more complex and with the slow startup time I switched to storefront as an easier first project to tackle.

Related issues

Storefront PR attempt from May 2019 (closed unmerged when problem was not solvable without a root entrypoint script)
- WIP: Run docker container using particular uid:gid #533 (review)
Many users in the community hitting a version of this issue starting in Oct 2018
- Specifically: Error creating Reaction config file: /opt/reaction/src/.reaction/config.json
- reaction_reaction_1 container exits after a few seconds reaction-development-platform#14
Me hitting a loosely-related issue in storefront
- make destroy cannot delete files owned by root reaction-development-platform#7
Community user issue
- Platform Installer Fails in reaction Container reaction#5510
For reference a blog post with basically our technique of an entrypoint script (but we change the node user's uid instead of creating a new user).

Solution

The key elements of the solution are:

Start the docker process as root (so if necessary we can chown volumes to fix them)
Determine the correct uid:gid with stat on the repo root
- I think this will be pretty reliable and less hassle than REACTION_USER
During container startup, modify the node user account in the container to have the matching uid
Do a quick non-recursive stat on each volume mount point and check if ownership is correct.
- If so, proceed with startup
- If not, fix ownership
- This is a performance trade-off that I think is the best we can do here balancing "it always just works" and keeping startup fast
su-exec to node in the container then proceed to launch the application
Provide a ./bin/fix-volumes script that can be run at any time to chown/chmod all volume mount directories properly; it should be a one-stop fix for this entire category of errors
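The check-then-fix part of the steps above could look roughly like the following. This is a minimal sketch, not the script in this PR: the function names are hypothetical, and changing the node user's uid and the final su-exec handoff are left out because they depend on the base image.

```shell
#!/bin/sh
# Hypothetical sketch of the "check owner, fix only if needed" step.

# Print "uid:gid" for a path (GNU coreutils and BusyBox stat both accept -c).
owner_of() {
  stat -c '%u:%g' "$1"
}

# Succeed when the mount point itself already has the wanted owner.
# This is the cheap, non-recursive check.
volume_ok() {
  [ "$(owner_of "$1")" = "$2" ]
}

# Only pay for a recursive chown when the quick check fails.
fix_volume() {
  dir="$1"
  want="$2"
  if volume_ok "$dir" "$want"; then
    return 0
  fi
  before="$(owner_of "$dir")"
  chown -R "$want" "$dir" || return 1
  echo "Fixing volume $dir (before=$before after=$want)"
}
```

In an entrypoint running as root, the wanted owner would come from something like `owner_of .` at the repo root, followed by one `fix_volume` call per mount point, and then su-exec to node to launch the application.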

There are a ton of unix heavy details in here we should scrutinize during code review.

Some things to note about the solution (pending QA testing)

The goal here is to have things "just work" in all cases
File owners and permissions on the host filesystem will be changed when running fix-volumes
- There's potential here for surprise or confusion in the user base, and perhaps "hey don't do that!"
I believe the solution will handle pre-existing files in the volume mount directories with assorted owner/permission combinations and it should force them all to be correct, but I think there's a lot of testing surface here

Breaking changes

I don't think anything here would count as "breaking" but the changed ownership/permission of host files could be surprising/unexpected as noted above.

Testing

Try a fresh git clone of this example-storefront branch, ./bin/setup, and docker-compose up
- Try on mac, linux with userid !=1000, linux with userid=1000
Create some permutations of ownership mismatches and test the fix scripts.
- Directories of interest include
  - $HOME/.cache/yarn-offline-mirror
  - $HOME/.cache/yarn
  - example-storefront/node_modules
  - example-storefront/build
Note you may want to create a new user on your linux host to force a non-1000 uid. On linux mint I was able to do this via the users & groups GUI (or you can use the adduser CLI). Once I ran sudo adduser plyons2 docker, the new user could use docker. Then I ran sudo su - plyons2 to get a shell as that user for testing.
Verify when the application finally loads that it is not running as root
- docker-compose run web ps -ef
- You will see some early root processes then a switch to node for the bin/start

Example good output

Your container id may vary
If everything is running as root, that's a bug

docker exec --interactive --tty rc-storefront_web_1 ps -ef
PID   USER     TIME  COMMAND
    1 node      0:00 sh ./bin/start
   81 node      0:00 node /opt/yarn-v1.13.0/bin/yarn.js dev
  102 node      0:46 /usr/local/bin/node ./src/server.js
  113 root      0:00 ps -ef
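To make this check less manual, the ps output can be filtered for root-owned processes. The helper below is a hypothetical sketch that assumes the BusyBox column layout shown above (PID, USER, TIME, COMMAND); it ignores the ps call itself, which is expected to run as root.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output in the layout shown above on
# stdin and prints every root-owned process except the ps command itself.
# Empty output means nothing unexpected is running as root.
root_processes() {
  awk 'NR > 1 && $2 == "root" && $4 != "ps" { print }'
}
```

Usage would be along the lines of `docker exec rc-storefront_web_1 ps -ef | root_processes`, with the container name adjusted to match yours.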

focusaurus · 2019-09-06T20:14:45Z

I tested on mac and linux and it worked properly in a few basic cases.

rosshadden

How do you feel about using this as a case study for a bit until we're satisfied it works for everyone before we roll out to other projects?

focusaurus · 2019-09-07T00:02:37Z

Sounds pragmatic to me. I'd also like a bit more testing before merge. I'll put a call out on slack. @manueldelreal might you be able to test this branch?

Dockerfile

focusaurus · 2019-09-07T17:16:41Z

That's a good idea. I found myself confused by the mix of seemingly prod-only stuff in there. I'll split them out. For the dev container, I'm leaning toward keeping the container just for the execution environment, but getting the repo root through the volume mount exclusively (not copying the source into the image).

…

On Fri, Sep 6, 2019 at 9:33 PM Eric Dobbertin wrote:

In Dockerfile:

@@ -113,4 +119,6 @@ RUN if [ "$BUILD_ENV" = "production" ]; then \
 yarn build; \
 fi;
-CMD ["yarn", "start"]
+# hadolint ignore=DL3002
+USER root

This file is serving as dockerfile for both production and dev builds. The things you're adding shouldn't be necessary for prod builds. So maybe we should just split into two separate dockerfiles and get rid of the BUILD_ENV stuff?

focusaurus · 2019-09-09T17:33:06Z

OK I started implementing separate Dockerfiles but that got me thinking. I think duplicating the dockerfile will make things worse and the inevitable drift between them is something I want to avoid. I've made Dockerfile changes that ensure fix-volumes.sh is cheap for a production start and I think that's better than having 2 complex Dockerfiles to maintain in almost-sync.

rosshadden · 2019-09-09T19:55:37Z

You can run commands based on environment variables. For example (ripped from some medium post so I wouldn't have to contrive one):

RUN if [ "$NODE_ENV" = "development" ]; \
	then npm install;  \
	else npm install --only=production; \
	fi

aldeed · 2019-09-09T20:34:43Z

@focusaurus Yes there will always be a point at which maintaining the conditional logic correctly is more confusing than splitting to two files. We may not be at that point, but I just thought I'd float the idea.

If you're sticking with one, then you'll need to keep CMD ["yarn", "start"] instead of the entrypoint when it's a prod build. The command is overridden in docker-compose.yml anyway, with command: "/usr/local/src/reaction-app/bin/start". Couldn't you keep the same override paradigm and add your script to bin/start?

I'm not particular about the exact mechanics of how we do it, but I do think the Dockerfile should be as minimal, dependency-free, and production-targeted as possible, since docker-compose.yml can override most things when it's being used for development. Even the existing CMD ["yarn", "start"] could be changed to directly do node ./src/server.js command to eliminate that yarn dep.

One other thing that I've wanted to try to do is to create a single published "development environment" image that most of our docker-compose files can use instead of using the local Dockerfile. That would help people get going faster by eliminating the initial image build time, and would take up less hard drive space. Since all we really need for development is a Node image with some pre-chown'd folders into which we can link host files, why not use the same image everywhere? But that would mean a separate Dockerfile, which I guess is what was in the back of my mind when suggesting that.

If there is a way to solve the USER/chown stuff without image changes, then we actually could use one of the official Node images directly for development. That would be my true goal.

To be clear, what I mean is that in docker-compose.yml, ideally we would change this:

build:
  context: .

to this:

image: node:10-alpine

or at least to this:

image: reactioncommerce/dev:node10v1

At which point Dockerfile needs no conditional logic because it's only for production (maybe also for CI tests).

focusaurus · 2019-09-09T21:11:51Z

If you're sticking with one, then you'll need to keep CMD ["yarn", "start"] instead of the entrypoint when it's a prod build.

entrypoint.sh runs ./bin/start as the default command.

The command is overridden in docker-compose.yml anyway, with command: "/usr/local/src/reaction-app/bin/start". Couldn't you keep the same override paradigm and add your script to bin/start?

I removed the docker-compose override. There are lots of ways this could work. My thought was "create a generic mechanism to solve the docker uid issue that we can eventually templatize to all projects that need volume mounts" so I was trying to keep it logically separated into a "prepare the mounts" phase prior to a "start the app" phase.

I'm not particular about the exact mechanics of how we do it, but I do think the Dockerfile should be as minimal, dependency-free, and production-targeted as possible, since docker-compose.yml can override most things when it's being used for development. Even the existing CMD ["yarn", "start"] could be changed to directly do node ./src/server.js command to eliminate that yarn dep.

I was initially trying to limit the scope of changes in this PR in hopes of getting it to land without spending a full cycle on it. There's lots we can change about how we use docker. All I'm focused on in this PR is getting the volume mount permissions correct.

Your other comment about not needing a Dockerfile for local dev might be nice, I guess, but it's beyond the scope I want to tackle here.

manueldelreal · 2019-09-09T21:49:44Z

@focusaurus tested with fresh installs for users with UID 1001 (my default one), 1000 and an entirely new user, all three scenarios yielded a running container with no permissions issues.

I had a slight hiccup on the first run but it was unrelated to this branch's code changes. I'd say that this looks good for the scope that you are trying to tackle here.

aldeed · 2019-09-10T14:57:46Z

Regarding "beyond the scope", 👍 . We so rarely touch the docker setup that it's tempting to do all of the things that have been building up whenever we do.

focusaurus · 2019-09-10T19:01:35Z

OK bin/start has some dev-only stuff in there so I need to change the default prod command.

focusaurus · 2019-09-10T19:10:25Z

Marking this WIP. We have a lot of stuff that's good and ready to go but we're considering more substantial changes to the prod Dockerfile.

focusaurus · 2019-09-11T22:39:44Z

@manueldelreal Can you pull down the latest changes and run a few more quick tests please?

focusaurus · 2019-09-11T23:09:23Z

Tested latest commits on my mac, all good:

Successfully built a33ce44c4c93
Successfully tagged example-storefront_web:latest
Recreating example-storefront_web_1 ... done
Attaching to example-storefront_web_1
web_1  | Fixing volume ./node_modules (before=0:0 after=501:0)…✓
web_1  | [11:05:44 PM] Compiling server
web_1  | [11:05:50 PM] Compiling client
web_1  | > Using external babel configuration
web_1  | > Location: "/usr/local/src/reaction-app/.babelrc"
web_1  | [11:05:59 PM] Compiled server in 15s
web_1  | [11:06:03 PM] Compiled client in 14s
web_1  |  DONE  Compiled successfully in 19488ms11:06:03 PM
web_1  | 
web_1  | Server started ! ✓
web_1  | 
web_1  |       http://localhost:4000
web_1  |       Press CTRL-C to stop
web_1  |     
web_1  |  WAIT  Compiling...11:06:04 PM
web_1  | 
web_1  | [11:06:05 PM] Compiling client
web_1  | [11:06:05 PM] Compiled client in 542ms
web_1  |  DONE  Compiled successfully in 580ms11:06:05 PM
web_1  |

focusaurus · 2019-09-11T23:37:07Z

Also tested on linux with uid 1001. Looks good. FYI this line of output is the new code:

web_1  | Fixing volume ./node_modules (before=0:0 after=1001:1001)…✓

- Grab the uid:gid of the repo root in docker
- Use that for the node user we run as in the container
- Check owner of all volume mounts, if not OK, fix them
- this should avoid permission errors on linux
- provide bin/fix-volumes to fix owner issues ad hoc
- There is no more ../node_modules
- 1 and only 1 place where modules go for local dev (and prod)
- local dev it's a volume mount, prod it's baked in
- We have no native add-ons at the moment, so it should be OK, but `npm rebuild` as needed
- Also many docker and CI refactorings
- split prod and dev dockerfiles
- They are both small now
- Use ci-scripts/docker-labels to reduce LABEL boilerplate
- change lint CI task to run outside a docker container

Signed-off-by: Peter Lyons <pete@reactioncommerce.com>

focusaurus · 2019-09-12T13:25:50Z

OK I think I'm good for someone to merge this after the next round of code re-review and testing.

focusaurus · 2019-09-16T15:20:35Z

OK I'm going to merge this. The interesting change is a single commit we can revert later if any serious issues surface.

janus-reith · 2019-11-15T13:50:24Z

Just noticed that this might not work in some cases.
On AWS Linux my default USER:GROUP is 1000:1000 as expected.
When fix-volumes.sh runs, it states:
Fixing volume /home/node/.cache/yarn (before=0:0 after=1000:1000)…✓

After that, everything inside the reaction-next-starterkit belongs to root and can't be edited by my normal user anymore. I wonder what is wrong here.

After removing the existing volumes and chowning my folder back, I can't reproduce it anymore.
It still states that it changes ownership from 0:0 to 1000:1000, but without changing everything to root.

focusaurus · 2019-11-15T22:05:47Z

Hmm. We saw a little weirdness today too with some older versions of the docker-base dev images. Could you docker pull reactioncommerce/node-dev:10.16.3-v2 to make sure you have the most recent build of that and then see if you can reproduce your issue? I should be able to track it down if it's doing something weird like that consistently.

focusaurus mentioned this pull request Sep 6, 2019

feat: match userid between host and container #573

Closed

focusaurus force-pushed the feat-docker-uid-match-3 branch 2 times, most recently from e88e579 to 4b8b9fc Compare September 6, 2019 20:04

focusaurus requested a review from rosshadden September 6, 2019 20:15

rosshadden approved these changes Sep 6, 2019

aldeed reviewed Sep 7, 2019

focusaurus changed the title ~~feat: get host uid:gid and use in docker~~ WIP: feat: get host uid:gid and use in docker Sep 10, 2019

focusaurus requested review from aldeed and rosshadden September 11, 2019 17:06

focusaurus added 2 commits September 12, 2019 07:16

chore: build fresh yarn.lock

124aeaa

Signed-off-by: Peter Lyons <pete@reactioncommerce.com>

focusaurus force-pushed the feat-docker-uid-match-3 branch from 372d586 to 124aeaa Compare September 12, 2019 13:18

focusaurus changed the title ~~WIP: feat: get host uid:gid and use in docker~~ feat: get host uid:gid and use in docker Sep 12, 2019

rosshadden approved these changes Sep 12, 2019

focusaurus merged commit f159587 into develop Sep 16, 2019

focusaurus deleted the feat-docker-uid-match-3 branch September 16, 2019 15:20

This was referenced Sep 18, 2019

Release v2.4.0 #582

Merged

chore: bump version to 2.4.0 in package.json #583

Merged
