Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor docker image and CI pipeline #1624

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

keeler
Copy link
Contributor

@keeler keeler commented Jun 1, 2022

Summary

Refactors the docker image for better layer caching and refactors the github action CI pipeline to run tests and linting in a docker container. The CI checks now run in ~2m instead of ~7m, and the size of the docker image is about 70% what it was before (796MB --> 564MB). The steps in the CI pipeline are encapsulated in a Makefile so that it's easier to reproduce those steps locally.

Details

  • Restructures the Dockerfile to install dependencies first, then download card data, then add sources and do a webpack build. Previously, installing dependencies + downloading card data + doing webpack build happened at the very end so any time a source file changed the dependencies would be re-installed and card info would be re-downloaded even if they didn't need to, dramatically slowing down repeated iterations of docker builds.
  • Refactors config/version.js to prefer using a VERSION_INFO env var that gets built into the docker image. This is so the git command and .git folder don't need to be built into the docker image which improves layer caching and saves some space in the image and docker build context.
  • Moves several files from backend/ to backend/core/. These files are required by the scripts in scripts/ and were moved so that backend/core/ could be added to the docker image before the rest of the backend code. This makes it so if the backend code outside of backend/core/ changes, the downloader scripts do not need to be re-run.

Notes for Future Improvements

Stuff that was out of scope for the effort in this PR but would perhaps be worth doing.

  • Use multi-stage docker build. Make a stage for building & running tests and a 'production' stage that copies built artifacts out of the 'builder' stage and only installs prod dependencies and whatever else is the bare minimum to run the app. This would likely further reduce the image size. Maybe not by much, though, as the downloaded card data accounts for ~350MB of the image.

@keeler keeler marked this pull request as ready for review June 1, 2022 05:48
node_modules
frontend/lib
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed frontend/lib and data/scores.json because they don't appear to exist anymore.

Added ignores for .git and .github because they really shouldn't be built into the docker image anyway. In particular, having .git built into the image means that making a commit required rebuilding the image from the COPY . . step that existed previously, even if that commit changed no code (e.g. a README change) which is unnecessary.

@keeler keeler marked this pull request as draft June 2, 2022 05:25
@@ -1,4 +1,11 @@
data/*
!data/scores.json
**/.git
Copy link
Contributor Author

@keeler keeler Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the .git folder was added to the docker image for the sake of building in a version number, namely a commit hash to display in the footer. Now that version number is built in as an environment variable. The other ignores here leave out files that are irrelevant to the docker image build. Any time these files changed required the entire image to be rebuilt.

with:
node-version: ${{matrix.node_version}}
fetch-depth: 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting fetch-depth: 0 makes it so git describe --tags works in the make docker step.


# Install "git"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing git was apparently done only to allow running git describe --tags in config/version.js. This is slow and bloats the size of the image, and is IMHO unnecessary just to get a commit hash in the footer.


# Set working dir as /app
WORKDIR /app

# Add sources to /app
COPY . .
RUN adduser -S dr4ftuser
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node images already include a non-root user called node, so adding another one is redundant. https://github.com/nodejs/docker-node/blob/main/docs/BestPractices.md#non-root-user


# Set working dir as /app
WORKDIR /app

# Add sources to /app
COPY . .
Copy link
Contributor Author

@keeler keeler Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running COPY . . is not a good idea IMHO. When building a docker image instructions like RUN, COPY, etc. will be cached as a 'layer' in the image. When running COPY . . any file changing invalidates the cache for this layer, meaning this and everything below it needs to re-run.

# Add sources to /app
COPY . .
RUN adduser -S dr4ftuser
RUN chown dr4ftuser -R .
Copy link
Contributor Author

@keeler keeler Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running chown in a separate RUN instruction like this has a sneaky behavior where it actually copies files in layers above it, increasing the size of the image. The COPY instruction includes a --chown flag that does not trigger this behavior, hence why it's used all over the place here.

COPY --chown=node scripts/ scripts/
RUN npm run download_allsets \
&& npm run download_booster_rules \
&& chown node -R data/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including this chown in the same RUN instruction does not increase the size of the image like it would if it were in its own separate RUN instruction.


# Install the dependencies
RUN npm ci
Copy link
Contributor Author

@keeler keeler Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running npm ci at the end in combination with the COPY . . instruction above it meant that any time any source file changed, the dependencies would need to get reinstalled and the card info would be re-downloaded again regardless of whether it was necessary. This happens because a change in any source file (or really any file) invalidates the layer cache of the COPY . . layer. This slowed down repeated iterations of docker builds significantly.

Now, the dependencies and card info are added to the image before most of this code. You can test the difference this makes by running make docker to build the image, changing a file in frontend/ in some innocuous way (like adding a comment), then running make docker again. You'll see that the build picks up from the COPY frontend/ layer, skipping all the layers above it that don't need to change, including the dependency installation and card info download.

&& npm run download_booster_rules \
&& chown node -R data/

# Add sources to /app
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to order these in such a way that more frequently-changing files would be copied in later layers so the layer cache invalidation would only apply to lower layers. Open to suggestions.

Also, I think the app.json file could be excluded here.

@keeler keeler marked this pull request as ready for review June 19, 2022 04:54
@tooomm
Copy link
Contributor

tooomm commented Jun 20, 2022

That's a bunch of changes again, nice!

Looks like you combined everything in one docker run.
That also means we no longer test against various Node versions to ensure compatibility and spot issues.

The CI checks now run in ~2m instead of ~7m

I'm looking at improving the CI runs since a while in #1506 and they finish in 1min: https://github.com/dr4fters/dr4ft/actions/runs/2530256044
Total time is ~2min as they run in parallel.
But they do test against all actively maintained and available node versions, including the given one in the .nvmrc file.

What do you think about the node test matrix?
Nothing set in stone here. But maybe its worth combining those?

Refactors the docker image for better layer caching and (...) the size of the docker image is about 70% what it was before (796MB --> 564MB).

The docker improvements and the reworked layering and caching sounds useful when running the app in a container.
But I can't judge too much about that.

@keeler
Copy link
Contributor Author

keeler commented Jun 20, 2022

I'm looking at improving the CI runs since a while in #1506 and they finish in 1min: https://github.com/dr4fters/dr4ft/actions/runs/2530256044 Total time is ~2min as they run in parallel. But they do test against all actively maintained and available node versions, including the given one in the .nvmrc file.

What do you think about the node test matrix? Nothing set in stone here. But maybe its worth combining those?

Nice, I wasn't aware of that PR, thank you for sharing.

I don't understand the need to test against different node versions. Is it because developers might be using an older version of node? Wouldn't establishing and clearly documenting the specific node version to use be much easier than testing and supporting a bunch of different node versions? Ultimately, the app will run using a single node version (current LTS) and that's the one worth testing, and IMHO developers should have no expectation that a different major version of node will necessarily work. Am I being too uncompromising?

Mainly, I replaced the CI steps with docker because it's much easier to reproduce the entire CI pipeline locally when/if there are issues. Also, I assume https://dr4ft.info runs from a docker container? If so, to me that is more reason to build & run everything in the CI pipeline from docker.

The docker improvements and the reworked layering and caching sounds useful when running the app in a container.

The main benefits of the changes in this PR are:

  1. Can repro CI pipeline locally.
  2. Faster to iteratively re-build the docker image while developing or addressing CI issues.
  3. Smaller image, so faster to push/download and lighter on disk space.

If you feel strongly about testing different node versions, it would be possible to combine our ideas (test multiple node versions + run everything from docker) by using different base images in the docker build, e.g. docker build --build-arg BASE_IMAGE=${{base-image-from-matrix}} with a Dockerfile like this:

ARG BASE_IMAGE=node:lts-alpine

FROM $BASE_IMAGE
...

@tooomm
Copy link
Contributor

tooomm commented Jun 20, 2022

I don't think Zach runs dr4ft.info from a docker container. Generally everything here was not centred around docker due to lack of knowledge. The Dockerfile is only added to the CI to have at least some basic commands run and see if it works at all (and keeps working).

Isn't it common practice to test against various Node versions in CI?
It's not only about devs, but also people running the app. They might have a different version or other services with different requirements on the same machine. Maybe I'm wrong?
Docker should not be mandatory for sure, but an option.

But that are things devs working with the code and actual web and docker ninjas need to answer.

@ZeldaZach
Copy link
Member

With a docker container, we only need to theoretically test against the version the Docker container supports. I'd consider switching over to a container system for the main deployment once it's stable. Thanks for the CR, if it's ready for review please let me know!

@keeler
Copy link
Contributor Author

keeler commented Jun 21, 2022

Isn't it common practice to test against various Node versions in CI?

For libraries/packages intended for use by other projects, definitely yes. For web apps like this in my experience you pick a node version and test only that version.

It's not only about devs, but also people running the app. They might have a different version or other services with different requirements on the same machine.

Doesn't nvm solve that problem already? Wouldn't publishing an 'official' container image solve that problem as well? Have users asked for support across different node versions? Or is this an assumption?

Docker should not be mandatory for sure, but an option.

I think we can agree on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants