[NM-39] Create minimal docker image #526

Open
wants to merge 2 commits into
base: main

@r-ash (Collaborator) commented Sep 16, 2024

We want to cut down the size of our Docker images because, as we move to the cloud, the start-up time of a model fit depends on how long it takes to spin up the container. Spinning up a container takes ~30s to pull the Docker image before it can start and run the model fit. I've played around with a few things here to try to cut this down, with mixed success.

I tried a build with https://github.com/r-hub/r-minimal but I was seeing a significant hit to performance: ~30% slower. I didn't look into this very deeply, but I expect it is the musl allocator being slower than glibc.

So that means we need to go with a Debian-based image. The first simple thing I tried here was a multistage build with rocker/r-ver as the base image, and this is a big improvement.

Current main has sizes like this:

  • Compressed local - 2.3G
  • Compressed remote - 2.31G
  • Uncompressed - 5.9G

After using a multistage build:

  • Compressed local - 619M
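
For reference, a multistage build along these lines might look roughly like the sketch below. This is not the Dockerfile from this PR: the R version tag, the `/src` path and the use of `remotes` are assumptions made for illustration, and any shared libraries needed at run time would still have to be installed in the final stage.

```dockerfile
# Build stage: sources, -dev headers and build caches stay in this stage.
FROM rocker/r-ver:4.4.1 AS builder

# Assumption: the package source is the build context, copied to /src.
COPY . /src

# Install the package and its hard dependencies into the site library.
RUN install2.r --error remotes \
    && Rscript -e 'remotes::install_local("/src")'

# Run-time stage: start from a clean base and copy only the installed R packages.
FROM rocker/r-ver:4.4.1

# rocker/r-ver keeps user-installed packages in this site library.
COPY --from=builder /usr/local/lib/R/site-library /usr/local/lib/R/site-library

# Run-time system libraries (not their -dev packages) would be installed here.

CMD ["R"]
```

The saving comes from everything the builder stage leaves behind: only the contents of the site library are carried into the final image.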

I then played around with using a slimmer base image for the run-time. I gave distroless a go but didn't have much success getting a slimmer image. I'm sure it is possible, but it is quite fiddly and I wasn't seeing immediate gains, so I didn't think it was worth pursuing.

I think if we want to make a smaller image from this, there are two things we could look at:

  1. Use the 2nd part of the multistage build to start from a slimmer image (debian-slim?, distroless) and only copy over the things we need at run time. I think it is quite challenging to work out exactly what we need.
  2. Reduce the number of packages and dependencies we need

I think 2 might be easier, particularly because the image size is only really important for model running. Looking at the size of the image, most of it is R package dependencies, unsurprisingly. The installed packages fall roughly into several large groups:

  1. Boost headers (required by eppasm, first90 and qs)
  2. Markdown stuff (RMarkdown, pandoc, knitr, bslib etc)
  3. duckdb
  4. Rcpp (Rcpp, RcppEigen, RcppParallel)
  5. Shape file stuff (gdal, sp, geojsonio, geojsonsf, spdep)
  6. Other file writing things (openxlsx, data.table)
  7. API stuff (hintr, V8, lgr)

For actual model fitting we don't need all of this. We could split the naomi model into a separate package and build that into its own Docker image. We could definitely remove the markdown stuff, file-writing packages and API packages. Potentially also duckdb (I think we only need this after calibrating). Potentially also the Rcpp things: do we need these once everything is compiled? Perhaps TMB needs them?
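
One way to test the compile-time question is to drop a header-only package in the builder stage and check that the model package still loads. The sketch below assumes BH (the Boost headers package) is only referenced through `LinkingTo`, which R uses at install time rather than at load time. The version tag, the `/src` path and loading via `library(naomi)` are illustrative assumptions, not taken from this PR.

```dockerfile
FROM rocker/r-ver:4.4.1 AS builder

# Assumption: the package source is the build context, copied to /src.
COPY . /src
RUN install2.r --error remotes \
    && Rscript -e 'remotes::install_local("/src")'

# Remove the Boost headers once everything is compiled, then fail the build
# early if the model package no longer loads without them.
RUN Rscript -e 'remove.packages("BH")' \
    && Rscript -e 'library(naomi)'

FROM rocker/r-ver:4.4.1
COPY --from=builder /usr/local/lib/R/site-library /usr/local/lib/R/site-library
CMD ["R"]
```

If the load check passes, the same approach could be tried on other packages that are only needed to compile, one at a time, so a failure points at a single package.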

@r-ash changed the title from "[nm-39] Create minimal docker image" to "[NM-39] Create minimal docker image" on Sep 16, 2024