[NM-39] Create minimal docker image #526
Open
We want to cut down the size of our Docker images because, as we move to the cloud, the start-up time of a model fit depends on how long it takes to spin up the container. Currently, spinning up a container takes ~30s just to pull the Docker image, and only then does it start and run the model fit. I've played around with a few things here to try to cut this down, with mixed success.
I tried a build with https://github.com/r-hub/r-minimal but saw a significant performance hit (~30% slower). I didn't look into this very deeply, but I expect it is the musl allocator being slower than glibc's.
So that means we need to go with a Debian-based image. The first simple thing I tried was a multi-stage build with rocker/r-ver as the base image, and this is a big improvement.
Image sizes on current main:
Image sizes after switching to a multi-stage build:
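A minimal sketch of the kind of multi-stage build described above, not the Dockerfile in this PR. It assumes rocker/r-ver:4.3.1 (Ubuntu 22.04 based) for both stages and installs the package from a local source checkout with `remotes::install_local`. The `/src/naomi` path and the libcurl/OpenSSL packages are illustrative placeholders for whatever naomi's dependencies actually need. The idea is that the `-dev` headers and the source tree stay in the builder stage, and only the installed R library is copied into the final image.

```dockerfile
# Build stage: has the -dev headers needed to compile R packages from source.
FROM rocker/r-ver:4.3.1 AS builder
# Assumed system dependencies; replace with what the R packages really need.
RUN apt-get update \
 && apt-get install -y --no-install-recommends libcurl4-openssl-dev libssl-dev \
 && rm -rf /var/lib/apt/lists/*
# Illustrative path: copy the package source into the build stage only.
COPY . /src/naomi
# Installs the package plus its Depends/Imports/LinkingTo into the site library.
RUN Rscript -e 'install.packages("remotes")' \
 && Rscript -e 'remotes::install_local("/src/naomi")'

# Runtime stage: same R base, but only the runtime (non -dev) shared libraries.
FROM rocker/r-ver:4.3.1
RUN apt-get update \
 && apt-get install -y --no-install-recommends libcurl4 libssl3 \
 && rm -rf /var/lib/apt/lists/*
# Copy just the installed packages; sources and headers are left behind.
COPY --from=builder /usr/local/lib/R/site-library /usr/local/lib/R/site-library
```

Using the same base in both stages keeps the R build and ABI identical, so the compiled packages copied across link against the same R.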
I then played around with using a slimmer base image for the runtime stage. I gave distroless a go but didn't have much success getting a slimmer image. I'm sure it is possible, but it's quite fiddly and I wasn't seeing immediate gains, so I didn't think it was worth pursuing.
If we want to make the image smaller still, there are two things we could look at:
I think option 2 might be easier, particularly because image size only really matters for model running. If we look at the packages that are installed, we can think of them in a few groups.
Looking at the image sizes, most of the space is taken up by R package dependencies, which, unsurprisingly, fall roughly into several large groups:
For actual model fitting we don't need all of this. We could split the naomi model into a separate package and build that into its own Docker image. We could definitely remove the markdown packages, file-writing packages and API packages, and potentially also duckdb (I think we only need this after calibrating). We could potentially drop the Rcpp-related packages too: do we still need these once everything has been compiled? Perhaps TMB needs them?
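A sketch of what a model-fit-only image could look like if the model were split out. `naomi.model` is a hypothetical package name used purely for illustration, and the base image tag is an assumption. Because only that package and its own dependencies are installed, the markdown, file-writing, API and (possibly) duckdb packages would never end up in the copied site library.

```dockerfile
# Hypothetical: the model-fitting code lives in its own package, naomi.model,
# whose DESCRIPTION lists only what fitting needs (e.g. TMB).
FROM rocker/r-ver:4.3.1 AS builder
COPY naomi.model /src/naomi.model
# Only naomi.model's Depends/Imports/LinkingTo are installed here
# (remotes itself is also installed and copied, but it is small).
RUN Rscript -e 'install.packages("remotes")' \
 && Rscript -e 'remotes::install_local("/src/naomi.model")'

# Model-fit runtime image: just R plus the model package's dependency tree.
FROM rocker/r-ver:4.3.1
COPY --from=builder /usr/local/lib/R/site-library /usr/local/lib/R/site-library
```

Whether Rcpp can be left out depends on whether any installed package lists it under Imports rather than only LinkingTo; packages that import it still load it at runtime.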