
Allow controlling cache mounts storage location #1512

Open
tonistiigi opened this issue May 27, 2020 · 53 comments

@tonistiigi
Member

tonistiigi commented May 27, 2020

related moby/moby#14080

Allowing the contents of type=cache mounts to be exported has been requested many times in different repositories and on Slack.

#1474
docker/buildx#244

The regular remote instruction cache does not work for cache mounts, which are not tracked by a cache key and are just a location on disk that can be shared by multiple builds.

Currently, the best approach to maintain this cache between nodes is to do it as part of the build. docker/buildx#244 (comment)

I don't think we should try to combine cache mounts with the remote cache backends. Usually, cache mounts are for throwaway cache and restoring it would take a similar time to just recreating it.

What we could do is allow users to control where the cache location is on disk, for cases where it is not on top of the snapshots.

We can introduce a cache mount backend concept behind a Go interface that different implementations can implement.

E.g. for a Dockerfile like

RUN --mount=type=cache,target=/root/.cache,id=gocache go build ...

you could invoke a build with

docker build --cache-mount id=gocache,type=volume,volume=myvolume .

In that case, the cache could use a Docker volume as a backend. I guess good drivers would be a volume in Docker and a bind mount from the host for non-Docker setups. If no --cache-mount is specified, the usual snapshotter-based location is used.

From the security perspective, BuildKit API is considered secure by default for the host, so I think this would require daemon side configuration to enable what paths can be bound.

Another complexity is the buildx container driver, as we can't easily add new mounts to a running container atm. Possible solutions are to force these paths to be set on buildx create or to do some hackery with mount propagation.

@myitcv

myitcv commented Sep 26, 2020

Just to clarify whether the scope of this proposal covers a use case I have.

Go has content-addressed build and module caches defined via go env GOCACHE and, as of 1.15, go env GOMODCACHE (which was (go env GOPATH)[0]/pkg/mod in previous go versions).

Having read the documentation at https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md, it does not appear possible to delegate a cache to a host directory. Therefore a buildkit Go build and module cache will likely largely duplicate the build and module caches that exist on the host.

However, this proposal seems to be heading in the direction of sharing such a cache:

In that case, the cache could use a Docker volume as a backend

To confirm: under this proposal, would it be possible to delegate these caches to host directories?

I note a couple of requirements:

  • in the case of Go, the location of these directories is not guaranteed to be given by an environment variable; rather, the output of go env GOCACHE and go env GOMODCACHE is definitive. Whilst it would not be the end of the world if environment variables were the only way of passing values, supporting the output of go env GOCACHE and go env GOMODCACHE would be even better
  • the UID and GID of writes to the cache should default to be that of the caller

Apologies if this covers old ground (covered elsewhere); I'm rather new to the buildkit approach but happened upon this issue and it sounded like exactly the thing I was after.

Many thanks

@morlay
Contributor

morlay commented Sep 27, 2020

Hope this feature can land ASAP.

It will be useful for CI caching. #1673 (comment)

For host directories, this could be hacked around with:

# ensure host path
$ mkdir -p /tmp/gocache

# create volume
$ docker volume create --driver local \
      --opt type=none \
      --opt device=/tmp/gocache \
      --opt o=bind \
      myvolume 

$ docker run -it --volume myvolume:/go/pkg/mod busybox touch /go/pkg/mod/test.txt
# test.txt will be created under host dir /tmp/gocache/

# maybe work
$ docker buildx build --cache-mount id=gocache,type=volume,volume=myvolume .

@tonistiigi should we control the mount target too?
--cache-mount id=gocache,type=volume,volume=myvolume,target=/go/pkg/mod

@hansbogert
Contributor

Does the proposed solution cater to a scenario of a buildserver which uses docker-in-docker? I'm not sure tbh.

@ties-s

ties-s commented Mar 4, 2021

Any news on this? This shouldn’t be a really big change, right?

@rhyek

rhyek commented Mar 4, 2021

This would solve my life.

@summera

summera commented Mar 20, 2021

Would love to see this. Would be a huge win for speeding up builds on CI

@Mahoney

Mahoney commented Apr 13, 2021

This would be brilliant for build systems like Gradle and Maven building on e.g. GitHub Actions.

They typically download all their dependencies to a cache dir. It's hard to benefit from layer caching: dependencies can be declared in multiple files in a nested folder structure, so to avoid a maintenance nightmare it's generally necessary for the Dockerfile to do a COPY . . before running the Gradle / Maven command that downloads the dependencies, which in turn means the layer cache is invalidated nearly every time. Downloading the transitive dependencies is very chatty (it can be hundreds of HTTP requests), so it's well worth turning into a single tarball.

I really want to use the same Dockerfile to build locally and on CI, which I think means I don't want to use the strategy suggested in docker/buildx#244 (comment) of loading & exporting the cache to named locations as commands in the Dockerfile - it might work in CI but would be much less efficient building locally, as well as adding a lot of noise to the Dockerfile.

I'm currently caching the whole of the /var/lib/docker dir (!) and restarting the docker service on the GitHub Action runner, which is also pretty slow and expensive, and generally not a great idea!

I'm guessing it wouldn't be a great place for a buildx AND go newbie to start contributing, though...

@Raboo

Raboo commented Apr 21, 2021

I would love to see

RUN --mount=type=cache,target=/root/.m2,id=mvncache mvn package

be exported as its own layer/blob in the --cache-to and --cache-from arguments, like --cache-to type=registry,ref=myregistry/myapp:cache.

Perhaps even better if it got its own argument, like --cache-id id=mvncache,type=registry,ref=myregistry/dependency-cache:mvn, so you could share Maven dependencies between similar projects and avoid downloading so much from the Internet.

@strophy

strophy commented Jul 2, 2021

I have hacked together a solution that seems to work for including cache mounts in the GitHub actions/cache action. It works by dumping the entire buildkit state dir to a tar file and archiving that in the cache, similar to the approach described here. I think this dump also includes the instruction cache, so this should not be archived separately if using the action.

Because this cache will grow on every run, we use docker buildx prune --force --keep-storage to remove everything but the cache before archiving. You will need to adjust the cache-max-size var to suit your needs, up to GitHub's limit of 5g. It's a dirty hack, and could probably be improved by exporting only the relevant cache mounts, but I am new to both buildkit and GitHub Actions so this is what I came up with. Comments and suggestions for different approaches are welcome.
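
For reference, a rough sketch of that flow with a docker-container builder (the builder and state-volume names below are illustrative; check docker ps / docker volume ls for yours, and adjust the size cap to your CI cache limit):

# Trim BuildKit state so the archive stays under the CI cache size limit
docker buildx prune --force --keep-storage 4g

# Stop the builder and tar its state directory out of the state volume
docker buildx stop mybuilder
docker run --rm \
  -v buildx_buildkit_mybuilder0_state:/var/lib/buildkit \
  -v "$PWD":/backup \
  busybox tar -czf /backup/buildkit-state.tar.gz -C /var/lib/buildkit .

# Save buildkit-state.tar.gz with actions/cache; on the next run, restore it
# into the same volume before the builder is bootstrapped again
# (docker buildx inspect --bootstrap mybuilder).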

@speller

speller commented Aug 26, 2021

It would be nice to allow setting the from parameter to the build context.

@clemsos

clemsos commented Sep 1, 2021

hello, is that feature currently planned or being worked on?

@tonistiigi
Member Author

I think the design is accepted. Not being worked on atm.

@chris13524

chris13524 commented Sep 20, 2021

Usually, cache mounts are for throwaway cache and restoring it would take a similar time to just recreating it.

@tonistiigi can you clarify what you mean by this? This sounds like it only considers use cases such as caching dependency installation, where downloading the dependencies from a repository would take about the same time as downloading the cache itself. What about situations where regenerating the cache takes significantly longer than importing it (e.g. code building)? The desire is to re-use parts of the cache without erasing it entirely, something that the remote instruction cache does not support.

@tonistiigi
Member Author

@chris13524 That's what "usually" means there, "not always". Even with a code-building cache you might have mixed results. Importing and exporting cache is not free. In addition, you need something that can clean up the older cache from your mount, or your cache just grows infinitely with every invocation. You can try this today by importing/exporting cache out of the builder into some persistent storage (or an additional image) with just an additional build request. docker/buildx#244 (comment)
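
For anyone who hasn't followed the link, a minimal sketch of that extra-build-request technique (the stage names and gocache-dump path are made up for illustration; the cache id must match the one in your real Dockerfile, and both builds must run on the builder that holds the cache):

# Export: copy the contents of the gocache cache mount into a scratch image
# and write it to a local directory with an extra build request.
docker buildx build --output type=local,dest=./gocache-dump -f- . <<'EOF'
# syntax=docker/dockerfile:1
FROM busybox AS dump
RUN --mount=type=cache,id=gocache,target=/gocache \
    mkdir -p /out && cp -a /gocache/. /out/
FROM scratch
COPY --from=dump /out/ /
EOF

# Import: seed the same cache mount from the dumped directory in the context.
docker buildx build -f- . <<'EOF'
# syntax=docker/dockerfile:1
FROM busybox
RUN --mount=type=cache,id=gocache,target=/gocache \
    --mount=type=bind,source=gocache-dump,target=/restore \
    cp -a /restore/. /gocache/
EOF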

@jacobwgillespie
Contributor

jacobwgillespie commented Jul 3, 2023

I've moved all my docker building to depot.dev

Oh nice, thanks for the mention. ❤️

What we're doing for cache at Depot, if you don't want to use a hosted build service, is running remote BuildKit build machines, storing /var/lib/buildkit on a persistent disk, and re-mounting that same disk for future builds (as well as running both Intel and Arm machines to avoid emulation).

By giving BuildKit a stable disk for cache storage, the cache mounts work as expected, and as a bonus there's no time spent transferring the cache contents to/from S3 or another external content store.

The "downside" to this approach is that multiple builds share the same build host, so cache is not horizontally scalable. But BuildKit has a lot of really cool features that support this kind of architecture, including deduplicating work across concurrent builds, shared locking of cache mounts, etc.
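
For anyone who wants to replicate the general idea with a self-hosted builder, a rough sketch (the container name, port, and plain-TCP endpoint are illustrative; use TLS for anything that isn't local):

# Keep BuildKit's state (including cache mounts) on a persistent volume
docker volume create buildkit-state
docker run -d --name buildkitd --privileged \
  -p 1234:1234 \
  -v buildkit-state:/var/lib/buildkit \
  moby/buildkit:latest --addr tcp://0.0.0.0:1234

# Point buildx at the long-lived daemon and build against it
docker buildx create --name persistent --driver remote tcp://127.0.0.1:1234
docker buildx build --builder persistent .

# Because /var/lib/buildkit survives restarts, RUN --mount=type=cache
# contents persist between builds without any import/export step.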

@lvkins

lvkins commented Jul 18, 2023

@speller

Nice solution, but target build operations like COPY . . will needlessly copy the host cache directory.
Also, you might use a busybox image instead, which would reduce the cache image size.

@dm3ch

dm3ch commented Jul 26, 2023

Just wanted to highlight a few things:

  • There are some use cases where dependency caching is not only about pulling data from the internet. For example, in nodejs there are deps that compile source code during installation, so installing such deps without a cache can add 20-40 minutes if you are using a multi-arch build with emulation of the target arch.
  • Exporting/importing takes some time, but there definitely are use cases where such a mechanism could save more time than it costs. It doesn't have to be the default behaviour, which would avoid affecting users who don't need this.
  • Without garbage collection inside the docker step, the cache can grow endlessly, but some package managers (for example pnpm in the nodejs ecosystem) have built-in garbage collection. So this problem can be solved by package managers or worked around by users. Anyway, if an endlessly growing cache is a problem, it also applies to local building on a dev machine (where cache mounts currently work).

I hope that one day this feature would be implemented. :)

@dm3ch

dm3ch commented Jul 26, 2023

By the way, in this issue it was proposed to create a volume to be able to mount a host dir as the cache target. But maybe it would require less work to just add a new cache mode to the current caching mechanism that would also export/import the cache volumes as part of the layer cache?

@anthonyalayo

What's the best way to do this in 2024?

@armenr

armenr commented Jan 24, 2024

@anthonyalayo - in 2024, this is still the way (that we know of) to do this.

This is a heavily modified version of what @speller provided further up in the comments, I think back in 2022? 😂

In this implementation/script, we are using a single AWS account to store our "build cache" in an S3 bucket, with logical naming of nested folders (aka keys) inside S3.

In this particular script, we pull down a "base" NodeJS docker image that we build in a different pipeline...then, on top of the base NodeJS image, we build our SPA JS app with this script and its accompanying Dockerfile.

For our use-case, we also push a copy of our images into each of our different AWS accounts (yes, I know...there are better ways, but we have "reasons" for this)...so that's why there are tuples of account + account ID that we loop through towards the end of the script.

The key takeaways from the script should be how we mount, process, push, and pull the cache. The rest is just gymnastics and sugar, based on your build requirements/process + your needs around where to push and pull from.

cc: @tsarlewey (sorry I didn't see your question when you tagged me last year!)

#!/bin/bash

##################################################################################################
# Build script for all Dockerfiles in this repository
#
# This script will build all Docker containers defined by the Dockerfiles in this repository.
# It enables Docker BuildKit when building containers and requires access to an S3 "artifact
# repository" in some aws account you have
#
# Dependencies:
#  * docker
#  * aws credentials (should be located at ~/.aws/credentials)
#
# Usage
# $> AWS_PROFILE_BUILDER=<your-aws-profile> ./build.sh
#
##################################################################################################

set -e -o pipefail
export DOCKER_BUILDKIT=1
export BUILDKIT_INLINE_CACHE=1

# Recitals
IMAGE_NAME="${1}"

ACCOUNTS=("some-account,<ACCT_ID>" "some-account,<ACCT_ID>" "some-account,<ACCT_ID>" "some-account,<ACCT_ID>")

BASE_IMAGES_ACCT_NAME=${BASE_IMAGES_ACCT_NAME:-"some-account-name"}
BASE_IMAGES_ACCT_ID=${BASE_IMAGES_ACCT_ID:-"some_account_id"}

BITBUCKET_REPO_SLUG=${BITBUCKET_REPO_SLUG:-"some_repo_name_slug"}
BITBUCKET_BRANCH=${BITBUCKET_BRANCH:-"some_default_branch"}
DOCKER_TAG=${DOCKER_TAG:-"latest"}
DOCKER_PUSH=${DOCKER_PUSH:-"true"}
DOCKER_REPO=${DOCKER_REPO:-"SOME_ID.dkr.ecr.us-west-2.amazonaws.com"}
REPO_CACHE_NAME=${REPO_CACHE_NAME:-"repo-cache"}
S3_BUILDKIT_CACHE=${S3_BUILDKIT_CACHE:-"ops-buildkit-cache"}
MEM_LIMIT=${MEM_LIMIT:-"6144"}
DOCKER_MEM_LIMIT=${DOCKER_MEM_LIMIT:-"6144"}

##################################################################################################
# Functions
##################################################################################################
check_aws_profiles() {
  if [[ -z "${AWS_PROFILE_BUILDER}" ]]; then
    echo "🛑  This script expects the AWS_PROFILE_BUILDER environment variable to be set"
    exit 1
  else
    echo "Found AWS_PROFILE           --> ${AWS_PROFILE_BUILDER}"
  fi
}

fetch_ecr_token() {
  local account_name="${1}"
  local account_id="${2}"

  echo "!! Fetching valid ECR login for ${account_name} with ID: ${account_id} !!"
  aws ecr get-login-password --profile "${account_name}" | docker login --username AWS --password-stdin "${account_id}.dkr.ecr.us-west-2.amazonaws.com"
}

update_git_submodules() {
  echo "!! Updating Submodules !!"
  git submodule update --init --recursive
}

construct_docker_tag() {
  # If not in a branch, get the tag
  local tag
  tag=$(git describe --tags --exact-match 2>/dev/null)

  # If a tag is found, use it as the Docker tag
  if [[ -n "${tag}" ]]; then
    # Replace '.' with '-' and remove any 'v' prefix
    tag=${tag//./-}
    tag=${tag//v/}

    # Grab the short commit hash
    local commit_hash
    commit_hash=$(git rev-parse --short HEAD)

    # Construct the Docker tag
    local docker_tag="${tag}-${commit_hash}"

    # Output the Docker tag
    echo "${docker_tag}"
  else
    # Fallback to the existing logic for branches
    local branch
    branch=$(git symbolic-ref --short HEAD)

    # Replace '/' with '-' and sanitize branch name
    branch=${branch//\//-}
    branch=$(echo "${branch}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')

    # Grab the short commit hash
    local commit_hash
    commit_hash=$(git rev-parse --short HEAD)

    # Construct the Docker tag
    local docker_tag="${branch}-${commit_hash}"

    # Output the Docker tag
    echo "${docker_tag}"
  fi
}

DOCKER_TAG=$(construct_docker_tag)
export DOCKER_TAG

build_to_local() {

  docker build "." \
    --file "Dockerfile" \
    --progress plain \
    --pull \
    --memory "${DOCKER_MEM_LIMIT}" \
    --platform linux/amd64 \
    --cache-from "${DOCKER_REPO}/${IMAGE_NAME}" \
    --build-arg AWS_PROFILE_BUILDER="${AWS_PROFILE_BUILDER}" \
    --build-arg MEM_LIMIT="${MEM_LIMIT}" \
    --secret id=aws-config,src="${HOME}/.aws/config" \
    --secret id=aws-creds,src="${HOME}/.aws/credentials" \
    --tag "${DOCKER_REPO}/${IMAGE_NAME}:${DOCKER_TAG}" \
    --tag "${IMAGE_NAME}:${DOCKER_TAG}" \
    --tag "${IMAGE_NAME}:latest" \
    --tag "${IMAGE_NAME}"

  echo "🙌 BUILT ${IMAGE_NAME}:${DOCKER_TAG} LOCALLY!"
}

docker_build_tagged_stage() {
  local target_stage
  target_stage="${1}"

  echo "target_stage: ${target_stage}"

  local step_tag
  step_tag="${target_stage}-${DOCKER_TAG}"

  echo "step_tag: ${step_tag}"

  echo "tag 1: ${DOCKER_REPO}/${IMAGE_NAME}:${step_tag}"
  echo "tag 2: ${IMAGE_NAME}:${step_tag}"
  echo "tag 3: ${IMAGE_NAME}:${target_stage}"


  echo "[*] Building ${target_stage} stage with tag ${step_tag}"
  docker build "." \
    --file "Dockerfile" \
    --target "${target_stage}" \
    --platform linux/amd64 \
    --progress plain \
    --memory "${DOCKER_MEM_LIMIT}" \
    --cache-from "${DOCKER_REPO}/${IMAGE_NAME}" \
    --build-arg AWS_PROFILE_BUILDER="${AWS_PROFILE_BUILDER}" \
    --secret id=aws-config,src="${HOME}/.aws/config" \
    --secret id=aws-creds,src="${HOME}/.aws/credentials" \
    --tag "${DOCKER_REPO}/${IMAGE_NAME}:${step_tag}" \
    --tag "${IMAGE_NAME}:${step_tag}" \
    --tag "${IMAGE_NAME}:${target_stage}"
    # --load

  echo "🙌 Built ${IMAGE_NAME}:${step_tag} to LOCAL!"
}

tag_and_push_to_ecr() {
  local image_name="${1}"
  local image_tag="${2}"
  local ecr_repo="${3}.dkr.ecr.us-west-2.amazonaws.com"

  # Tag and push to ECR
  docker tag "${image_name}:${image_tag}" "${ecr_repo}/${image_name}:${image_tag}"
  docker tag "${image_name}:${image_tag}" "${ecr_repo}/${image_name}:latest"
  docker push --quiet "${ecr_repo}/${image_name}:${image_tag}"
  docker push --quiet "${ecr_repo}/${image_name}:latest"

  echo "🙌 Published ${image_name}:${image_tag} to ECR!"
}

# Copy the updated cache back to the host machine
process_cache() {

  ###############################
  # Define local variables
  ###############################

  local build_stage_name
  build_stage_name="${1}"

  local cache_dir
  cache_dir="${2}"

  local cache_container_name
  cache_container_name="${build_stage_name}-container"

  local cache_container_image
  cache_container_image="${IMAGE_NAME}:${build_stage_name}-$DOCKER_TAG"

  ###############################
  # Processing steps
  ###############################

  # remove any existing cache from the host machine
  echo "!! Removing existing local cache !!"
  rm -rf "opt/${cache_dir}"/*

  docker_build_tagged_stage "$build_stage_name"

  # remove previous temporary container with the same name if any
  # return true to avoid failing the script if the container does not exist
  docker rm -f "$cache_container_name" || true

  # create a (stopped) temporary container from the tagged image containing the cache.
  docker create -ti --name "$cache_container_name" "$cache_container_image"

  # copy files from the container to the host
  echo "!! Extracting latest cache !!"
  docker cp -L "$cache_container_name:/tmp/$cache_dir" - | tar -x -m -C opt

  # remove the temporary container
  docker rm -f "$cache_container_name"
}

push_to_docker_repo() {
  docker push "${DOCKER_REPO}/${IMAGE_NAME}:${DOCKER_TAG}"
  echo "🙌 Published ${IMAGE_NAME}:${DOCKER_TAG} to ECR!"
}

pull_cache_from_s3() {
  local cache_type
  cache_type="${1}"

  local cache_dir
  cache_dir="${2}"

  echo "!! Creating host-side directory !!"
  mkdir -p "opt/${cache_dir}"

  echo "!! Pulling cache from S3 !!"
  AWS_PROFILE=${AWS_PROFILE_BUILDER} s5cmd --numworkers 512 cp \
    "s3://${S3_BUILDKIT_CACHE}/${BITBUCKET_REPO_SLUG}/${cache_type}/${BITBUCKET_BRANCH}/*" \
    "opt/${cache_dir}/" \
  > /dev/null 2>&1 || true

  echo "!! Done syncing ${cache_type} FROM S3 !!"
}

push_cache_to_s3() {
  local cache_type
  cache_type="${1}"

  local cache_dir
  cache_dir="${2}"

  AWS_PROFILE=${AWS_PROFILE_BUILDER} s5cmd --numworkers 512 cp \
    "opt/${cache_dir}/" \
    "s3://${S3_BUILDKIT_CACHE}/${BITBUCKET_REPO_SLUG}/${cache_type}/${BITBUCKET_BRANCH}/" \
  > /dev/null 2>&1 || true

  echo "!! Done syncing ${cache_type} TO S3 !!"
}


##################################################################################################
# MAIN
##################################################################################################

main() {

###############################
# Build Setup
###############################

  # construct docker tag
  DOCKER_TAG=$(construct_docker_tag)
  echo "🚀 Docker tag: ${DOCKER_TAG}"

  # Check for all required aws profiles
  check_aws_profiles

  # Fetch ECR token from base DEV account for BASE nodeJS image
  fetch_ecr_token "${BASE_IMAGES_ACCT_NAME}" "${BASE_IMAGES_ACCT_ID}"

  # Update git submodules
  echo "!! Updating Submodules !!"
  update_git_submodules

  echo "!! Syncing cache FROM S3 !!"
  pull_cache_from_s3 "${REPO_CACHE_NAME}"   "cache" &
  # pull_cache_from_s3 "ng-cache"             "ng-cache" &
  wait

###############################
# Build Steps
###############################

  echo "🛠️ Building linux/amd64 images for: ${IMAGE_NAME}:${DOCKER_TAG} ..."

  ###############################
  # Prepare Caches
  ###############################

  echo "[*] Preparing caches..."
  docker_build_tagged_stage "spa-cache-prepare"

  ###############################
  # Build Multi-Stage Image
  ###############################

  echo "[*] Executing build"
  build_to_local "${IMAGE_NAME}" "${DOCKER_TAG}"

  ###############################
  # Build & Process Caches
  ###############################

  echo "[*] Building caches..."
  process_cache "spa-cache"     "cache"

  echo "!! Syncing cache TO S3 !!"
  push_cache_to_s3 "${REPO_CACHE_NAME}"   "cache"

  ################################################
  # Push Final Image to All/Multiple AWS Accounts
  ################################################

  for tuple in "${ACCOUNTS[@]}"; do
    IFS=',' read -ra elements <<< "${tuple}"

    account_name="${elements[0]}"
    account_id="${elements[1]}"

    # Fetch ECR token
    fetch_ecr_token "${account_name}" "${account_id}"

    echo "+ Executing ECR tag & push"
    tag_and_push_to_ecr "${IMAGE_NAME}" "${DOCKER_TAG}" "${account_id}"

    echo "🙌 Done Pushing linux/amd64 images for: ${IMAGE_NAME}:${DOCKER_TAG} to ${account_name} with account id ${account_id}"
  done

###############################
# Post-Build Steps
###############################
  CACHE_ITEM_COUNT=$(find opt/cache -type f | wc -l)
  CACHE_SIZE=$(du -sh opt/cache | awk '{print $1}')

  echo "[CACHE] Resulting build-time cache contains ${CACHE_ITEM_COUNT} files"
  echo "[CACHE] Total size of all files in extracted cache: ${CACHE_SIZE}"
  echo ""
  echo ""
  echo "🙌 Done building linux/amd64 images for: ${IMAGE_NAME}..."
}

###############################
# Run Main
###############################

main

@anthonyalayo

Thanks for the big offering, @armenr, we all appreciate it. It's a bit wild that this has been desired for almost 4 years now and we still need to resort to things like this. Does anyone on the thread know a maintainer who can chime in?

@tonistiigi
Member Author

The cache-dance repo is linked above and looks pretty simple to use. Even if this were built into buildkit, you would still need something external to load and save to your storage, so this proposal doesn't simplify that use case.

@anthonyalayo

anthonyalayo commented Jan 25, 2024

@tonistiigi if it was built into buildkit, a user could specify --cache-from and --cache-to and everything would work? It would emit the cache layers as well as the cache mounts.

@armenr

armenr commented Jan 25, 2024

Yeah @tonistiigi - the cache-dance repo looks pretty straightforward too.

I shared our build script as well, since we (sadly) don't have the benefit or luxury of hosting on GitHub and using GitHub Actions. Instead, we're stuck in icky BitBucket Pipeline-land, and we needed something that would run on both our local machines as well as in our BitBucket pipelines.

I'm sure ours isn't ideal, but we tried to balance robustness, completeness, and simplicity.

@n1ngu

n1ngu commented Jan 25, 2024

if it was built into buildkit, a user could specify --cache-from and --cache-to and everything would work? It would emit the cache layers as well as the cache mounts.

The downside is that one must be very clever with multi-stage builds so that the cache is NOT present in the final image layers, whereas cache mounts can be leveraged in single-stage images (and offer interesting features such as sharing=shared|private|locked).

But yes, I think a workaround with --cache-from and --cache-to should be easier (though I haven't tried such a workaround).

@KevinMind

GitHub has a very effective cache action that can be used to cache the exported volumes. I agree that having this in buildkit the way proposed would be life-saving for many.

For perspective, I'm currently using a local cache type and it works for layer caching, but 70-80% of the time and compute on our build is just downloading pip and npm dependencies, so without the ability to cache the mount volumes we get nearly none of the benefit.

My org also won't let us use any GitHub actions, so even if there are forks I might not be able to use them.

Is there a reason why this shouldn't be implemented in buildkit?

lunacd added a commit to lunacd/cps-ci that referenced this issue Mar 24, 2024
** Changes **

Docker does not natively support exporting cache mounts, or saving them to
some external cache storage like the GHA cache.

This commit uses a workaround described in [1] where the content in the
cache mount is moved into a dumb container and then exported through its
file system. To load recovered cache into the cache mount, another dumb
container is built and its build process copies recovered cache into the
cache mount.

This is very much a hack and shouldn't be necessary once this is natively
supported by docker/buildx.

** Justification for not caching more build steps **

dnf, apt, and pip can theoretically also be cached this way. However,
if the list of dependencies has not changed, the entire docker
layer would be a cache hit and dnf/apt caches would not be beneficial.
Considering that dependency changes should be rare, not caching them
seems wise.

[1] moby/buildkit#1512
@xenomote

xenomote commented May 14, 2024

Any progress on this? I have a similar use-case to the first comment, and having the --mount=type=cache included by --cache-to would be a massive help

@adomaskizogian

adomaskizogian commented Jun 17, 2024

Bump.

Being able to set the cache-to/cache-from location would have an enormous impact overall.
It would help streamline local dev and CI/CD configs by fully leveraging docker while maintaining quick build times.

@mnahkies

I'd also really like to see this.

In case it helps anyone, here's an example that instead uses a multi-stage build and the new COPY --exclude to work around this by including the cache in the build context, then excluding it from the next stage of the image build.

I don't love including my cache in the build context, but it seems to work alright. Before COPY --exclude was possible this was quite an unwieldy approach IMO, which is why I think it's worth noting.

https://gist.github.com/mnahkies/fbc11d747d7b875dcc1bbb304c21c349
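
A condensed sketch of that pattern, assuming an npm project and hypothetical paths (see the gist above for the author's actual setup; COPY --exclude currently needs the labs Dockerfile syntax):

docker buildx build -f- . <<'EOF'
# syntax=docker/dockerfile:1.7-labs
FROM node:20 AS deps
WORKDIR /app
# the pre-seeded .npm-cache directory is part of the build context
COPY . .
RUN npm ci --cache ./.npm-cache

FROM node:20-slim
WORKDIR /app
# copy the context again, but keep the cache directory out of the image
COPY --exclude=.npm-cache . .
COPY --from=deps /app/node_modules ./node_modules
EOF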

@ruffsl

ruffsl commented Jul 5, 2024

For my use case, I'd like to share the apt download cache that buildkit creates when baking an image with the apt cache I mount into my dev containers, so that I can easily reinstall debs in any dev container while offline on my laptop, regardless of whether I previously downloaded them manually or they were downloaded when building invalidated layers from docker's instruction build cache. E.g.:

# Dockerfile
...
# Edit apt config for caching
RUN mv /etc/apt/apt.conf.d/docker-clean /etc/apt/ && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache && \
    # Given apt repo snapshots, just cache apt update once
    apt-get update

# Bust the docker instruction build cache here
ARG BUST_CACHE_NONCE

# install bootstrap tools
RUN --mount=type=cache,sharing=locked,target=/var/cache/apt \
    apt-get install -y --no-install-recommends \
      gettext-base \
      python3-rosdep \
      python3-vcstool \
      wget \
      zstd

// .devcontainer/devcontainer.json
...
    "mounts": [
        {
            // Cache apt downloads
            "source": "apt-cache",
            "target": "/var/cache/apt",
            "type": "volume"
        }
    ],

If we could direct buildkit to use a named volume or host bind mount at build time, then caches can be shared.

bra-fsn added a commit to SpareCores/sc-images that referenced this issue Aug 21, 2024
@thompson-shaun thompson-shaun added this to the v0.future milestone Aug 23, 2024