
unable to delete docker image with multiple tags on same id #15760

Closed · BDelacour opened this issue Jan 11, 2023 · 2 comments · Fixed by #15962
Labels: stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/driver/docker, type/bug
@BDelacour

Nomad version

Nomad v1.4.3 (f464aca)

Operating system and Environment details

Debian Bullseye (11)

Issue

Hello!

When a job is redeployed with a different image tag that points to the same digest (two tags, one image ID), the Docker driver is unable to delete the previous image later on.

Example:

$> docker images
REPOSITORY  TAG        IMAGE ID       CREATED              SIZE
project1    0a8b653c   26b79d6079da   About a minute ago   924MB
project2    0a8b653c   13f41a2b8456   7 minutes ago        465MB
project2    530817fd   13f41a2b8456   7 minutes ago        465MB
project1    530817fd   21cccefce913   12 minutes ago       924MB
project1    7e991dbe   21cccefce913   12 minutes ago       924MB
$> docker ps -a
CONTAINER ID   IMAGE               COMMAND                  CREATED          STATUS          PORTS                      NAMES
45756fcc7b18   project2:0a8b653c   "docker-entrypoint.s…"   34 minutes ago   Up 34 minutes   3000/tcp, 3000/udp         project2-3beab3d0-7124-385b-d0f2-015f0b575ba4
0016185272ec   project1:0a8b653c   "/entrypoint.sh /usr…"   34 minutes ago   Up 34 minutes   9000/tcp, 80/tcp, 80/udp   project1-ec844e41-4303-ffb6-572b-e77c5786c770

The Nomad logs show:

[DEBUG] client.driver_mgr.docker: image id reference count decremented: driver=docker image_id=sha256:21cccefce913c5ead81f429c28810c3f7b487c6558a622a1bf1bbdcad7b1efad references=0
[DEBUG] client.driver_mgr.docker: unable to cleanup image, still in use: driver=docker image_id=sha256:21cccefce913c5ead81f429c28810c3f7b487c6558a622a1bf1bbdcad7b1efad

Image 21cccefce913 is no longer in use, but because the driver tries to delete it by image ID, it hits this conflict handler:

// Any 409 from the Docker API is treated as "still in use", but 409 covers
// every conflict, including "image is referenced in multiple repositories".
if derr, ok := err.(*docker.Error); ok && derr.Status == 409 {
	d.logger.Debug("unable to cleanup image, still in use", "image_id", id)
	return
}

But 409 only means "conflict" in general (https://docs.docker.com/engine/api/v1.23/#remove-an-image); it does not necessarily mean the image is still in use by a container.
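
Outside Nomad, deleting by ID in this state reproduces the conflict directly (illustrative; the exact wording of the error depends on the Docker version):

$> docker rmi 21cccefce913
Error response from daemon: conflict: unable to delete 21cccefce913 (must be forced) - image is referenced in multiple repositories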

Reproduction steps

Deploying a simple job with alpine:3.16, then alpine:3.16.3, then alpine:3.17 (at the time of writing) should trigger the same bug, since 3.16 and 3.16.3 point to the same image ID b2774aff8c30.
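
You can check that the two tags really share one image (IDs as of the report date; they may have moved since):

$> docker pull alpine:3.16
$> docker pull alpine:3.16.3
$> docker images --format "{{.Repository}}:{{.Tag}} {{.ID}}" alpine
alpine:3.16.3 b2774aff8c30
alpine:3.16 b2774aff8c30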

Expected Result

The image is deleted even if two tags point to the same digest.

Actual Result

Disk usage on the client node keeps growing, as this bug is triggered frequently with some monorepo projects.
To successfully delete my images, I must run a cron job with

docker image prune -af

Thank you!

@BDelacour (Author)

We could use RemoveImageExtended with force=true (https://github.com/fsouza/go-dockerclient/blob/580a8bc1e73380a2e05395870716ff86fb2a3e64/image.go#L185-L197) and trust Nomad's internal reference count.

The Docker API documentation for the force parameter says:

Remove the image even if it is being used by stopped containers or has other tags

See: https://docs.docker.com/engine/api/v1.41/#tag/Image/operation/ImageDelete
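
As a minimal standalone sketch (illustrative only, not the actual Nomad patch; the client setup and hard-coded image ID are my own assumptions), a forced removal with go-dockerclient looks roughly like this:

package main

import (
	"fmt"
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	// Connect using DOCKER_HOST etc. from the environment.
	client, err := docker.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	// Image ID taken from the logs above; any ID with multiple tags works.
	id := "sha256:21cccefce913c5ead81f429c28810c3f7b487c6558a622a1bf1bbdcad7b1efad"

	// Force deletes the image even if it has other tags or is referenced
	// by stopped containers, per the API doc quoted above.
	opts := docker.RemoveImageOptions{Force: true, NoPrune: false}
	if err := client.RemoveImageExtended(id, opts); err != nil {
		log.Fatal(err)
	}
	fmt.Println("removed", id)
}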

@shoenig (Member) commented Jan 30, 2023

Thanks for the report @BDelacour! Indeed, I can reproduce it with this simple job:

job "bug" {
  type = "sysbatch"
  datacenters = ["dc1"]

  group "group1" {
    task "hello" {
      driver = "docker"

      config {
        image          = "alpine:3.16"
        # image        = "alpine:3.16.3"
        # image        = "alpine:3.17"
        command = "echo"
        args = ["hi"]
        auth_soft_fail = true
      }
    }
  }
}

With this agent config (plus -dev mode):

client {
  enabled = true
}

server {
  enabled = true
}

log_level = "DEBUG"

plugin "docker" {
  config {
    gc {
      image = true
      image_delay = "1m"
      container = true
    }
  }
}
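
For completeness, the repro can be run in dev mode like so (file names here are my own; adjust as needed):

$> nomad agent -dev -config=agent.hcl
$> nomad job run bug.nomad.hcl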

Your idea of specifying force on cleanup sounds reasonable, since it still only applies to stopped containers.

I'll try to slip in a fix for this today.
