Docker driver removing container image on task failure #8552

Closed
stevenscg opened this issue Jul 28, 2020 · 10 comments

Comments

@stevenscg

Nomad version

Nomad v0.12.0 (8f7fbc8)

Operating system and Environment details

CentOS 7.8.2003

Issue

As of v0.12.0, container images are incorrectly removed from the host when a task fails, even though docker.cleanup.image is set to false.

If the task starts, runs, and then exits cleanly, the issue does not appear to occur.

The same jobs did not exhibit this behavior on any prior release; 0.11.3 was the most recent version used before 0.12.0.

As shown in the config example below, we also recently tried quoting both the keys and the values of the client options per the documentation, with no apparent change in the behavior described here. We typically quote only the values in client options.

The use case for retaining the container images is a development environment where the same image is used for several days at a time. The setup scripts for this environment build the container images, so users of this environment now see jobs randomly fail to start that would have continued to work with past versions of Nomad.

Reproduction steps

Config file:

client {
  enabled = true
  options {
    driver.whitelist = "docker"
    "docker.cleanup.image" = "false"
  }
}

plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}

We are not yet certain what kind of task failure triggers this behavior. A simple task that exits with code 1 or similar may be sufficient; see the sketch below.
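
For illustration, a minimal batch job along these lines should produce a failing task. This is only a sketch: the job name, datacenter, and image are placeholders and not part of the original report; any locally built image would do.

job "gc-repro" {
  datacenters = ["dc1"]
  type        = "batch"

  group "fail" {
    # Fail immediately instead of retrying, so the failure path is exercised.
    restart {
      attempts = 0
      mode     = "fail"
    }

    task "exit1" {
      driver = "docker"

      config {
        # Placeholder image; substitute a locally built image to test retention.
        image   = "busybox:1.32"
        command = "sh"
        args    = ["-c", "exit 1"]
      }
    }
  }
}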

@notnoop
Contributor

notnoop commented Jul 28, 2020

Thank you for reporting the issue. We'll investigate and follow up!

FWIW, between 0.9.0 and 0.11.2, Nomad wasn't GCing any images! We fixed image GC in #7947 for 0.11.2, so I'm a little surprised you didn't encounter this in 0.11.3. We'll need to dig in further.

@stevenscg
Author

@notnoop Noted. Thanks. I believe we were running 0.11.3 most recently because we tend to keep very current for this development environment. However, it could have been 0.11.2.

@stevenscg
Author

I believe that this issue is still present on v0.12.3.

@stevenscg
Author

@notnoop Is there any known or potential workaround for this kind of issue that I could try in the interim?

@stevenscg
Author

I spent some time with versions 0.11.4 and 0.11.2 on my test instance where this problem was occurring. These versions have the same behavior as 0.12.3, so I'm not entirely sure what's happening.

As a workaround, I tried setting a long image cleanup delay, which would be fine for my use case, but it did not seem to make any difference in the errant behavior.

client {
  enabled = true
  ....
  options {
    "docker.auth.config" = "/etc/docker/config.json"
    "driver.whitelist" = "docker"
    "docker.cleanup.image" = "true"
    "docker.cleanup.image.delay" = "30m"
  }
}
docker version
Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:46:54 2020
 OS/Arch:           linux/amd64
 Experimental:      false

consul version
Consul v1.8.3
Revision a9322b9c7
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

@stevenscg
Author

Still seeing this on v0.12.4.

@notnoop
Contributor

notnoop commented Sep 15, 2020

Hi @stevenscg, I'm very sorry for taking so long to investigate this. The issue seems to be mixing the old, deprecated syntax with the new one: if a plugin "docker" { config { ... } } block is present in the config, the docker driver ignores the old client options fields. As you pointed out in the workaround, once you use the options exclusively, they are interpreted as expected.

I'd suggest adopting the new plugin config syntax, with a config like the following:

client {
  enabled = true
  options {
    driver.whitelist       = "docker"
  }
}

plugin "docker" {
  config {
    volumes {
      enabled = true
    }

    gc {
      image = false
    }
  }
}
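
If you also want the retention delay you tried earlier in the client options, the plugin block has an equivalent setting. As a rough sketch (the gc image_delay option below is an assumption based on the plugin documentation, not part of the suggestion above):

plugin "docker" {
  config {
    gc {
      # Keep GC enabled but retain images for a while after the last container exits.
      image       = true
      image_delay = "30m"
    }
  }
}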

Let me know if that addresses the issue!

@stevenscg
Author

@notnoop Thanks for the info! I had moved to the plugin syntax for "volumes" but not yet "gc", so I think this will fix my particular issue. I'll drop back in a few days and close it if it all checks out.

@stevenscg
Author

This all looks good, thanks! Closing.

@github-actions

github-actions bot commented Nov 2, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 2, 2022