
Docker image version change does not trigger container update #3639

Closed
lvjp opened this issue Oct 26, 2015 · 18 comments

Comments

@lvjp
Contributor

lvjp commented Oct 26, 2015

Problem

When I update a Docker image, I currently have to do it in two steps:
1. Update the image
2. Update the container associated with the updated image

I think Terraform could be smart enough to handle this in one step.

My environment

My .tf declaration:

provider "docker" {
    host = "unix:/var/run/docker.sock"
}
resource "docker_container" "my-docker-container" {
    name = "my-docker-container"
    image = "${docker_image.my-docker-image.latest}"
}
resource "docker_image" "my-docker-image" {
    name="my/docker-image:0.5.2"
}

Terraform output

My version number

$ terraform version
Terraform v0.6.6

First plan

$ terraform plan
[... refresh output omitted ...]
~ docker_image.my-docker-image
    name: "my/docker-image:0.5.2" => "my/docker-image:0.6.2"

First apply

docker_image.my-docker-image: Refreshing state... (ID: 4f744e3faa29693abefe6ef45dddc11fbd15b388b5238b46f1a0828279dc2440my/docker-image:0.5.2)
docker_container.my-docker-container: Refreshing state... (ID: 6b2cd1316e16097aa5c18bc3d3d99d3b249cc0b0a8b6ba47aed338cbc20b706f)
docker_image.my-docker-image: Modifying...
  name: "my/docker-image:0.5.2" => "my/docker-image:0.6.2"
docker_image.my-docker-image: Modifications complete

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
[... help text omitted ...]

Second plan

[... refresh output omitted ...]
-/+ docker_container.my-docker-container
    bridge:           "" => "<computed>"
    gateway:          "172.17.42.1" => "<computed>"
    image:            "4f744e3faa29693abefe6ef45dddc11fbd15b388b5238b46f1a0828279dc2440" => "1e39e488bd5b2789b19eb56349702c682430cf507597bfee1dcc81b70955d203" (forces new resource)
    ip_address:       "172.17.0.49" => "<computed>"
    ip_prefix_length: "16" => "<computed>"
    must_run:         "true" => "1"
    name:             "my-docker-container" => "my-docker-container"

Second apply

docker_image.my-docker-image: Refreshing state... (ID: 4f744e3faa29693abefe6ef45dddc11fbd15b388b5238b46f1a0828279dc2440my/docker-image:0.5.2)
docker_container.my-docker-container: Refreshing state... (ID: 6b2cd1316e16097aa5c18bc3d3d99d3b249cc0b0a8b6ba47aed338cbc20b706f)
docker_container.my-docker-container: Destroying...
docker_container.my-docker-container: Destruction complete
docker_container.my-docker-container: Creating...
  bridge:           "" => "<computed>"
  gateway:          "" => "<computed>"
  image:            "" => "1e39e488bd5b2789b19eb56349702c682430cf507597bfee1dcc81b70955d203"
  ip_address:       "" => "<computed>"
  ip_prefix_length: "" => "<computed>"
  must_run:         "" => "1"
  name:             "" => "my-docker-container"
docker_container.my-docker-container: Creation complete

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
[... help text omitted ...]
@kyhavlov
Contributor

kyhavlov commented May 6, 2016

This bug is still around. Is this a side effect of how the docker provider manages images? Right now (0.6.15) it also seems to pull images on plan if they aren't there because of the way it's written, which seems like a major bug.

@apparentlymart
Contributor

I'm not 100% sure since I don't know much about the internals of the docker provider, but I think this would be addressed by us finishing #6430 and then converting docker_image into a data source.

I think what's going on here is that Terraform can't tell at plan time that changing the docker_image name attribute is going to change the latest attribute; it only finds that out at apply time, when it's too late to realize it needs to replace the container.

If the name attribute were set to ForceNew then this would address the problem in a heavyweight way, by making the docker_image resource get "replaced" whenever the name is changed, and thus making Terraform realize that ${docker_image.my-docker-image.latest} would be replaced.

But if we turn it into a data source then that will allow docker_image to update itself during the plan phase (actually, during the "refresh" phase, which happens by default during plan) so that Terraform can see immediately the updated value of ${docker_image.my-docker-image.latest}.

Since this data sources work is likely to land soon (coming in 0.7), I'm going to repurpose this ticket to be the conversion of docker_image into a data source. Once that's done, the following configuration would get the effect you're looking for:

provider "docker" {
    host = "unix:/var/run/docker.sock"
}
resource "docker_container" "my-docker-container" {
    name = "my-docker-container"
    image = "${data.docker_image.my-docker-image.latest}"
}
data "docker_image" "my-docker-image" {
    name = "my/docker-image:0.5.2"
}

@kyhavlov
Contributor

kyhavlov commented May 6, 2016

@apparentlymart I'm not sure converting docker_image to a data source is the right way to go here; the pulling of a specific docker image onto a host seems like state of a deployment that terraform should be managing. That said, it looks to me like this bug happens because terraform is calling the same code on read as it is on update: https://github.com/hashicorp/terraform/blob/master/builtin/providers/docker/resource_docker_image_funcs.go#L24-L41

This means it's pulling images/affecting state even during terraform plan; you can observe this by deleting an image from a docker host after terraform has pulled it and then running plan and seeing it re-download the image. It seems like this behavior is counter to how terraform should work. I think this is also the cause of #6431 - terraform calls read() for the resource, attempts to pull the invalid tag and fails because it doesn't exist. Instead, it shouldn't even be able to attempt to pull an invalid tag because read should fail when the image/tag doesn't exist on the registry.

The real fix here that I see would be to make terraform only pull the image on update, and change read to check the registry the image refers to in order to get the SHA stored there for the image/tag. Then, it would know the SHA during plan and would be able to diff that against the image AND container resources instead of just reading the image name and diffing that against the image resource.
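To make the shape of that concrete, here is a minimal sketch of such a registry check against the Docker Registry v2 HTTP API; fetchImageDigest, its parameters, and the omission of authentication are illustrative assumptions, not provider code:

package docker

import (
	"fmt"
	"net/http"
)

// fetchImageDigest is a hypothetical helper: ask the registry (Docker
// Registry HTTP API v2) for the manifest digest of name:tag so Read could
// diff it against state without pulling the image. The Docker Hub token
// auth flow and other error handling are deliberately omitted.
func fetchImageDigest(registry, name, tag string) (string, error) {
	url := fmt.Sprintf("%s/v2/%s/manifests/%s", registry, name, tag)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return "", err
	}
	// Request a v2 manifest so the digest matches what `docker pull` resolves.
	req.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("registry returned %s for %s:%s", resp.Status, name, tag)
	}

	// The registry reports the content-addressable digest in this header.
	return resp.Header.Get("Docker-Content-Digest"), nil
}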

If people agree with that, I'd be happy to work on a PR

@apparentlymart
Contributor

Oh I see... I should've looked more closely before commenting because I didn't know this resource was actually downloading the image... I was thinking about it like "docker run" where the container creation does the image download (where necessary) as a side-effect.

In which case I think using ForceNew in conjunction with what you suggested seems reasonable. The trick is that Terraform can't handle an in-place update of a computed attribute as you expect because the code dealing with the diff can't tell what changes will be made on Update. By using ForceNew Terraform sees that the whole resource is getting rebuilt and knows to generate a diff.
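For reference, a minimal sketch of what that ForceNew flag looks like using Terraform's helper/schema; the attribute names match the real resource, but the dockerImageSchema helper and the trimmed-down schema are illustrative, not the provider's actual code:

package docker

import "github.com/hashicorp/terraform/helper/schema"

// Illustrative slice of the docker_image schema with ForceNew on "name".
// Any change to "name" now plans a destroy/create of the image, which in
// turn tells Terraform that the computed "latest" attribute (and every
// container interpolating it) will be replaced as well.
func dockerImageSchema() map[string]*schema.Schema {
	return map[string]*schema.Schema{
		"name": &schema.Schema{
			Type:     schema.TypeString,
			Required: true,
			ForceNew: true,
		},
		"latest": &schema.Schema{
			Type:     schema.TypeString,
			Computed: true,
		},
	}
}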

@kyhavlov
Contributor

kyhavlov commented May 6, 2016

I'll try to submit a PR for this soon, then.

@kyhavlov
Contributor

kyhavlov commented May 7, 2016

@apparentlymart After trying this out, I see what you mean. The only way terraform will update both the image and container in the same run is by setting ForceNew. In that case though, it recreates the container any time the image name changes, which isn't what I want either (if I'm switching to a new tag that points to the same image hash I don't want to restart).

So it looks like some aspect of the data sources from #6430 is going to be needed here in order to make this case work nicely. Any suggestions on that? I don't see a way to make this work even with data sources, other than by adding a new data source alongside the image resource just to read the hash, which seems kind of clunky.

It's also pretty strange to me that updating a computed field during plan/read doesn't have any effect on dependent resources, to inform their diffs during plan.

@apparentlymart
Contributor

The way I was originally conceiving it was that there'd be a data source to find images by criteria without downloading them, and then the docker_container resource would automatically download the given image if it isn't already available locally.

So in my earlier example, the docker_image data source would just find the hash of the image that matches the name but would not download it. Then the docker_container resource would take that hash and check if it's already available locally. If it isn't then it will download it. Either way, it'll boot up a container with that image.
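A rough sketch of what that pull-if-missing side-effect could look like inside the container resource's Create, assuming the fsouza/go-dockerclient library the provider already uses; the ensureImageLocal helper and the naive tag parsing are purely illustrative:

package docker

import (
	"strings"

	dc "github.com/fsouza/go-dockerclient"
)

// ensureImageLocal checks the docker host's local cache for the requested
// image and pulls it only when it is missing, mirroring what `docker run`
// does before starting a container.
func ensureImageLocal(client *dc.Client, image string) error {
	if _, err := client.InspectImage(image); err == nil {
		return nil // already present in the local cache
	}

	// Naive split into repository and tag; a real implementation would also
	// handle registry hosts with ports and digest references.
	repo, tag := image, "latest"
	if i := strings.LastIndex(image, ":"); i > 0 {
		repo, tag = image[:i], image[i+1:]
	}
	return client.PullImage(dc.PullImageOptions{
		Repository: repo,
		Tag:        tag,
	}, dc.AuthConfiguration{})
}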

I think this gets you what you want, because the docker_image data source will be able to refresh during plan, meaning that the new hash will immediately be available in ${data.docker_image.my-docker-image.latest}.

I'm planning to do a similar thing with AWS AMIs, where you'd be able to find one and then spin a new EC2 instance only when the matched AMI changes.

@kyhavlov
Contributor

kyhavlov commented May 7, 2016

That would work for fixing this issue but then it turns the container resource into a container + image resource that just manages both. That seems kind of weird in the terraform model, but it would work.

The other option is keeping the current image resource pretty much as is and just making it rely on a data source to get the SHA, so that it and the container resource can be updated correctly on changes.

@kyhavlov
Contributor

kyhavlov commented May 7, 2016

@apparentlymart Would it be possible to make ForceNew work for computed fields, so that reading an update to latest would trigger a recreation of the image (and thus, the container)? Or is that impossible for the same reason you're adding data sources?

@apparentlymart
Contributor

This is definitely an interesting case. I'm going to address your first comment first here. 😀

I'm definitely not a Docker expert (I've only used it in dev/CI systems so far, and not in production) so please do correct my assumptions here if they seem off...

In my mental model of Docker I've tended to think of the local image store as just a "cache", and thus local images not as resources in their own right but just an implementation detail of booting up an image that "lives" in a remote repository.

The docker_image resource already has some weird behavior (as far as I can tell from the source code) because of this cache-like behavior:

  • On Read of a pre-existing docker_image instance, if there is a newer version of the same named image then it will pull the new version but leave the old one in the cache.
  • However, on Delete (unless keep_locally is set) it will try to delete whichever image happens to be current at the time... even if that image were also being "managed" by a separate Terraform configuration or being used by some container that was created outside of Terraform.

I'm not really convinced that Terraform is really "managing an image" in any useful sense of that term. In one case it "leaks" images and loses track of them, and in the other case it tries to exert exclusive ownership of a possibly-shared resource.

So with all of that said, I would agree that there are really three different operations going on here that we're trying to model:

  • Given an image "name" (tag/version/etc), find the current hash for that name, either locally or in a remote repository. (Ideally, refresh this on each run so we can see if a new version is available.)
  • Given an image hash, make sure it's in the local image cache of the docker host. If it isn't, download it from a remote repository.
  • Start a container using a particular image hash.

The first of these being a data source seems like a no-brainer to me, since that would address the concern of having it notice when a new image version is available.

The part I'm less sure about is whether the second of these is worth modelling as a distinct resource or whether it should be combined into one of the others.

The local image cache is a content-addressable set, so the "create" and "delete" operations here really mean "add to cache" (docker pull) and "remove from cache" (docker rmi). Thus docker_image being a resource in its own right feels weird to me; if anything, I'd expect a docker_image resource to map onto either the docker build or docker import operations.

Thus it felt right to me for the second operation to be a side-effect of the third, thus making the docker_container resource Create behave like the docker run command. Of course in that case a separate process outside of Terraform must be responsible for "garbage collection" of unused images, but AFAICT that is the usual way to use docker when Terraform is not in play.

Again, please feel free to correct me if any of my reasoning above seems flawed when considering practical use of Docker!

@apparentlymart
Contributor

apparentlymart commented May 7, 2016

Now considering the question of putting ForceNew on a Computed resource: this combination of schema attributes is tricky because one aspect of "Computed" is that it gets excluded when diffing because it's not under the configuration's control and thus there's nothing to diff against (unless it's also Optional, in which case things are more complicated.)

Since ForceNew is an attribute that is considered only when diffing, and Computed causes an item to be ignored when diffing, this combination doesn't make sense.


I think there is a potential tweak to Terraform's behavior that would get the behavior you want: we could change the Apply implementation in helper/schema so that a "replace" diff (a +/-) reduces to a regular "update" if, after final interpolation, Terraform recognizes that the ForceNew attributes didn't change after all.

This would be the first example of Terraform doing something different than what the plan specified, but perhaps it's okay as long as the plan is always "more conservative" than what actually happens. That's a philosophical question that I'd want to reflect on more before forming an opinion. 😀

@kyhavlov
Contributor

kyhavlov commented May 7, 2016

I think that's a fair point about image management. I don't see a good way for terraform to clean up old images in any implementation; it can't delete the old image immediately after downloading the new one because the container might still be running (and docker won't even allow it to be deleted if that's the case). So if it's going to leak images to the cache anyway, we might as well roll the image pull logic into the container resource and treat it as a docker run would, as you said.

If I understand right, we need the functionality of data sources to get terraform to turn an image name/tag into a SHA to diff against the existing container resource in the plan step. The only way to have that happen right now would be to enter the SHA itself into the container resource, which we obviously don't want to do.

What's the working status of #6430 at the moment? Are data sources done enough for me to try out converting docker_image into one?

@apparentlymart
Contributor

I'm working on the data sources stuff right now, actually. As of this comment the branch is in a state where it would in theory be possible to develop a docker_image data source using the terraform_remote_state one as an example. However, I'm probably going to rebase the branch a few more times along the way, so it might be easier to wait a little until the first set of work has been reviewed and merged into the dev-0.7 branch. It will hopefully stabilize in the next couple of weeks, either by me working on it over weekends or by someone at HashiCorp taking the reins and finishing it up.

@apparentlymart
Contributor

@kyhavlov... FYI, the data sources work has landed in master now. I put the work we discussed here in the rollup ticket #6688 for existing resources we might want to convert to data sources.

@rhencke

rhencke commented May 18, 2016

To work around this issue, you can do something like this:

variable "docker_image_name" {
    type = "string"
    default = "my/docker_image:0.5.2"
}
resource "docker_container" "my_docker_container" {
    name = "my_docker_container"
    image = "${data.docker_image.my_docker_image.latest}"
    env = [ "forceupdate=${var.docker_image_name}" ]
}
data "docker_image" "my_docker_image" {
    name = "${var.docker_image_name)"
}

It's not a perfect solution, but with this method any change to the name of the Docker image forces the Docker container to update.

@kyhavlov
Contributor

kyhavlov commented May 31, 2016

@apparentlymart
Mostly finished with the code for this; I ended up not adding the image pull logic to the container resource. There are a couple of fields in docker_image now that care about image lifecycle (keep_updated and keep_locally), and I've got a PR open in docker/swarm to pass in constraints on where to pull the image, so it seemed easier to keep it in its own resource.

What I've done so far lets you write this:

data "docker_image" "nomad" {
  name = "kyhavlov/nomad:latest"
}

resource "docker_image" "nomad" {
  name = "${data.docker_image.nomad.name}"
  registry_id = "${data.docker_image.nomad.id}"
}

resource "docker_container" "nomad-server" {
  image        = "${docker_image.nomad.latest}"
  name         = "nomad-server"
}

I've fixed the docker_image read() behavior to just read the image id on the host instead of trying to actually pull an image, so if you have a manifest like the one above and you update the image in the registry, plan will correctly show that the image and container both need recreating in the same run.
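Roughly, the revised Read looks like this (a sketch assuming the go-dockerclient API, not the literal PR code):

package docker

import (
	dc "github.com/fsouza/go-dockerclient"

	"github.com/hashicorp/terraform/helper/schema"
)

// Read now only inspects what is already present on the docker host and
// records its ID; it never pulls, so `terraform plan` has no side effects.
func resourceDockerImageRead(d *schema.ResourceData, meta interface{}) error {
	// Assumes the provider's configure step hands back a docker client.
	client := meta.(*dc.Client)

	img, err := client.InspectImage(d.Get("name").(string))
	if err == dc.ErrNoSuchImage {
		// The image is gone from the host: clear the ID so apply recreates it.
		d.SetId("")
		return nil
	}
	if err != nil {
		return err
	}

	d.Set("latest", img.ID)
	return nil
}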

I should be able to have the PR up within a day or two.

@kyhavlov
Contributor

kyhavlov commented Jun 4, 2016

PR is up for review: #7000

@ghost

ghost commented Apr 22, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 22, 2020