Different behavior for resource creation and replacement #3220

Open
jimmycuadra opened this issue Sep 11, 2015 · 6 comments

@jimmycuadra
Contributor

I'm using Terraform to create an etcd cluster using static bootstrapping (meaning the IPs of the servers are known in advance and provided to each etcd node as CLI arguments). One of the parameters you need to pass to etcd in this scenario is -initial-cluster-state, which must be either new or existing. Specifically, from etcd's docs:

Initial cluster state ("new" or "existing"). Set to new for all members present during initial static or DNS bootstrapping. If this option is set to existing, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.

The problem this introduces for Terraform is that there isn't a good mechanism for knowing whether an arbitrary server is part of the initial cluster, being added later to increase the cluster size, or taking the place of an existing server that failed. In the first case the value of the flag must be new and in the latter two it must be existing.

I attempted to work around this by putting the text "new" in a file and using the file function in Terraform to read in the value on the initial run. Then a local provisioner overwrites the contents of the file with "existing" so that future runs will use that value instead (roughly as sketched below). This doesn't quite work, because Terraform is smart enough to see that the contents of the file have changed, which makes it think that all etcd servers need to be recreated on the next run (because all of this configuration is being supplied to the servers with cloud-config/user-data via a Terraform template resource).
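
For reference, the shape of that attempt was roughly the following (file name, resource names, and wiring are simplified and illustrative, not my exact config):

resource "template_file" "etcd_cloud_config" {
  filename = "${path.module}/templates/etcd_cloud_config.yml"

  vars {
    # Reads "new" on the first run; the provisioner below rewrites the file
    # to "existing" afterwards.
    initial_cluster_state = "${file("${path.module}/initial_cluster_state.txt")}"
  }
}

resource "aws_instance" "etcd" {
  # ... other arguments (ami, instance_type, count, etc.) omitted ...
  user_data = "${template_file.etcd_cloud_config.rendered}"

  # Flip the flag so future runs render "existing" instead.
  provisioner "local-exec" {
    command = "echo existing > ${path.module}/initial_cluster_state.txt"
  }
}
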

I want to treat this value as state rather than as part of the Terraform configuration itself, because the configuration is in a module (so it can be applied to different environments, e.g. staging and production). This means the code in the module must stay agnostic to whether the initial cluster of servers has already been created for any given environment.

It may also be relevant that the etcd servers are defined with the aws_instance resource and a count parameter, so increasing the count from, say, 3 to 5 would need to leave the existing servers alone and use the new value only for the two new ones.

Basically I need a way to say "if this is the initial creation of X, do Y, else do Z." If this is possible today in some way I'm not aware of, I'd love to hear about it. If not, I'm wondering how Terraform might be improved to handle this situation in a general way without something too specific to my particular use case. Conditional logic as discussed in #1604 combined with some way to check for the presence of a file and use the result as a truth value is one possibility.

@apparentlymart
Contributor

One way to hack around this would be to do the initial cluster setup in a null_resource that sits alongside the resources for the etcd servers:

resource "null_resource" "etcd_bootstrap" {
    provisioner "remote-exec" {
        inline = [
            "something-to-bootstrap-the-cluster"
        ]
    }
}

Since provisioners only run when a resource is initially created, and a null_resource never gets recreated unless you explicitly remove or taint it, you can use the provisioner to do something that should only happen once on the first run of a Terraform config.
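
(For completeness: if you ever did want that bootstrap step to run again, tainting the resource, e.g. terraform taint null_resource.etcd_bootstrap, should force it to be recreated and the provisioner to re-run on the next apply.)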

In making this suggestion I'm expecting that you can make the etcd servers come up by default as "existing", let them fail safely because the value is wrong, and then correct it to "new" in the provisioner. On subsequent runs the hosts will just boot up as "existing" and should work as expected, assuming I'm following the flow correctly. (I don't know etcd well, so I may well be misunderstanding it.)

I do something somewhat like this to bring up a Consul cluster, although the dynamic there works a bit differently so I expect your case will be different in the details.

@jimmycuadra
Contributor Author

I didn't know there was a null resource! Does this appear anywhere in the documentation? I see the code has existed for about a year.

Thanks for your idea. I will think about it and see if there's a way I can use this to solve my issue.

@apparentlymart
Contributor

Jimmy,
It's not currently in the docs (I believe it was originally there just for use in tests) but there was a discussion earlier in the week in #580 about whether it's time to document it, given that several patterns have emerged around it.

@discordianfish

I have the same requirement (also deploying etcd) and am wondering whether @jimmycuadra found a solution, because the null resource alone isn't working for me:

@apparentlymart suggests starting the cluster in 'existing' state, then using the null resource provisioner to switch it to 'new' for the initial bootstrapping.
The problem with this approach is that after replacing an instance in an existing cluster, we need to wait for the cluster to become healthy again. In that case state = existing. But we can't wait every time state = existing, because that's also the initial state before the null resource provisioner switches it to new...
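
To make the dilemma concrete, the "wait" I mean would be something like a remote-exec step that blocks until etcdctl reports the cluster healthy (sketch only, the names and command here are illustrative, not a tested recipe):

resource "aws_instance" "etcd" {
  # ... other arguments omitted ...

  # This would run on every (re)creation of the instance, including the very
  # first bootstrap of the cluster, which is exactly when waiting would hang.
  provisioner "remote-exec" {
    inline = [
      "until etcdctl cluster-health; do sleep 5; done"
    ]
  }
}
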

@jimmycuadra
Contributor Author

I never found a good workaround for the approach I was taking before. The project I needed this for has kind of stalled, so I haven't actually tried the new setup yet, but the new plan is just to treat each etcd server as an explicit, hardcoded resource, each with an individually controllable Terraform variable for the initial cluster state. Basically:

resource "template_file" "etcd_01_cloud_config" {
  filename = "${path.module}/templates/etcd_cloud_config.yml"

  vars {
    initial_cluster_state = "${var.etcd_01_initial_cluster_state}"
    name = "etcd_01"
  }
}

resource "template_file" "etcd_02_cloud_config" {
  filename = "${path.module}/templates/etcd_cloud_config.yml"

  vars {
    initial_cluster_state = "${var.etcd_02_initial_cluster_state}"
    name = "etcd_02"
  }
}

resource "template_file" "etcd_03_cloud_config" {
  filename = "${path.module}/templates/etcd_cloud_config.yml"

  vars {
    initial_cluster_state = "${var.etcd_03_initial_cluster_state}"
    name = "etcd_03"
  }
}

It's verbose and not very flexible in terms of scaling the size of the etcd cluster, but for our purposes and given the current constraints of Terraform, I think it will work.
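
The per-node variables would then just be inputs to the module, something like the following (illustrative only; the default of "new" is only correct for the very first apply of an environment):

variable "etcd_01_initial_cluster_state" {
  default = "new"
}

variable "etcd_02_initial_cluster_state" {
  default = "new"
}

variable "etcd_03_initial_cluster_state" {
  default = "new"
}

After the initial bootstrap, the variable for any node being replaced or added would be set to "existing" instead.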

@discordianfish

Thanks for sharing! But this still won't wait for quorum when Terraform replaces an existing instance.
