lifecycle: "at least one of x should exist" #1821

Closed

JeanMertz opened this issue May 6, 2015 · 6 comments

Comments

@JeanMertz
Contributor

Is there a way to guarantee that at least one of a specific resource exists (or
even better, to have plan/apply only run on a subset of a single resource)?

The current lifecycle implementation tries to help with this, but the moment a
machine is "up" is not the moment it is actually "ready for work". I could add
a null resource and/or local provisioners to wait for the machine to become
"ready", but I suspect that would create more problems than it solves.

Use-case:

Three etcd cluster machines. If a change is made to the user-data, the machines
would be recreated, but if all of them are removed, the cluster is destroyed,
and the reserved discovery ID is no longer valid, requiring a complete
re-creation of the etcd cluster with a new discovery ID.

I'd like to be able to say "only ever recreate one of these per plan/apply
run". That way we can pace the recreation cycle more carefully.

Any thoughts on this?

@knuckolls
Contributor

If you could do an apply with a parallelism of one and have a pre-destroy provisioner that could block until something happens, would that be helpful for this use case? If so, that's what I'm currently working on, and PRs to core will be incoming soon. I talked to @phinze about the implementation last week.

@JeanMertz
Contributor Author

@knuckolls that would certainly help. A simple check to see how many etcd units are active would suffice to know whether a machine is allowed to go down. Of course, this means giving up parallelism, which is one of Terraform's big pluses.

I still like the idea of an "allow a maximum of x of the same resource to be recreated in a single run" option. This would preserve parallelism, but it would of course be up to the user to run the plan/apply commands multiple times to get a completely up-to-date environment.

I guess I could use the -target option to recreate them one by one for now.

@knuckolls
Contributor

Agreed, "allow a maximum of x of the same resource to be recreated in a single run" would be nice. For systems like HDFS with a replication factor, the trick would be to not delete more than the replication factor at a time, so that you don't lose blocks. There are additional complexities surrounding rack awareness. The "decommission one node at a time and check that the cluster is healthy before starting the next delete" approach seemed like the safest baseline functionality for rolling stateful clusters for now.

@JeanMertz
Contributor Author

@knuckolls maybe it would make more sense to have this be a "keep a minimum of x of the same resource alive" option. That way, clusters with a minimum required size can be defined, and when scaling up, more machines can be converted in the initial run.

So instead of having to do three runs after scaling a setup from 3 to 7 nodes with "allow max" set to 2, we'd only need two runs with "min alive" set to 1.

Also, this bit me again today, when I accidentally applied a plan that recreated all machines in the cluster, losing the etcd cluster registered under the pre-configured discovery key (this was a non-production environment, but I can see someone doing this in production at some point 😄).

@teamterraform
Contributor

Hi @JeanMertz! Sorry for the long silence here.

Over the years we've seen a number of permutations of this sort of constraint, and it seems there isn't really a single Terraform feature that would cover all of them, yet a separate feature for each one would add a lot of extra complexity that helps only a small cohort of users.

With that said, we took a different approach in Terraform 0.12 and implemented a command that will produce a JSON representation of a saved plan intended for consumption by other software:

terraform plan -out=tfplan
terraform show -json tfplan

The format of this output is the "Plan Representation" documented in JSON Output Format. You can use this as part of production Terraform automation to implement any additional policy and safety checks you need in your environment, in a programming language of your choice.

For the use-case you gave in the initial comment here, that would likely mean inspecting the planned_values structure and making sure that at least one of the resource instances that represent your etcd cluster machines (based on your local naming convention) remains in the planned new state. If not, you can either show a prominent warning to the human operator who will ultimately approve the change, or block the change from proceeding entirely.
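As a minimal sketch of that kind of check, assuming the plan JSON has been saved with something like terraform show -json tfplan > tfplan.json, and assuming the etcd machines are aws_instance resources named "etcd" (both are placeholders; substitute your own resource type and naming convention), a small Python script could count how many of those instances remain in planned_values and fail the automation step if none do:

```python
#!/usr/bin/env python3
"""Minimal sketch of a plan policy check for the etcd use-case above.

Assumes the plan JSON was saved via `terraform show -json tfplan > tfplan.json`
and that the etcd machines are `aws_instance` resources named "etcd" -- both
are placeholders; adjust them to match your own configuration.
"""
import json
import sys

RESOURCE_TYPE = "aws_instance"  # placeholder: your provider's machine type
RESOURCE_NAME = "etcd"          # placeholder: your resource's local name


def iter_resources(module):
    """Yield every resource in the planned_values module tree, recursively."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def main(path):
    with open(path) as f:
        plan = json.load(f)

    # planned_values describes the state Terraform expects after the apply.
    root = plan["planned_values"]["root_module"]
    remaining = [
        r for r in iter_resources(root)
        if r["type"] == RESOURCE_TYPE and r["name"] == RESOURCE_NAME
    ]

    if not remaining:
        print("POLICY VIOLATION: no etcd machines remain in the planned state")
        return 1

    print(f"OK: {len(remaining)} etcd machine(s) remain in the planned state")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "tfplan.json"))
```

In an automation pipeline, a check like this would sit between the plan step and the apply approval, so a plan that would replace every etcd machine at once either raises a prominent warning or is blocked outright.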

This model does assume that you are running Terraform in automation and so can introduce an additional policy/safety verification step. We recommend that any team using Terraform in production should build automation around it to ensure repeatability and avoid credentials sprawl, and so this solution is an extension of that recommendation that also includes automatic review of proposed plans for certain problems that can be checked systematically.

With all of that said, we're going to close this out now, since this JSON output feature is intended as a solution to a broader space of problems that includes the use-case in this issue. Since it's now multiple years after you originally opened this, we expect your situation has changed anyway, but hopefully this new feature will be useful. Thanks for sharing this use-case!

@ghost

ghost commented Aug 19, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators Aug 19, 2019