Partial/Progressive Configuration Changes #4149
Comments
@phinze this proposal brings together several discussions we've had elsewhere and provides a new take on the "static vs. dynamic" problem that seems to be a common theme in Terraform supporting more complex use-cases. I'd love to work on something in this vein early next year if you guys think this is a reasonable direction to take, but given the work involved I'd love to hear your thoughts before I get stuck into implementation.
I could see this being very helpful, and a nice way to simplify situations where we are sometimes otherwise a bit stuck with one or another limitation.
Thanks for this - as usual! - wonderfully clear write up, @apparentlymart. 😀 This solution does seem flexible and fairly straightforward conceptually. As I'm sure you'd agree, introducing the possibility of partial plans/applies would be a pretty central modification to Terraform's workflow, so it deserves careful scrutiny. You have shown how partial progress could enable more complex configurations, but I'm worried about the cost to the general UX of the tool. The usage pattern would go from two well-defined discrete steps to "just keep retrying until it goes through." One of Terraform's most important qualities is its ability to predict the actions it's going to take. This feature proposes to trade off Terraform's prediction guarantees to enable previously impossible configs, or in other words to "enable Terraform to automate configurations it cannot predict (in one step)." At this point, Partial Applies feels like a heavy hammer to wield before addressing some of the more specific pain points you mention.
However, these are just my initial thoughts - I'll keep thinking and we can keep discussing!
Thanks for the feedback. It's certainly what I was expecting, and it echoes my own reservations about the design. My rationale for proposing it in spite of those reservations was to observe that users are already effectively doing this workflow to work around what I perceive to be some flaws in Terraform's "ideal" model of being able to operate in a single step. Here are two workarounds that my team uses regularly, and that I've seen recommended to others running into similar problems:

- Using -target to force Terraform to create a subset of the configuration first, then running a second plan/apply cycle to handle the rest.
- Splitting what is logically one configuration into multiple separate configurations applied in sequence, manually carrying outputs from one into the inputs of the next.
Something like #2976 certainly reduces the cases where this is necessary, but not to zero. Consider for example the use-case of spinning up a Rundeck instance as an aws_instance whose computed address must then feed the rundeck provider configuration: the provider, and everything belonging to it, can't be dealt with until the instance exists.

Thus I have concluded that "Terraform can plan everything" is a nice ideal, but it doesn't seem to stand up to reality. This proposal basically embraces the workaround as a first-class part of the workflow. I'd consider this a big improvement over the current situation, where quite honestly my coworkers lose confidence in Terraform every time I have to guide them through a manual workaround to a problem like I've described here, since it feels like hacking around the tool rather than working with it, and they (quite rightly) consider what would happen if they found themselves in such a situation in the middle of the night during an incident and were forced to create a workaround on the fly, working solo.

I think it'd be important to couple this architectural shift with the continued effort to have Terraform detect as many errors as possible at plan time, so that errors during apply stay the exception rather than the rule.

I also think that it's important that Terraform describe the "I need multiple steps" situation very well after a plan, so that users can make an informed decision about whether to proceed with the multi-step process or whether to rework/simplify their configuration to not require it, if the "plan everything in one step" guarantee is important to them and they aren't willing to risk that step 2 might not be what they expected.

With all of that said, I'm just trying to directly address your initial feedback and hopefully aid your analysis; I totally agree that this is a significant change that requires scrutiny, and am happy to let this soak for a while.
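The shape of that Rundeck case, sketched as configuration (the names, addresses, and most arguments are invented for illustration; this shows the pattern rather than a complete working config):

```hcl
# The instance that will run Rundeck.
resource "aws_instance" "rundeck" {
  ami           = "ami-12345678"   # placeholder AMI id
  instance_type = "t2.medium"
}

# The provider needs the instance's address, which is computed,
# so the provider can't be configured (nor its resources
# refreshed and diffed) until the instance actually exists.
provider "rundeck" {
  url        = "http://${aws_instance.rundeck.public_ip}:4440/"
  auth_token = "${var.rundeck_auth_token}"
}

# Everything belonging to the provider is blocked in turn
# (other required arguments omitted for brevity).
resource "rundeck_project" "main" {
  name = "main"
}
```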
This helps a lot, Martin, and it's very persuasive! I agree that forcing users to hand-roll these multi-step workarounds is worse than supporting the workflow directly. I have been silently wondering if Terraform should just encourage splitting configurations at these "unplannable boundaries" into separate configs, but reading your description of that workaround:
It does seem perhaps too tedious and clunky to be a "recommended architecture". I'll let this bounce around my brain over the weekend and follow up next week. 🍻
This is a lot like something I've been bouncing around with @phinze for a while. Good to see this starting to be formalized a bit more. I actually called this something like "phased graphs" or something terrible, but partial apply makes more sense.

I think internally what we do is actually separate various graph nodes into "phases". Each phase on its own should be completely cycle-free, with semantic checks such as: a phase can only depend on things from phase N or N-1, and it can't contain any cycles within it. Detecting when a phase increase needs to happen uses the rules you outlined above: a computed dependency from a provider, or a computed count. Then for plan output we note the various phases that exist, and when you run it, we just increment the phase count, then you can plan the next phase, etc. etc. Does this make sense?
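A hypothetical rendering of that multi-phase idea at the console; all of the wording and numbers below are invented for illustration, not real Terraform output:

```console
$ terraform plan
# Phase 1: 12 to add, 0 to change, 0 to destroy
# Phase 2: 3 resources depend on phase-1 results (computed
#          provider config / computed count) and cannot be
#          planned until phase 1 has been applied.

$ terraform apply    # applies phase 1 only, then stops
$ terraform plan     # phase 2 is now plannable
$ terraform apply    # converged
```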
On second thought, we can probably make this a lot simpler: we detect when we have what I described above as a "phase change" and just stop the plan/apply there, notifying the user that another run is required to continue forward. Once an apply is done, re-running it on the whole thing shouldn't affect anything. I can see the benefit of making this a more stateful operation, but that is something we can do as an improvement later. The idea in the previous paragraph can introduce partial applies without any huge UX change.
I sort of use -target in an attempt to limit Terraform to "phases". I recently noticed this would still try to remove/delete some resources if TF deemed it necessary (even nodes outside the "target").
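For comparison, the manual version of that phase-splitting looks something like this (the resource address is invented); as noted above, -target does not fully fence off the rest of the graph:

```console
# Apply just the foundation resource first...
$ terraform plan -target=aws_instance.rundeck -out=tfplan
$ terraform apply tfplan

# ...then plan the remainder once its inputs are known.
$ terraform plan
```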
@mitchellh Thanks for that feedback. It's great to hear how you're thinking about this. Considering your first comment about planning multiple phases, I believe it's not possible for Terraform to plan the full set of apply steps in all cases. Consider the scenario where the provider configuration is computed: we then can't refresh/diff any resources belonging to that provider. I think the best we could do is note in the output that we're deferring certain resources, but I expected that would make the output rather noisy... seems like a good thing to learn via some prototyping.

I think your second comment describes what I had originally tried to propose: go as far as possible, let the user know it's not complete, and then let them re-plan to move to the next step. For me, marking a node as "deferred" was what you called "phase change", assuming I understand correctly. In my approach there are only two "phases" for a given plan: "planned" or "deferred". At each "apply" the state is updated, without having to retain anything new in the state file, just like we'd do if there was an error during the apply phase.

I'm hoping to spend some time prototyping this in the new year. I think it will be easier to talk about the details of this with a strawman implementation to tear apart.
@apparentlymart I don't think I was clear enough, but I think we're agreeing. Only the current phase would be planned (contain a diff). The rest is a no-op, but may need some associated logic upstream in the graph (in order to get the state to read things), probably not downstream.
From the peanut gallery: I'm very relieved to find this thread, and just wanted to offer the perspective of a newbie user in love with this idea, but running into horrible chicken-and-egg roadblocks trying to spin up what should be a fairly simple AWS VPC.

At this point, after hours (actually days) of trying to manually pull out parts of configs and then put them back in to work around these chicken-and-egg scenarios that Terraform is supposed to be handling for me, Terraform's state seems so confused as to be moribund. Whereas before I had a working environment, now I can't even figure out how to fix Terraform's state to make it understand what's actually there, as it seems totally confused about that now, no doubt due to all my fiddling. I guess I just rip the entire thing out and start from scratch now? Will it even be able to do that at this point?

If I were already a Terraform expert, no doubt I'd recognize typical patterns and know how to work around them as your remarkably named colleague, apparentlymart, can. But as a novice to your tech, I wouldn't even know where to begin. As he observed above, I am so addled by the experience, given how simple this environment is, that I would be terrified to even try to use this magic on anything remotely complex, and would never dream of suggesting some of my even less mart colleagues try. The thought of making some minor change to my environment that basically completely borks it and leaves me not even knowing what the state is - that is the stuff of nightmares, quite literally. That does seem to be a problem.

I am not as mart as apparentlymart, but it seems obvious that having a single pass in a system which must rely on future states for the data to support subsequent steps is one of those magical ideas that looks great in theoretical language about phase state... but, basically, must fail in practice. Again, I'm not remotely as mart as any of you, and that's why I'm trying to help you see that mere mortals have a hard time seeing how this reduces effort - quite to the contrary - I can manually make a VPC with a functioning Vyatta node in about fifteen minutes, and not be terrified that the smallest change will take my business offline.

Sorry for the long read and the degree of frustration in tone - I have been at this, as I said, for days. And these tools are exciting! I'll be watching as it becomes something predictable and usable.
Quick follow-up as an example: my most recent plan was hanging during apply while trying to delete a network interface... it would eventually time out and not report the details of the failure beyond that it timed out. The problem ended up being that I had manually created an instance on that subnet, and so AWS was refusing to delete the subnet prior to deleting the dependent instance/interface.
So - not a huge deal, obviously, and not a chicken and egg scenario - but an example of the sort of workflow detritus that gets really frustrating in repetition. Again - I'm a huge fan, and don't expect a zero learning curve experience. Just offering the novice perspective. Thanks for the project and all your efforts!
@erich-comsIO, there are a few assumptions one should make, or allow Terraform to make (such as: let Terraform manage its space and interfere as little as possible), and you will run into these types of issues less often. It takes time, sharing, and sometimes bug squashing to improve that learning curve. I have run into similar issues and circular dependencies with template_file, subnets, security groups, etc. Part of this is also due to nuances with the provider; the AWS console smooths out the experience in many ways you do not realize until using Terraform. Are you able to connect to #terraform-tool on freenode IRC? I would be happy to help you through some of these types of issues. The mailing list is also a great place for this discussion; this ticket will quickly spin out of control if we veer too far away from the central discussion.
Thanks for the reply and offer of help. Very much appreciated. I'm about to jump into a meeting, but will start hanging out in the IRC channel. I do get that I need to stay out of the way as much as possible and let it do its thing. But then I run into these circular dependency issues. I gather I need to start using -target to manage those things Terraform seems unaware of or unable to track at AWS regarding these kinds of dependencies. Departing this thread, and will ask my noob questions in new topics. Thanks again!
A follow-up idea, building on this proposal: Terraform will usually happily propagate computed values through the graph as things are created, which means the initial diff often has a whole lot of raw interpolations that don't get resolved until apply time. This also means that certain validations can't be done until apply time, since the value to validate isn't yet known.

With partial apply support it becomes possible to defer any resource whose configuration still contains unresolved values, so that every diff Terraform actually applies contains only known, validated values. I often "fake" this today with targeted applies, forcing intermediate values to be computed before the dependent resources are planned.

I wouldn't tackle this immediately when implementing this feature, but I think this is a further use-case for having the infrastructure to allow partial plan/apply.
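An illustration of the validation point (resource choice and names arbitrary): the interpolated address below is computed, so today the provider can only validate the record's value at apply time, whereas deferring the record to a second cycle would let it be validated at plan time against a known value:

```hcl
resource "aws_eip" "nat" {
  vpc = true
  # public_ip is only known after creation.
}

resource "aws_route53_record" "nat" {
  zone_id = "${var.zone_id}"
  name    = "nat.example.com"
  type    = "A"
  ttl     = "300"
  # Appears in the initial diff as an unresolved interpolation,
  # so a malformed value wouldn't be caught until apply.
  records = ["${aws_eip.nat.public_ip}"]
}
```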
@apparentlymart Another good idea, but I'd offer the same cautious viewpoint as above: let's treat that as a separate idea once we do this one.
Should this issue be resolved, please check #2253 to see whether it is resolved as well.
I think this issue is broader, especially regarding Terraform's ability to automate away what needs to be included/excluded.
Hi all! It's been a long time. I originally opened this issue quite some time before I joined HashiCorp to work on Terraform full-time, and although the underlying problem statement of this issue remains valid, the exact details I described here have become less relevant to modern Terraform over time. It's been clear to us that we will need to take a fresh start at designing it, taking into account more recent changes to the way providers are developed, the better handling of unknown values in Terraform v0.12 and later, the introduction of data sources in the meantime, and various other situations that are clearer to the Terraform team today than they were to me as an external contributor back in 2015.

With that in mind, I've decided to close this issue and replace it with one that represents just the problem to be solved and not yet any specific solution to it. My hope is that we'll use that new issue to discuss the relevant constraints and challenges and eventually reach a new proposal that makes sense for Terraform as it exists today, which may or may not be similar to what I mocked up in this older issue.

The new issue is #30937. If you're interested in following along with or participating in that discussion, please move your issue subscription over to that issue instead. I'm going to lock this one just to avoid continued additions here and thus a fragmented discussion.

Thanks for the discussion here so far! The history of this issue isn't going anywhere, so we'll still be able to take into account the existing feedback as we consider possible approaches to solve this problem.
Original proposal

For a while now I've been wringing my hands over the issue of using computed resource properties in parts of the Terraform config that are needed during the refresh and apply phases, where the values are likely not to be known yet.
The two primary situations that I and others have run into are:
- provider configurations whose arguments are populated from other resources' computed attributes
- the count modifier on resource blocks, as described in #1497 ("count parameter needs to be able to interpolate variables from modules"). Currently this permits only variables, but having it configurable from resource attributes would be desirable; a sketch of that usage appears below.

After a number of false-starts trying to find a way to make this work better in Terraform, I believe I've found a design that builds on concepts already present in Terraform, and that makes only small changes to the Terraform workflow. I arrived at this solution by "paving the cowpaths" after watching my coworkers and I work around the issue in various ways.
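To make the count case concrete, a sketch of the desired usage (resource names and values are invented; the second count expression is not valid in current Terraform):

```hcl
# Works today: count derived from a variable.
resource "aws_subnet" "private" {
  count      = "${length(split(",", var.availability_zones))}"
  vpc_id     = "${aws_vpc.main.id}"
  cidr_block = "10.0.${count.index}.0/24"
}

# Desired: count derived from another resource's attributes.
# Under this proposal, these instances would simply be deferred
# to a later plan/apply cycle when necessary.
resource "aws_instance" "worker" {
  count         = "${length(aws_subnet.private.*.id)}"
  ami           = "ami-12345678"   # placeholder
  instance_type = "t2.micro"
  subnet_id     = "${element(aws_subnet.private.*.id, count.index)}"
}
```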
The crux of the proposal is to alter Terraform's workflow to support the idea of partial application, allowing Terraform to apply a complicated configuration over several passes, converging on the desired configuration. So from the user's perspective, it would look something like this:
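Sketched as a console session (all output wording and exit codes here are illustrative, not real Terraform output):

```console
$ terraform plan -out=tfplan
# Renders the diff that can be computed now, and notes that some
# resources were deferred pending values known only after apply.

$ terraform apply tfplan
# Applies the partial plan, then exits with a distinct status
# meaning "partial success: another cycle is required".

$ terraform plan -out=tfplan
# Previously-deferred resources can now be planned normally.

$ terraform apply tfplan
# Converged: nothing left deferred; exits 0.
```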
For a particularly-complicated configuration there may be three or more apply/plan cycles, but eventually the configuration should converge.
terraform apply would also exit with a predictable exit status in the "partial success" case, so that Atlas can implement a smooth workflow where e.g. it could immediately plan the next step and repeat the sign-off/apply process as many times as necessary.

This workflow is intended to embrace the existing workaround of using the -target argument to force Terraform to apply only a subset of the config, but improve on it by having Terraform itself detect the situation. Terraform can then calculate for itself which resources to target so as to plan the maximal subset of the graph that can be applied in a single action, rather than requiring the operator to figure this out.

By teaching Terraform to identify the problem and propose a solution itself, Terraform can guide new users through the application of trickier configurations, rather than requiring users to either have a deep understanding of the configurations they are applying (so that they can target the appropriate resources to resolve the chicken-and-egg situation), or requiring infrastructures to be accompanied by elaborate documentation describing which resources to target in which order.
Implementation Details
The proposed implementation builds on the existing concept of "computed" values within interpolations, and introduces the new idea of graph nodes being "deferred" during the plan phase.
Deferred Providers and Resources
A graph node is flagged as deferred if any value it needs for refresh or plan is flagged as "computed" after interpolation. For example:

- A provider is deferred if any value in its configuration is computed.
- A resource is deferred if its count value is computed.

Most importantly though, a graph node is always deferred if any of its dependencies are deferred. "Deferred-ness" propagates transitively so that, for example, any resource that belongs to a deferred provider is itself deferred.
After the graph walk for planning, the set of all deferred nodes is included in the plan. A partial plan is therefore signaled by the deferred node set being non-empty.
Partial Application
When terraform apply is given a partial plan, it applies all of the diffs that are included in the plan and then prints a message to inform the user that the apply was partial, before exiting with a non-successful status.

Aside from the different rendering in the UI, applying a partial plan proceeds and terminates just as if an error had occurred on one of the resource operations: the state is updated to reflect what was applied, and then Terraform exits with a nonzero status.
Progressive Runs
No additional state is required to keep track of partial application between runs. Since the state is already resource-oriented, a subsequent refresh will apply to the subset of resources that have already been created, and then plan will find that several "new" resources are present in the configuration, which can be planned as normal. The new resources created by the partial application will cause the set of deferred nodes to shrink -- possibly to empty -- on the follow-up run.

Building on this Idea
The write-up above considers the specific use-cases of computed provider configurations and computed "count". In addition to these, this new concept enables or interacts with some other ideas:
- foreach could iterate over arbitrary resource globs or collections within resource attributes, without introducing a new "generator" concept, by deferring the planning of the multiple resource instances until the collection has been computed. (A speculative sketch appears below.)
- Requiring multiple plan/apply cycles to converge could be used for rolling updates too.
- Mixing create_before_destroy with resources that don't set it, as documented in #2944 ("Document create_before_destroy limitation"), could get a better UX by adding some more cases where nodes are "deferred", such that the "destroy" node for the deposed resource can be deferred to a separate run from the "create" that deposed it.
- The request to allow the provider attribute on resources to be interpolated is mainly concerned with interpolating from variables rather than resource attributes, but the partial plan idea allows interpolation to be supported more broadly without special exceptions like "only variables are allowed here", and so it may become easier to implement interpolation of provider.
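For instance, the foreach idea might look roughly like this; the syntax and the each.* references are entirely speculative, since nothing like this existed in Terraform at the time:

```hcl
# Speculative syntax: one DNS record per worker instance. Because
# the collection is computed, these instances would be deferred
# until the aws_instance.worker resources exist.
resource "aws_route53_record" "worker" {
  foreach = "${aws_instance.worker.*.private_ip}"

  zone_id = "${var.zone_id}"
  name    = "worker-${each.index}.example.com"
  type    = "A"
  ttl     = "300"
  records = ["${each.value}"]
}
```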