Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: transform blocks for handling terragrunt limitations #1809

Closed
wants to merge 5 commits into from

Conversation

yorinasub17
Copy link
Contributor

@yorinasub17 yorinasub17 commented Sep 14, 2021

This a proposed design for addressing #1808. I wanted to write this down before it hits brain rot and I forget about it.

Easy reading link.

Note that I think this feature does indeed belong in Terragrunt. As the discussion in #1774 revealed, the vast majority of modules in the registry do not support Terragrunt, nor do the module maintainers want to (it goes without saying that that is perfectly ok).

However, this does not mean that the modules in the registry are NOT service modules. Use cases of the module by users may show that there are modules in the registry that actually can be deployed directly as a service module, despite not being labeled as such, but have the variable and output limitations that terragrunt currently doesn't support. For those modules, it is fairly painful for users to have to wrap and repeat all the variables and outputs of the underlying module.

The feature proposed in this RFC should help support this, and I believe we can implement this with relatively little effort.

cc @lorengordon since you indicated being curious about how we could address this problem.

deployment:

```hcl
transform {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be an attribute of the terraform block, since it would be specifically related to the source argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like it being a separate block, given that the terraform block is more about how terragrunt calls terraform, and transforming the code is a different operation. In that regard, it is more similar to generate than terraform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, and that makes me wonder what kind of terragrunt config/feature might it be to support multiple terraform blocks...

type = list(string)
}
output "my_password_hashed" {
sensitive = true
Copy link
Contributor

@lorengordon lorengordon Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would collisions be handled? E.g. say sensitive = true already existed for this output in the source module but the user specifically wanted to override it with sensitive = false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, this was more like a shallow merge operation, where each block shallow merges up, with the "default" being what's in the terraform code. Updated to clarify this: f2d8e44


- Scan all `.tf` files in the directory.
- For each file found, parse using the `hclwrite` parser.
- Walk the AST, looking for `variable` or `output` blocks that match the `transform` sub blocks.
Copy link
Contributor

@lorengordon lorengordon Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, this could also rewrite resource and data blocks? 🤯

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in theory, although I think that would be fairly complex. I think at that point, forking the underlying module is a better approach.

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting together the RFC. A few thoughts:

  1. Does this RFC encourage anti-patterns? For example, it seems like not specifying the type on an input variable, especially when the type is not a string, is borderline a bug. I understand it works in Terraform, but it feels like a leftover from the early days, when there weren't explicit types, and like something we'd want to discourage, rather than support via a first-class feature. Similarly, what is the use case for not putting sensitive on output variables that contain sensitive data, even in a shared module? Perhaps there is some legitimate reason for doing that, but it's not obvious to me what that is, and I don't know if adding a first-class feature to work around that is a good direction. Update: thinking about this more, is this just to avoid having to put sensitive = true all over the place, and instead allowing Terraform to track sensitive values internally, and you only need the explicit label if that output is in a root module and therefore, could be written to a log file?
  2. Thinking more through the points above, is the primary goal to make it possible to use legacy modules (i.e., those that are out of date with the latest Terraform functionality) or those that a user might not have access to change?
  3. Could overwriting the source code result in the changes accidentally getting written back to version control? In particular, IIRC, if you had a source URL pointing to a local file path, we (or more accurately, go-getter) used to use a symlink, so if we modify the code in .terragrunt-cache, it will actually modify it in the original source folder. Not sure if that still happens.
  4. Does this create maintenance headaches? E.g., Let's say you add transform blocks for a few variables, and then at some point, the maintainer of the module changes those variables: e.g., they add type to some of them, rename others, etc. Will it be obvious what's going on with these transform blocks? Is this debuggable?

@yorinasub17
Copy link
Contributor Author

Update: thinking about this more, is this just to avoid having to put sensitive = true all over the place, and instead allowing Terraform to track sensitive values internally, and you only need the explicit label if that output is in a root module and therefore, could be written to a log file?

Yes, this is my understanding. I think the issue is that it's hard to see this in shared modules, and thus know when you have to mark outputs as sensitive or not, since terraform doesn't really give you tools to do this (e.g., validate doesn't reveal this), as it is only necessary to do in the root module.

Does this RFC encourage anti-patterns?

I'm not sure this RFC by itself would encourage anti-patterns. Ultimately, many module developers would much rather focus on supporting the core terraform use case rather than additionally Terragrunt, and these issues are arguably only a problem if you use Terragrunt to deploy those modules (since it turns the shared module where these aren't issues, into a root module where they are). I also think this is a maintenance headache if you have direct access to the module itself. It seems much easier to maintain to have this logic directly in the module, than in terragrunt.hcl.

So it really is an escape hatch for cases where you can't modify the module.

That said, you do bring up a good point that these should really only be used for modules on the public registry, because otherwise it is desirable to mark input with type and output with sensitive in the module itself. Perhaps we can address this by restricting the block to only work when the source is tfr?

Thinking more through the points above, is the primary goal to make it possible to use legacy modules (i.e., those that are out of date with the latest Terraform functionality) or those that a user might not have access to change?

I was primarily thinking about the latter. More specifically, about modules on the public registry (which relates to the point above about restricting to tfr sources).

Could overwriting the source code result in the changes accidentally getting written back to version control? In particular, IIRC, if you had a source URL pointing to a local file path, we (or more accurately, go-getter) used to use a symlink, so if we modify the code in .terragrunt-cache, it will actually modify it in the original source folder. Not sure if that still happens.

Ah yes that is true... Another reason to do the tfr restriction...

Does this create maintenance headaches? E.g., Let's say you add transform blocks for a few variables, and then at some point, the maintainer of the module changes those variables: e.g., they add type to some of them, rename others, etc. Will it be obvious what's going on with these transform blocks? Is this debuggable?

This is a good point. In terms of debuggability, I think we can add a feature to validate-inputs (or a new command) that will write out the transformations that terragrunt will make. E.g., a command that writes out:

INFO: the following transformations will be made
- variable "existing-variable"
    - `type=string` => `type=list(string)`
- variable "var-with-no-type"
    - NIL => `type=map(string)`

WARN: the following transform blocks did not apply to this module
- variable "non-existant-variable"
- output "non-existant-output"

Given the above, what are your thoughts on restricting this to tfr source only?

@lorengordon
Copy link
Contributor

Given the above, what are your thoughts on restricting this to tfr source only?

Please no! We have a use case where we fork every module so we "own" it, but that's just so we can trust the source and the tags we've already inspected (managing supply chain risk)! We prefer not to modify the forks because that is a maintenance headache, and instead inspect upstream changes and update the forks as needed. Being able to modify the source on the fly when needed would be handy!

@yorinasub17
Copy link
Contributor Author

yorinasub17 commented Sep 16, 2021

Ah that makes sense @lorengordon . Hmm I'll have to think through what mechanisms we can put in place in terragrunt to support both concerns.

Off the top of my head, a compromise is to make it a soft restriction that is overrideable with config/cli arg. Something that basically indicates "I know what I am doing" to terragrunt. E.g., maybe by default it is restricted to tfr, but then there could be an undocumented but supported allow_source_protocols attribute on the transform block that allows you to override that restriction for use cases like the one you have where you know the module source is an unmodified fork.

I know this can be annoying to you, but at least it makes it explicit that terragrunt doesn't encourage usage of the transform block with sources that you actually have control over (thus addressing Jim's concern).

@lorengordon
Copy link
Contributor

lorengordon commented Sep 16, 2021

There could be an undocumented but supported allow_source_protocols attribute on the transform block that allows you to override that restriction for use cases like the one you have where you know the module source is an unmodified fork.

That would be fine. We certainly have no problem managing config. But I would suggest documenting it and putting a warning in the doc. Default value could be allow_source_protocols = ["tfr"].

@dmattia
Copy link
Contributor

dmattia commented Oct 6, 2021

If my understanding is correct, some submodules cannot be used as terragrunt root modules, but all modules that work as terragrunt root modules could be used as submodules in a terraform root module. Would it then make sense to make it so that terragrunt does not download the terraform.source parameter, but just references it via a slim module definition with a single module call?

A bit more in depth...

Currently, if I use module foo from the module registry, I see that .terragrunt-cache/abc/def/main.tf would have the main.tf file from the foo module, as defined remotely, just with extra files like remote_state_terragrunt.tf in the same directory to wrap the module into a root module.

Would things break if instead, terragrunt created a .terragrunt-cache/abc/def/main.tf file that was not downloaded from any remote source, but had something like:

module "terragrunt_root" {
  source = "terraform-aws-modules/foo/aws"

  inputs = "resolved from terragrunt pre-processing"
}

output "foo_output_1" { ...generated for each output in the foo module... }

There would need to be some path mapping done for local modules, but my (probably naive) understanding is that this would make the terragrunt/terraform compatibility stronger, as all modules could be supported this way, and things like module registry source URLs in terragrunt would work because terraform (with whatever version the user had downloaded) would be responsible for resolving the reference.

As far as the output generation goes, I've used https://github.com/hashicorp/terraform-config-inspect in terragrunt-atlantis-config and it was pretty easy to use, and looks like it would make this generation pretty straightforward.

@yorinasub17
Copy link
Contributor Author

@dmattia Yes that is the end extrapolation of this feature, but the concern is that that is a major backward incompatibility. It triggers a major migration where every terragrunt project will need to reallocate everything in state to be nested under the module call. This is made more complicated by the existence of generate blocks, which can inject arbitrary resources into the tf module. These would need to be excluded from the state move calls.

This change would also makes things like state mv calls less intuitive, as now there is a slight level of indirection on the state addresses. It's minor though so maybe not a big deal. The migration story is probably the bigger issue. I'm not sure we're quite ready to introduce a drastic change like that.

OTOH, something like transform blocks would be backward compatible, while still solving the problem for the short term.

@dmattia
Copy link
Contributor

dmattia commented Oct 6, 2021

@yorinasub17 Thank you for that explanation!

I think some of these may have solutions (like terragrunt state mv modifying the args before passing to terraform state mv, and maybe we still do download the terraform.source module, but put it under .terragrunt-cache/abc/def/terragrunt_managed_submodule/, which would allow for keeping generate support), but the state one is certainly tricky, as overriding that sounds problematic from a maintenance POV, even if terraform v1.0.0 promises more stability.

Overall I like this proposal, and think it's great that it's all backwards compatible. I do have one question though: Where would the line be drawn between using transform blocks and adding override.tf files in the directory next to the terragrunt.hcl file? I have a few spots in our codebase where we use a remote terraform.source, and then customize the module via override files

@yorinasub17
Copy link
Contributor Author

Where would the line be drawn between using transform blocks and adding override.tf files in the directory next to the terragrunt.hcl file?

The line would be whether you need to add blocks to the existing module or modify blocks. Some blocks in terraform support combining (e.g., the terraform block), but most blocks can't be merged together in terraform. For example, having two variable blocks with the same label will throw an error in terraform.

In essence, the transform block would primarily be used as a monkey patching escape hatch to modify existing blocks in the target module (hence the name transform as opposed to generate).

Co-authored-by: Zack Proser <zackproser@gmail.com>
@yorinasub17
Copy link
Contributor Author

Closing as stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants