Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement terragrunt_output #828

Merged
merged 22 commits into from
Aug 15, 2019
Merged

Implement terragrunt_output #828

merged 22 commits into from
Aug 15, 2019

Conversation

yorinasub17
Copy link
Contributor

@yorinasub17 yorinasub17 commented Aug 9, 2019

The implementation has now been completely changed. get_output is no longer a function and instead is a new block called terragrunt_output. The benefits of this approach are:

This provides an implementation of get_output, an interpolation function that can be used to extract the output of another terraform module wrapped with terragrunt config. See the README changes for more details.`

Note that as part of the implementation, it was required to refactor the configuration parsing during a xyz-all command to only do a partial decoding. Otherwise, apply-all will fail even if you specify dependencies because it will interpolate the get_output call in the initial pass to read the dependencies from the config. This is handled in a new, PartialParseConfigFile function which will only parse the sections specified. Note that because locals, include, dependencies, terraform blocks are evaluated in this section, apply-all in the initial setup will still fail if get_output. This is covered in the README.

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, fantastic. So excited to see this 👍

README.md Outdated Show resolved Hide resolved
README.md Outdated
will fail when you run an `apply-all` since the output will be interpolated before the target config has applied.
Additionally, during an `apply-all`, the terragrunt configuration has to be partially interpolated in order to build up
the dependency tree. This means that you will have issues using `apply-all` if `get_output` is used in the following
blocks, as it will be interpolated prior to any modules being applied:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow this last one? Can you give an example?

Copy link
Contributor Author

@yorinasub17 yorinasub17 Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine you had:

live
├── sql
│   └── terragrunt.hcl
└── vpc
    └── terragrunt.hcl

And you had the following in live/sql/terragrunt.hcl:

locals = {
  vpc_path = "../vpc"
  vpc_id = get_output(local.vpc_path, "vpc_id")
}

dependencies {
  paths = [local.vpc_path]
}

inputs = {
  vpc_id = local.vpc_id
}

Now suppose we have a completely clean project and nothing is applied yet and we run apply-all. In order to build the dependency tree, terragrunt needs to parse out the dependencies block and decode them before any modules have been applied. In the above use case, the dependencies block also references locals, so we need to decode locals as well. As part of that, we need to interpolate get_output.

Well, get_output will try to read the output of the vpc module and fail, because that hasn't been applied yet, because we are still in the phase of building up the dependency tree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I gotcha. So then the get_output function would have to:

  1. Is this a single-module command (e.g., apply) or multi-module (e.g., apply-all).
  2. If single-module, and the dependency has not yet been applied (i.e., output is missing), return an error immediately.
  3. If multi-module, add the VPC to the dependency graph, pause config parsing until the command (e.g., apply) has been executed in that dependency, and then continue config parsing after.

Would that work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still exists, but is less of an issue because of the way you access terragrunt_output.

README.md Outdated
the dependency tree. This means that you will have issues using `apply-all` if `get_output` is used in the following
blocks, as it will be interpolated prior to any modules being applied:

- `locals`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, if you need to fetch 10 different outputs from a single dependency, I suspect 10 sequential terragrunt output calls would be slow. Do we cache the outputs in memory in between? If not, it would be nice to allow users to cache all the outputs in a locals variable once and then look up values within it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that is a good point. Under most circumstances, you can use get_output in locals. There are some caveats you have to be aware of: #828 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first impression was that we should maintain an internal, in-memory cache of (module absolute path, all outputs for that module) pairs. Any time you call get_output, it first checks the cache for that value: if it's present, it returns it immediately; if it's absent, it calls terraform output and caches the result. This ensures maximal performance without any extra effort for the user...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the implementation, I worry about what happens when we are going through multiple passes of the config, as it does in the xxx-all command. Here is where a global memoized cache could be problematic:

  • In the first pass through, it would run get_output on the previous state. This result gets cached.
  • The dependency gets applied as part of apply-all, in the dependency order.
  • We parse the config again to apply the dependent module. As part of that, we call get_output. We look up in the cache, and return the previous value, which is NOT the newly applied state.

Note that this only happens when you depend on the awkward chicken-and-egg scenario of using get_output in locals, and in a naive implementation using a global cache that is stored for the duration of a single terragrunt call.

We could have a different implementation where the cache is scoped to a single config parsing, which trades off more reuse for better predictability. I am leaning towards implementing it this way, as cache bugs are always super hard to debug. This of course requires more refactoring, since now I need to store the cache somewhere in a struct that is available throughout the parsing stage.

Copy link
Contributor Author

@yorinasub17 yorinasub17 Aug 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new implementation, you will call terragrunt output at most once per dependency per config. There is still room for memoization to cache across apply-all calls, but I think this is world's better than get_output which was once per function call.

And we don't need to do caching (which is always a pain to maintain due to annoying cache busting bugs) :)

// target config check: make sure the target config exists
targetConfig := params[0]
if util.IsDir(targetConfig) {
targetConfig = util.JoinPath(targetConfig, DefaultTerragruntConfigPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for the future: we could support reading outputs from normal Terraform modules (i.e., those not meant to be used with Terragrunt) by running terraform output instead of terragrunt output. Not particularly high priority for now though!

README.md Outdated Show resolved Hide resolved
config/config_helpers.go Outdated Show resolved Hide resolved
config/config_helpers.go Outdated Show resolved Hide resolved
config/config_helpers.go Outdated Show resolved Hide resolved
config/config_partial.go Outdated Show resolved Hide resolved
config/config_partial.go Outdated Show resolved Hide resolved
- Always read the entire output map so we have type data
- Refactor json to cty.Value conversion into its own function with unit tests
@yorinasub17
Copy link
Contributor Author

Ok the main sticking point is finding a good way to cache the get_output calls. Unfortunately, I am not sure what is the best way. I am not a fan of memoizing in the global space, and using locals has some issues as mentioned above.

I expect that the real solution is in #828 (comment), where we won't need to parse the dependencies block anymore. Although... we still need the locals block parsing since I expect you will want to use it for the terraform block parsing...

README.md Outdated Show resolved Hide resolved
README.md Outdated
will fail when you run an `apply-all` since the output will be interpolated before the target config has applied.
Additionally, during an `apply-all`, the terragrunt configuration has to be partially interpolated in order to build up
the dependency tree. This means that you will have issues using `apply-all` if `get_output` is used in the following
blocks, as it will be interpolated prior to any modules being applied:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I gotcha. So then the get_output function would have to:

  1. Is this a single-module command (e.g., apply) or multi-module (e.g., apply-all).
  2. If single-module, and the dependency has not yet been applied (i.e., output is missing), return an error immediately.
  3. If multi-module, add the VPC to the dependency graph, pause config parsing until the command (e.g., apply) has been executed in that dependency, and then continue config parsing after.

Would that work?

README.md Outdated
the dependency tree. This means that you will have issues using `apply-all` if `get_output` is used in the following
blocks, as it will be interpolated prior to any modules being applied:

- `locals`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first impression was that we should maintain an internal, in-memory cache of (module absolute path, all outputs for that module) pairs. Any time you call get_output, it first checks the cache for that value: if it's present, it returns it immediately; if it's absent, it calls terraform output and caches the result. This ensures maximal performance without any extra effort for the user...

config/config_helpers.go Outdated Show resolved Hide resolved
config/config_helpers.go Outdated Show resolved Hide resolved
config/config_partial.go Outdated Show resolved Hide resolved
@yorinasub17
Copy link
Contributor Author

Ok new idea. This got me thinking of a potential way to solve this: what if we introduce a new block? After navigating HCL2, I think I am beginning to understand that it is significantly easier to implement partial parsing decoding of individual blocks as opposed to when it appears in the middle of an AST.

Given that, it seems like the best way to achieve all of the goals is to introduce a new block construct like terraform_remote_state, but at the terragrunt.hcl level. Here is an example, since I think it will be clear once I show it:

terragrunt_output "vpc" {
  config = "../vpc"
}

inputs = {
  vpc_id = terragrunt_output.vpc.vpc_id
}

In this model, the caching problem goes away because we can resolve all the terragrunt_outputs in the initial parsing and stored somewhere as a reference which can then be reused.

Parsing order problems go away as well, since it can be parsed independently of the rest of the config (except maybe locals and includes?).

Implementing partial parsing just for dependency building is easy as well, since all we need to do is parse the blocks into a list of structs to get the config: no need to walk an AST!

Another benefit is that we have full control over when the output pulling happens, since now it is a first class task in the decoding pipeline, happening before we pass the HCL struct to the decoder. So the parsing logic will now be:

  1. parse locals
  2. parse include
  3. parse terragrunt_output (replaces dependencies)
  4. Get outputs from each terragrunt_output
  5. parse the rest of config

@brikis98
Copy link
Member

That seems like an elegant solution!

…that uses it yet)

- Replace free strings in partial decode decodeList with enums
@yorinasub17 yorinasub17 changed the title Implement get_output Implement terragrunt_output Aug 14, 2019
@yorinasub17
Copy link
Contributor Author

Ok @brikis98 this is ready for a rereview. I have implemented terragrunt_output blocks as described in #828 (comment). I believe this addresses almost all of your concerns from the initial PR:

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, superb work. This ended up even more complicated to accomplish than I originally assume, but I think this is a great solution 👍


terragrunt_output "redis" {
config_path = "../redis"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very, very cool 👍

Now that I've seen this, I can't wonder if this should be:

dependency "mysql" {
  config_path = "../mysql"
}

dependency "redis" {
  config_path = "../redis"
}

inputs = {
  mysql_url = dependency.mysql.outputs.domain
  redis_url = dependency.redis.outputs.domain
}

For this PR, the only difference would be a slight naming change: terragrunt_output becomes dependency and you read outputs from it via the .outputs attribute. However, in a future PR, we could allow you to read any part of that module's config! E.g., Perhaps dependency.mysql.inputs.foo would return the value of the "foo" output from the mysql dependency and dependency.redis.remote_state.config.bucket would return the bucket configured for remote state storage in the redis dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. I went through and made this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2126,6 +2206,46 @@ The `skip` flag must be set explicitly in terragrunt modules that should be skip
`terragrunt.hcl` file that is included by another `terragrunt.hcl` file, only the `terragrunt.hcl` file that explicitly
set `skip = true` will be skipped.


### Configuration parsing order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this 👍

config/config.go Outdated Show resolved Hide resolved
// Allowed References:
// - locals
// - terragrunt_output
// 5. Merge the included config with the parsed config. Note that all the config data is mergable except for `locals`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great docs 👍

@@ -28,7 +29,7 @@ func CreateLoggerWithWriter(writer io.Writer, prefix string) *log.Logger {
// logging solution.
// Debugf will only print out terragrunt logs if the TG_LOG environment variable is set to DEBUG.
func Debugf(logger *log.Logger, fmtString string, fmtArgs ...interface{}) {
if os.Getenv("TG_LOG") == "DEBUG" {
if strings.ToLower(os.Getenv("TG_LOG")) == "debug" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we document this anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to README

@yorinasub17
Copy link
Contributor Author

Ok merging and releasing this! Thanks for the review!

@yorinasub17 yorinasub17 merged commit 8fa2f71 into master Aug 15, 2019
@yorinasub17 yorinasub17 deleted the yori-get-output branch August 15, 2019 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants