Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Improve *-all error message output #722

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

msvechla
Copy link
Contributor

@msvechla msvechla commented Jun 3, 2019

Hi all,

Thanks for this awesome project, we are using it in production with 50+ AWS accounts in our two organizations every day!

This PR aims to improve the readability of the *-all error messages. With our huge codebases, it can get really hard to find a small module, which errors out during a big apply-all run. The PR tries to improve this, by printing the root-cause error messages at the end of the execution run. Root-cause in this case means all module errors, excluding dependency errors.

Please let me know if you have any suggestions to improve this further.

@msvechla
Copy link
Contributor Author

msvechla commented Jun 3, 2019

I know using global variables is probably not the best way to achieve this, but I tried it earlier by incorporating the variables into the Stack struct, but this made everything way more complicated, so I decided to go for readability.

Also the go channel is currently not really needed, but might improve readability and future refactoring, as the concurrency is now explicitly coded.

Do you have any suggestions for unit tests?

@brikis98
Copy link
Member

brikis98 commented Jun 5, 2019

Nice, thanks for the PR! Could you share an example of what the log output will look like now (a small snippet, not the whole thing, of course)?

Please note that we're going to hold off on merging anything until #466 is resolved, as that's very high priority. Once that one is in, please pull the latest from master, and give us a ping to review.

@msvechla
Copy link
Contributor Author

@brikis98 I just rebased from master and pushed again.

The output currently looks like this: At the end of the exuction of an *-all command, we see the following summary when errors occurred. The output with terraform > v0.12 is a little bit more verbose now, as it prints an additional warning:

Warning: Skipping backend initialization pending configuration upgrade

[terragrunt] 2019/06/10 22:50:20 Encountered the following root-causes:
------------------------------------------------------------------------------------------------------------------------------------
Module /Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app1:
[terragrunt] [/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app1] 2019/06/10 22:50:20 Running command: terraform init

Warning: Skipping backend initialization pending configuration upgrade

The root module configuration contains errors that may be fixed by running the
configuration upgrade tool, so Terraform is skipping backend initialization.
See below for more information.


Terraform has initialized, but configuration upgrades may be needed.

Terraform found syntax errors in the configuration that prevented full
initialization. If you've recently upgraded to Terraform v0.12, this may be
because your configuration uses syntax constructs that are no longer valid,
and so must be updated before full initialization is possible.

Terraform has installed the required providers to support the configuration
upgrade process. To begin upgrading your configuration, run the following:
    terraform 0.12upgrade

To see the full set of errors that led to this message, run:
    terraform validate

Error: Unsupported block type

  on main.tf line 1:
   1: outputa "app1_text" {

Blocks of type "outputa" are not expected here. Did you mean "output"?


------------------------------------------------------------------------------------------------------------------------------------
Module /Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app2:
[terragrunt] [/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app2] 2019/06/10 22:50:20 Running command: terraform init

Warning: Skipping backend initialization pending configuration upgrade

The root module configuration contains errors that may be fixed by running the
configuration upgrade tool, so Terraform is skipping backend initialization.
See below for more information.


Terraform has initialized, but configuration upgrades may be needed.

Terraform found syntax errors in the configuration that prevented full
initialization. If you've recently upgraded to Terraform v0.12, this may be
because your configuration uses syntax constructs that are no longer valid,
and so must be updated before full initialization is possible.

Terraform has installed the required providers to support the configuration
upgrade process. To begin upgrading your configuration, run the following:
    terraform 0.12upgrade

To see the full set of errors that led to this message, run:
    terraform validate

Error: Unsupported block type

  on main.tf line 1:
   1: outputwas "app2_text" {

Blocks of type "outputwas" are not expected here.

 
[terragrunt] 2019/06/10 22:50:20 Encountered the following errors:
Hit multiple errors:
exit status 1
Hit multiple errors:
exit status 1

Still, having all errors and the related modules printed at the end of the runner is a great benefit when running large executions. Please let me know what you think.

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

A few questions/thoughts:

  1. Why use channels for this? Could we instead collect all the data in the error returned by *-all commands and render the error at the end from that value? E.g., we already have a MutliError struct in the errors package. It seems like working with a return value is easier than channels, global vars, etc.

  2. Could you add some tests for this? E.g., Add a new test in integration_test.go that runs against a fixture that has a deliberate error, and make sure the stdout you get back shows that error properly?

@msvechla
Copy link
Contributor Author

msvechla commented Jun 30, 2019

Thanks for the feedback @brikis98!

I did some refactoring and removed the channels. For sure I will add tests once this last bit is resolved.

Do you have any idea why I still get the terraform init output, even though I only print the stderr stream at the end? As far as I know the terraform init messages should be on the stdout stream.

E.g. this is my current Error output at the end:

[terragrunt] 2019/06/30 22:56:08 Encountered the following errors:
------------------------------------------------------------------------------------------------------------------------------------
/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-failure/missingvars: 
Hit multiple errors:
exit status 1 
[terragrunt] [/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-failure/missingvars] 2019/06/30 22:56:08 Running command: terraform init
Initializing modules...

Initializing the backend...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Error: Missing required argument

  on main.tf line 2, in module "sub":
   2: module "sub" {

The argument "missingvar1" is required, but no definition was found.


Error: Missing required argument

  on main.tf line 2, in module "sub":
   2: module "sub" {

The argument "missingvar2" is required, but no definition was found. 

@brikis98
Copy link
Member

brikis98 commented Jul 1, 2019

Do you have any idea why I still get the terraform init output, even though I only print the stderr stream at the end? As far as I know the terraform init messages should be on the stdout stream.

Not sure I follow. You seem to be getting an error about a missing variable. What does stdout or stderr have to do with it?

@msvechla
Copy link
Contributor Author

msvechla commented Jul 1, 2019

What I posted is the new detailed output of my change. The idea is to have a summary of all module errors including their error messages (stderr) at the end of the execution. If you check out https://github.com/gruntwork-io/terragrunt/pull/722/files#diff-86e77ee353cd3bacb4a1f0c492bf9e2cR169 of my change, you can see that I am capturing the stderr and outputting it in the collectErrors() method: https://github.com/gruntwork-io/terragrunt/pull/722/files#diff-86e77ee353cd3bacb4a1f0c492bf9e2cR186.

Somehow the terraform init code shows up in the stderr stream, even though when I do a normal terraform run, it is printed to stdout.

So my question would be, if you have any idea why the terraform init output shows up in `stderr.

@brikis98
Copy link
Member

brikis98 commented Jul 1, 2019

Ohhhh, I gotcha, thx for providing the context 😁

The behavior you're seeing is probably from this: https://github.com/gruntwork-io/terragrunt/blob/master/cli/cli_app.go#L606-L607

@msvechla
Copy link
Contributor Author

msvechla commented Jul 2, 2019

Yep, thats it, thanks a lot for pointing me in the right direction!

In the comment here it says:

Don't pollute stdout with the stdout from Auto Init

So I assume it will not be a big issue moving this back to stdout, or what was the reasoning behind this?

Also I found this part where the logger is set by default to stderr, which also leads to some pollution of my error output.

I now adjusted both parts and now the detailed error output is clean. Can you think of any issues these changes could cause from the top of your head? Of course I will run the tests to make sure there are no obvious issues.

@brikis98
Copy link
Member

brikis98 commented Jul 2, 2019

Can you think of any issues these changes could cause from the top of your head?

Yes. Consider someone running the following:

url=$(terragrunt output url)

They expect that the value of the output variable url, and only that value, is written to stdout. If the auto init functionality writes to stdout, then that assumption will break. Hence, we redirect auto init output to stderr.

In general, if you run terragrunt <cmd>, where <cmd> is any standard Terraform command, what's written to stdout should be the same as if you had run terraform <cmd> directly.

@msvechla
Copy link
Contributor Author

msvechla commented Jul 2, 2019

Thanks for clarifying, of course that makes perfect sense.

In this case we would either have to live with the more verbose detailed error message at the end of the execution, or I would have to come up with some way of extracting the auto-init output from the detailed error message.

I will look into it again.

@msvechla
Copy link
Contributor Author

msvechla commented Jul 2, 2019

Alright, I made it work by saving the auto-init output and extracting it from the detailed error messages.

The current detailed error messages now look like this:

[terragrunt] 2019/07/02 23:46:37 Encountered the following errors:
====================================================================================================================================
/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app1 (root error): 

Hit multiple errors:
exit status 1 

[terragrunt] [/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app1] 2019/07/02 23:46:37 Running command: terraform init

Error: Reference to undeclared input variable

  on main.tf line 2, in output "app1_text":
   2:   value = "app1 output ${var.aasd}"

An input variable with the name "aasd" has not been declared. This variable
can be declared with a variable "aasd" {} block.


------------------------------------------------------------------------------------------------------------------------------------
/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app2 (dependency error): 

Cannot process module Module /Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app2 (excluded: false, dependencies: [/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app3, /Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app1]) because one of its dependencies, Module /Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app1 (excluded: false, dependencies: [/Users/msvechl/go/src/github.com/gruntwork-io/terragrunt/test/fixture-auto-retry/apply-all/app3]), finished with an error: Hit multiple errors:
exit status 1 


------------------------------------------------------------------------------------------------------------------------------------
[terragrunt] 2019/07/02 23:46:37 Unable to determine underlying exit code, so Terragrunt will exit with error code 1 

What do you think? Should we also make this output optional or use it by default?

@brikis98
Copy link
Member

brikis98 commented Jul 6, 2019

So that's the output you see at the end of an xxx-all call?

If so, I think that looks like a terrific improvement. I assume those errors are grouped by module?

@msvechla
Copy link
Contributor Author

msvechla commented Jul 6, 2019

Yep, that is correct. As the collectErrrors() method loops over all running modules, this automatically groups all the errors by module.

I will try and add some tests now.

@msvechla
Copy link
Contributor Author

msvechla commented Jul 6, 2019

Alright, I just added module and integration tests. Do you have further improvement suggestions?

@msvechla
Copy link
Contributor Author

msvechla commented Jul 9, 2019

I did some refactoring to separate the normal and detailed errors in the MultiError struct. This also fixed problems with some of the tests. Do you have any further input @brikis98?

@brikis98
Copy link
Member

Apologies for the delay! Been completely buried. I really appreciate this PR and will try to review this as soon as I can. 🍺

@msvechla
Copy link
Contributor Author

Don't worry, take your time! Just get back to me when you can 👍

Copy link
Member

@brikis98 brikis98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we're back from our company offsite and going through PRs now. Thank you for your patience!

@@ -233,6 +270,8 @@ func (module *runningModule) moduleFinished(moduleErr error) {
module.Module.TerragruntOptions.Logger.Printf("Module %s has finished with an error: %v", module.Module.Path, moduleErr)
}

fmt.Fprintf(module.Writer, "%s\n%v\n\n%v\n", OutputMessageSeparator, module.Module.Path, module.OutStream.String())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why fmt.Fprintf here?

@@ -80,6 +81,9 @@ type TerragruntOptions struct {
// If you want stderr to go somewhere other than os.stderr
ErrWriter io.Writer

// Stores output of auto-init so it can be removed later form other streams
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Stores output of auto-init so it can be removed later form other streams
// Stores output of auto-init so it can be removed later from other streams

@@ -154,6 +168,8 @@ func runModules(modules map[string]*runningModule) error {
waitGroup.Add(1)
go func(module *runningModule) {
defer waitGroup.Done()
module.Module.TerragruntOptions.ErrWriter = io.MultiWriter(&module.OutStream, &module.ErrStream)
module.Module.TerragruntOptions.Writer = &module.OutStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this... You're overwriting TerragruntOptions.ErrWriter to write to both module.OutStream and module.ErrStream... But what was the Terragrunt.LOptions.ErrWriter value set to before that? Are module.OutStream and module.ErrStream initialized to anything? Will this buffer those errors until the very end or stream to stdout / stderr?

Same questions go for TerragruntOptions.Writer, with the additional one of what happens when you point a second item to module.OutStream?

// generateDetailedErrorMessage extracts the clean stderr from a module and formats it for printing
func generateDetailedErrorMessage(module *runningModule) error {
// remove the auto-init pollution from the error stream
cleanErrorOutput := strings.Replace(module.ErrStream.String(), module.Module.TerragruntOptions.InitStream.String(), "", -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, string replacement feels... A bit hacky. Is there any way to have the AutoInit write to stdout / stderr, but to not write to module.ErrStream? That way, you could use module.ErrStream directly, without having to clean anything out...

@@ -80,6 +81,9 @@ type TerragruntOptions struct {
// If you want stderr to go somewhere other than os.stderr
ErrWriter io.Writer

// Stores output of auto-init so it can be removed later form other streams
InitStream bytes.Buffer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there should be an AutoInitWriter and AutoInitErrWriter here instead? By default, these would be set to the same as Writer and ErrWriter. However, for xxx-all commands, these could be set to separate values that write to stdout / stderr but are not stored in those streams later used to print clean error messages?

Also, should this value be in the Clone method? If not, explicitly add a comment there explaining why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants