-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix improper remote-exec timeout and communicator error handling #17596
Conversation
@@ -697,18 +704,15 @@ func (p *provisioner) runCommand(o terraform.UIOutput, comm communicator.Communi | |||
} | |||
|
|||
cmd.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Can save this for another day since subjective design discussion shouldn't block bug fixes)
Could we make the interface here instead be that cmd.Wait()
returns (int, error)
to make it clearer that there are some return values to deal with here? The pattern of mutating the object and getting the result via accessors -- particularly now that there are two of them -- feels less obvious and easier to do incorrectly.
I'd also consider following the lead of os/exec
here by making a non-zero exit status be a special type of error
that implements String
to return the error message you used below, thus making it simpler to handle in cases like this where the caller doesn't really care to distinguish errors returned by the program or operational errors along the way:
return cmd.Wait() // returns an error which might be a typed error wrapping an exit status
os/exec
wraps the entire process state, which is not possible here (due to the remoteness) and also doesn't seem necessary. A wrapper around just the int
feels sufficient.
It looks like there a few cases below where the caller does care to recognize the exit status, which we could make more ergonomic by a helper method remote.GetExitStatus(err)
that returns the exit status if it's that sort of error or zero otherwise:
if exitStatus := remote.GetExitStatus(err); exitStatus != 0 {
// ...
}
What do you think? (Again, we can totally ignore this for now just to get the bug fixed, if you prefer.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was going to do the same thing, but then decided to try and keep the API changes minimal, while getting rid of the races and adding an error value. It turns out that I probably would have had the same amount of changes to handle had I added an error value to Wait, with a specific ExitError
error type for the return code. 🤷♂️
I think we can make that a separate patch, since it's more cosmetic than anything at this point.
The remote.Cmd struct could not convey any transport related error to the caller, meaning that interrupted commands would show that they succeeded. Change Cmd.SetExited to accept an exit status, as well as an error to store for the caller. Make the status and error fields internal, require serialized access through the getter methods. Users of remote.Cmd should not check both Cmd.Err() and Cmd.ExitStatus() until after Wait returns. Require communicators to call Cmd.Init before executing the command. This will indicate incorrect usage of the remote.Cmd by causing a panic in SetExitStatus.
The timeout for the remote command was taken from the wrong config field, and the connection timeout was being used which is 5 min. Any remote command taking more than 5 min would be terminated by disconnecting the communicator. Remove the timeout from the context, and rely on the global timeout provided by terraform. There was no way to get the error from the communicator previously, so the broken connection was silently ignored and the provisioner returned successfully. Now we can use the new cmd.Err() method to retrieve any errors encountered during execution.
This was updated to see if we can get at any error status from the remote command and transport, which still is not available, but kept the latest version since it fixes a couple race conditions.
Convert to the new Cmd.ExitStatus() method in the salt-masterless provisioner. Add calls to Wait and remove race conditions around setting the status.
Use the new ExitStatus method, and also check the cmd.Err() method for errors. Remove leaks from the output goroutines in both provisioners by deferring their cleanup, and returning early on all error conditions.
EvaApplyProvisioners was not returning errors if there was already a multierror stored in the Error field. Always return the error to the caller.
The provisionerFail_createBeforeDestroy test was verifying the incorrect output. The create_before_destroy instance in the state has an ID of "bar" with require_new="abc", and a new instance would get an ID of "foo" with require_new="xyz". The existing test was expecting the following state: aws_instance.bar: (1 deposed) ID = bar provider = provider.aws require_new = abc Deposed ID 1 = foo (tainted) Which showed "bar" still the primary instance in the state, with the new instance "foo" as being the deposed instance, though properly tainted. The new output is: aws_instance.bar: (tainted) (1 deposed) ID = foo provider = provider.aws require_new = xyz type = aws_instance Deposed ID 1 = bar Showing the new "foo instance as being the primary instance in the state, with "bar" as the deposed instance.
0879adf
to
88e911a
Compare
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
The remote.Cmd struct could not convey any transport related error to
the caller, meaning that interrupted commands would show that they
succeeded.
Change Cmd.SetExited to accept an exit status, as well as an error to
store for the caller. Make the status and error fields internal,
require serialized access through the getter methods.
Users of remote.Cmd should not check both Cmd.Err() and Cmd.ExitStatus()
until after Wait() returns.
Require communicators to call Cmd.Init before executing the command.
This will indicate incorrect usage of the remote.Cmd by causing a panic
in SetExitStatus.
The timeout for the remote command was taken from the wrong config
field, and the connection timeout was being used which is 5 min. Any
remote command taking more than 5 min would be terminated by
disconnecting the communicator. Remove the timeout from the context, and
rely on the global timeout provided by terraform.
There was no way to get the error from the communicator previously, so
the broken connection was silently ignored and the provisioner returned
successfully. Now we can use the new cmd.Err() method to retrieve any
errors encountered during execution.