backend/local: create local state file if backend write fails #14423

apparentlymart · 2017-05-12T00:21:42Z

In the state system we have the idea of a local backup, which is actually still present and used when the local backend is active, but is no longer used when a remote backend is active.

This is problematic when an apply runs for long enough that someone's time-limited AWS STS credentials expire and then Terraform fails and can't persist state to S3.

To reduce the risk of lost state, here we add some extra fallback code for the local apply operation in particular. If either state writing or state persisting fails then we attempt to write the state to a special backup file errored.tfstate, and produce an error message that guides the user on how to retry uploading this state.

In the unlikely event that we can't write to local disk either (e.g. permissions problems) we make a last-ditch attempt to dump the JSON onto stdout and advise the user to manually copy it into a file for import. If even that doesn't work for some reason, we assume a critical Terraform bug (JSON-serialization problem with states?) and bail out with an apologetic error message.

This new behavior is added to the apply operation, rather than simply making the existing local backup system work for remote backends again, because doing it at the apply layer means we can give better feedback to the user and more easily have the additional fallback of printing to stdout.

This is implemented for the apply operation in particular because this is the one operation where new objects are created in real APIs that we don't want to lose track of. For other operations it's less bad to just generate a simple error message and have the user retry, since no new objects will have been created.

This fixes #14298. @phinze, I think this would also address part of what you apparently ran into earlier today where you needed to abort a Terraform process with expired STS credentials without losing state.

apparentlymart · 2017-05-12T00:37:35Z

Note for the future: one thing we will need to contend with here is situations where Terraform is running in automation, such as Terraform Enterprise. Such systems may want to have special handling for this situation, to ensure that the errored.tfstate file gets captured somewhere that the user can get it and possibly even to prevent further applies until a human indicates that the situation has been resolved. This is not directly in the scope of this issue, but we will need to deal with it eventually.

mitchellh

Extremely well done. One request for changes.

I think Apply is the right place to do this. Refresh is the only other command that modifies state and it's not a big deal if that doesn't get persisted.

mitchellh · 2017-05-12T19:15:42Z

backend/local/backend_apply.go

+		// UX, so we should definitely avoid doing this if at all possible,
+		// but at least the user has _some_ path to recover if we end up
+		// here for some reason.
+		jsonState, jsonErr := json.MarshalIndent(applyState, "", "  ")


Could you use terraform.WriteState to a bytes.Buffer here instead? It's possible that JSON encoding directly may not produce the correct state.

👍 makes sense! I'd forgotten about that function.

jbardin

LGTM with the same change that @mitchellh requested.

jbardin · 2017-05-16T18:28:53Z

I wonder if we eventually should try to handle this better directly with BackupState? While this doesn't replace the current use for BackupState, since the user may want the previous rather than an intermediate state, maybe BackupState should record all writes in versioned files, or at least the original and the latest write?

On a related note, I've been trying to come up with a nice way to prevent users from aborting a run and then starting a new run before recovering the state (after ignoring the UI warning of course), and losing the state (e.g. terraform apply; Ctrl+C - Ctrl+C; terraform apply; "where did everything go?"). I wanted to avoid leaving a new backup from every command, but maybe archiving all states in a tar.gz in the .terraform directory would work, since they compress very well.

apparentlymart · 2017-05-17T22:30:23Z

This is now updated based on the feedback.

To me this feels different than BackupState. The existing backup mechanism is, as you said, primarily concerned with helping users undo state changes they wish they hadn't made, not with helping users recover from errors.

This situation is rather unique to the apply action. All other state-modifying commands don't benefit from this, because they don't make any new resources that Terraform needs to start tracking. I'd rather give users a straightforward error if, for example, terraform state rm fails, having them just try the operation again rather than suggesting they should try to recover by terraform state push.

Change updated in response to review feedback

In the old remote state system we had the idea of a local backup, which is actually still present for the legacy backends but no longer applies for the new-style backends like the s3 backend.

It's problematic when an apply runs for long enough that someone's time-limited AWS STS credentials expire and then Terraform fails and can't persist state to S3.

To reduce the risk of lost state, here we add some extra fallback code for the local apply operation in particular. If either state writing or state persisting fails then we attempt to write the state to a special backup file errored.tfstate, and produce an error message that guides the user on how to retry uploading this state.

In the unlikely event that we can't write to local disk either (e.g. permissions problems) we make a last-ditch attempt to dump the JSON onto stdout and advise the user to manually copy it into a file for import. If even that doesn't work for some reason, we assume a critical Terraform bug (JSON-serialization problem with states?) and bail out with an apologetic error message.

This is implemented for the apply command in particular because this is the one command where new objects are created in real APIs that we don't want to lose track of. For other operations it's less bad to just generate a simple error message and have the user retry.

This fixes #14298.

apparentlymart requested review from mitchellh and jbardin May 12, 2017 00:21

apparentlymart mentioned this pull request May 12, 2017

Save state-out file locally when pushing to remote fails. #14311

Closed

paddycarver added core enhancement labels May 12, 2017

mitchellh previously requested changes May 12, 2017

jbardin reviewed May 16, 2017

apparentlymart force-pushed the b-state-failure-backup branch from fd22890 to d60b251 Compare May 17, 2017 22:26

jbardin approved these changes May 22, 2017

apparentlymart force-pushed the b-state-failure-backup branch from d60b251 to 28ebf6e Compare May 23, 2017 18:05

apparentlymart merged commit 9cda372 into master May 23, 2017

glasser mentioned this pull request May 23, 2017

remote state is not incrementally updated #14487

Open

stack72 deleted the b-state-failure-backup branch June 13, 2017 13:39

SnazzyBootMan mentioned this pull request Nov 20, 2017

Terraform apply access denied error when using S3 endpoints #16710

Closed

ghost locked and limited conversation to collaborators Apr 11, 2020
