
State file is not saved locally when uploading to remote backend fails #14298

Closed
roman-vynar opened this issue May 8, 2017 · 2 comments · Fixed by #14423
roman-vynar commented May 8, 2017

Terraform does not save a state file locally when uploading to the remote backend fails. In my case I was creating 104 resources on AWS. After about an hour of work the apply completed, but Terraform then failed to upload the resulting state file to S3 with a 403 error on the bucket (for no apparent reason; I did not have access problems). Crucially, it left me without a state file at all, not even a local copy. So I ended up with a ton of orphaned AWS resources.

Terraform Version

v0.9.3, v0.9.4

Affected Resource(s)

Uploading a state file to the S3 remote backend

Expected Behavior

From https://www.terraform.io/docs/backends/state.html#state-storage

In the case of an error persisting the state to the backend, Terraform will write
the state locally. This is to prevent data loss. If this happens the end user must 
manually push the state to the remote backend once the error is resolved.

Actual Behavior

Data loss, you are left with the orphaned resources :(

Steps to Reproduce

Here my-bucket should be a real, writable bucket.

$ cat 1.tf
terraform {
  backend "s3" {
    bucket  = "my-bucket"
    key     = "roman/roman.tfstate"
    region  = "us-west-2"
    encrypt = true
  }
}

resource "random_id" "random_password" {
  byte_length = 1

  provisioner "local-exec" {
    command = "sleep 30"
  }
}
$ terraform init -backend=true -get=true
$ terraform plan
...

+ random_id.mysql_password
    b64:         "<computed>"
    b64_std:     "<computed>"
    b64_url:     "<computed>"
    byte_length: "1"
    dec:         "<computed>"
    hex:         "<computed>"


Plan: 1 to add, 0 to change, 0 to destroy.

Now run the command below. While the local-exec provisioner is sleeping you have about 30 seconds to quickly edit your /etc/hosts and add a line like 1.2.3.4 my-bucket.s3-us-west-2.amazonaws.com, repointing DNS to something fake so that the upload to the remote backend fails (a sketch of that edit follows).
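A rough sketch of that hosts-file edit, assuming sudo access; 1.2.3.4 is just a placeholder for any address that does not serve the S3 API:

$ echo "1.2.3.4 my-bucket.s3-us-west-2.amazonaws.com" | sudo tee -a /etc/hosts

Remember to remove that line again afterwards. With it in place, the resource is created but the state upload fails: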

$ terraform apply
random_id.mysql_password: Creating...
  b64:         "" => "<computed>"
  b64_std:     "" => "<computed>"
  b64_url:     "" => "<computed>"
  byte_length: "" => "1"
  dec:         "" => "<computed>"
  hex:         "" => "<computed>"
random_id.mysql_password: Provisioning with 'local-exec'...
random_id.mysql_password (local-exec): Executing: /bin/sh -c "sleep 30"
random_id.mysql_password: Still creating... (10s elapsed)
random_id.mysql_password: Creation complete (ID: Sg)
Failed to save state: Failed to upload state: RequestError: send request failed
caused by: Put https://my-bucket.s3-us-west-2.amazonaws.com/roman/roman.tfstate: x509: certificate is valid for *.domain.com, not my-bucket.s3-us-west-2.amazonaws.com

The specific error does not matter; it could be a 403, a 404, or, as in my case, a certificate problem. Terraform only writes terraform.tfstate.backup locally, which is the state as it was before resource creation. When the resources have been created and the upload then fails, nothing is saved to the current folder or to .terraform/, and you get data loss.
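One way to bring such orphaned resources back under management afterwards is to import them into a new state, one resource at a time. A rough sketch, assuming a hypothetical aws_instance resource named web in the configuration and a made-up instance ID (each resource type has its own import ID format, and not every resource supports import):

$ terraform import aws_instance.web i-0123456789abcdef0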

Thanks, please let me know if you need any further information.

@abrefort

I can confirm that all Terraform 0.9.x versions are impacted.
This is especially problematic when using AssumeRole credentials as those expire after an hour.

Terraform 0.8 allowed us to simply run a second apply with fresh credentials to resume operations, since the updated local copy of the state was synced to the remote storage at the start of the second run.

@apparentlymart apparentlymart self-assigned this May 10, 2017
apparentlymart added a commit that referenced this issue May 12, 2017
In the old remote state system we had the idea of a local backup, which
is actually still present for the legacy backends but no longer applies
for the new-style backends like the s3 backend.

It's problematic when an apply runs for long enough that someone's
time-limited AWS STS credentials expire and then Terraform fails and can't
persist state to S3.

To reduce the risk of lost state, here we add some extra fallback code
for the local apply operation in particular. If either state writing
or state persisting fail then we attempt to write the state to a special
backup file errored.tfstate, and produce an error message that guides the
user on how to retry uploading this state.

In the unlikely event that we can't write to local disk either (e.g.
permissions problems) we take a last-ditch attempt to dump the JSON onto
stdout and advise the user to manually copy it into a file for import.
If even that doesn't work for some reason, we assume a critical Terraform
bug (JSON-serialization problem with states?) and bail out with an
apologetic error message.

This is implemented for the apply command in particular because this is
the one command where new objects are created in real APIs that we don't
want to lose track of. For other operations it's less bad to just generate
a simple error message and have the user retry.

This fixes #14298.
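With this fallback in place, recovery should amount to fixing the underlying problem (expired credentials, DNS, permissions) and pushing the saved file back to the backend. A rough sketch, assuming the working directory is still initialized against the same S3 backend:

$ terraform state push errored.tfstate

terraform state push refuses to overwrite a conflicting remote state (different lineage or a lower serial) unless it is forced.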
apparentlymart added a commit that referenced this issue May 17, 2017
apparentlymart added a commit that referenced this issue May 23, 2017
apparentlymart added a commit that referenced this issue May 23, 2017

ghost commented Apr 12, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 12, 2020