
EBS Volume Attachment Destroy/Recreate state issue in 0.7.4 #9000

Closed

sepulworld opened this issue Sep 22, 2016 · 7 comments

Comments

sepulworld commented Sep 22, 2016

Terraform Version

0.7.4

Affected Resource(s)

aws_volume_attachment

Terraform Configuration Files

resource "aws_instance" "kafka" {
  count             = "${var.kafka_node_count}"
  ami               = "${lookup(var.kafka_amis, count.index)}"
  instance_type     = "${var.kafka_instance_type}"
  iam_instance_profile = "${aws_iam_instance_profile.core_instance_profile.name}"
  vpc_security_group_ids = ["${aws_security_group.sg-bastion.id}"]
  source_dest_check = false
  subnet_id         = "${element(split(",", module.vpc.public_subnets), count.index)}"
  private_ip = "${lookup(var.kafka_private_ips, count.index)}"
  key_name          = "${var.ssh_pem}"
  depends_on        = ["aws_instance.zookeeper", "aws_instance.bastion", "aws_ebs_volume.kafka_ebs"]
  tags {
    Name = "kafka${count.index}-${var.service}-${var.environment}"
    Environment = "${var.environment}"
  }
  user_data         = "{\"consul_master\":\"consul-${var.environment}.${var.domain_name}\", \"role\": \"kafka\", \"domain\": \"${var.service}-${var.environment}\", \"cluster_name\": \"${var.service}-${var.environment}\"}"
  lifecycle {
    ignore_changes = ["user_data"]
  }
}

resource "aws_ebs_volume" "kafka_ebs" {
  count = "${var.kafka_node_count}"
  availability_zone = "${lookup(var.kafka_ebs_volume_zones, count.index)}"
  size = "${var.kafka_ebs_volume_size}"
  type = "${var.kafka_ebs_volume_type}"
}

# Can't deploy aws_volume_attachment via count because of bug: https://github.com/hashicorp/terraform/issues/3449

resource "aws_volume_attachment" "kafka_ebs_att_0" {
  device_name = "/dev/xvdz"
  volume_id = "${aws_ebs_volume.kafka_ebs.0.id}"
  instance_id = "${aws_instance.kafka.0.id}"
  provisioner "file" {
    source = "remote_scripts/setup_ebs.sh"
    destination = "/tmp/setup_ebs.sh"
    connection {
      host = "${aws_instance.kafka.0.public_ip}"
      user = "ubuntu"
      private_key = "${file("${var.ssh_pem_location}")}"
    }
  }
}

resource "aws_volume_attachment" "kafka_ebs_att_1" {
  device_name = "/dev/xvdz"
  volume_id = "${aws_ebs_volume.kafka_ebs.1.id}"
  instance_id = "${aws_instance.kafka.1.id}"
  provisioner "file" {
    source = "remote_scripts/setup_ebs.sh"
    destination = "/tmp/setup_ebs.sh"
    connection {
      host = "${aws_instance.kafka.1.public_ip}"
      user = "ubuntu"
      private_key = "${file("${var.ssh_pem_location}")}"
    }
  }
}

resource "aws_volume_attachment" "kafka_ebs_att_2" {
  device_name = "/dev/xvdz"
  volume_id = "${aws_ebs_volume.kafka_ebs.2.id}"
  instance_id = "${aws_instance.kafka.2.id}"
  provisioner "file" {
    source = "remote_scripts/setup_ebs.sh"
    destination = "/tmp/setup_ebs.sh"
    connection {
      host = "${aws_instance.kafka.2.public_ip}"
      user = "ubuntu"
      private_key = "${file("${var.ssh_pem_location}")}"
    }
  }
}

Expected Behavior

If aws_instance.kafka.0 is destroyed through the AWS console but the attached EBS volume aws_ebs_volume.kafka_ebs.0 is left intact, you would expect the next terraform apply to recreate the destroyed EC2 instance (aws_instance.kafka.0) and reattach aws_ebs_volume.kafka_ebs.0 by recreating the aws_volume_attachment.kafka_ebs_att_0 resource.

Actual Behavior

If aws_instance.kafka.0 is destroyed through the AWS console but the attached EBS volume aws_ebs_volume.kafka_ebs.0 is left intact, the subsequent terraform apply attempts to destroy and recreate aws_volume_attachment.kafka_ebs_att_0 and errors with:

  • aws_volume_attachment.kafka_ebs_att_0: Failed to detach Volume (vol-fdda0075) from Instance (i-04027ff7bb5614c41): IncorrectState: Volume 'vol-fdda0075' is in the 'available' state.

This is new behavior after upgrading from 0.7.2 to 0.7.4.

hgontijo (Contributor) commented Sep 26, 2016

@sepulworld Since you manually removed an aws_instance via the AWS console, have you tried reconciling the Terraform state, i.e. terraform state rm aws_volume_attachment.<id>, before running terraform apply?
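For reference, a minimal sketch of that flow against the config above (the attachment address comes from the original report; adjust it to whichever attachment is stale):

# Drop the stale attachment from state so Terraform stops trying to detach a
# volume that is already 'available', then let apply rebuild the instance and attachment.
terraform state rm aws_volume_attachment.kafka_ebs_att_0
terraform apply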

@kishorenc

This issue is unfortunate, especially given #2957. We re-assign EBS volumes when we rotate instances, which is quite a common pattern. Because of this issue we have to do a terraform state rm to flush Terraform's "memory" before we can re-attach a volume to a new instance.

stack72 (Contributor) commented Nov 2, 2016

Hi folks

I believe that PR #9792 takes care of this issue - it will allow us to skip the detachment of an EBS volume and let the instance take care of it.
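A minimal sketch of what that would look like against the original config, assuming the skip_destroy argument that the PR introduces:

resource "aws_volume_attachment" "kafka_ebs_att_0" {
  device_name  = "/dev/xvdz"
  volume_id    = "${aws_ebs_volume.kafka_ebs.0.id}"
  instance_id  = "${aws_instance.kafka.0.id}"

  # Skip the explicit detach on destroy; instance termination releases the volume.
  skip_destroy = true
}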

Please let me know your thoughts on this

Paul

clstokes (Contributor) commented Jan 22, 2017

@stack72 I think skip_destroy would work, but when running an apply after an instance has been destroyed or tainted, Terraform errors out seemingly before the skip_destroy logic is reached.

To reproduce:

  1. terraform apply the config linked below
  2. terraform taint aws_instance.main.1
  3. terraform apply

Error:

...
aws_volume_attachment.data_att.1: Still creating... (10s elapsed)
aws_volume_attachment.data_att.1: Creation complete
Error applying plan:

2 error(s) occurred:

* aws_volume_attachment.data_att.0: [WARN] Error attaching volume (vol-05be5996fc82db4e3) to instance (i-06fa8407fee424c26), message: "vol-05be5996fc82db4e3 is already attached to an instance", code: "VolumeInUse"
* aws_volume_attachment.data_att.2: [WARN] Error attaching volume (vol-0ac2fcd156504448c) to instance (i-0a7641c1c933cf907), message: "vol-0ac2fcd156504448c is already attached to an instance", code: "VolumeInUse"
...

Full config and log are at https://gist.github.com/clstokes/06487cb02dea5e46b538bbcfc4007dea.

This is with Terraform v0.8.4.

Updated to add Terraform version.

stack72 (Contributor) commented Feb 1, 2017

Hi all

This has been fixed by #11060.

I added all the debug info there to show this was the case, and I talked through what was happening with @clstokes.

This will be part of Terraform 0.8.6.

Paul

stack72 closed this as completed Feb 1, 2017
cl0udgeek commented Jun 26, 2017

Not sure if this is another bug or I'm just doing it wrong... but for some reason, Terraform wants to destroy all of my EBS volumes even if I just taint one node.

Config:

resource "aws_instance" "influxdata" {
  count         = "${var.ec2-count-influx-data}"
  ami           = "${module.amis.rhel73_id}"
  instance_type = "${var.ec2-type-influx-data}"

  vpc_security_group_ids = ["${var.sg-ids}"]
  subnet_id              = "${element(module.infra.subnet,count.index)}"
  key_name               = "${var.KeyName}"

  tags {
    Name               = "influx-data-node-0${count.index}"
    ASV                = "${module.infra.ASV}"
    CMDBEnvironment    = "${module.infra.CMDBEnvironment}"
    OwnerContact       = "${module.infra.OwnerContact}"
    custodian_downtime = "off"
    OwnerEid           = "${var.OwnerEid}"
  }

  connection {
    private_key = "${file("/Users/influx_east.pem")}" #qa env east
    user        = "ec2-user"
  }

  provisioner "remote-exec" {
    inline = ["echo just checking for ssh. ttyl. bye."]
  }

  provisioner "remote-exec" {
    when   = "destroy"
    inline = [
      "sudo service influx-data stop",
      "sudo unmount /dev/xvdg"
    ]
    connection {
      user = "ec2-user"
      host = "${self.private_ip}"
      private_key = "${file("/Users/influx_east.pem")}"
    }
  }
}

resource "aws_ebs_volume" "influxdata_ebs" {
  count             = "${var.ec2-count-influx-meta}"
  availability_zone = "${element(var.cds-qa-east,count.index)}"
  size = "1024"
  type = "io1"
  iops = 3000
  encrypted = true
  tags {
    Name               = "influxdata-ebs-0${count.index}"
    Sys                = "${module.infra.ASV}"
    OwnerContact       = "${module.infra.OwnerContact}"
    Owner           = "${var.OwnerEid}"
  }
}

resource "aws_volume_attachment" "influx_ebs_att" {
  count = 3
  device_name = "/dev/xvdg"
  volume_id = "${element(aws_ebs_volume.influxdata_ebs.*.id, count.index)}"
  instance_id = "${element(aws_instance.influxdata.*.id, count.index)}"
}

I run terraform taint aws_instance.influxdata.0 to rebuild just one instance.

Expected Behavior:

Terraform should stop the service, unmount the volume, detach the volume, recreate the instance, and reattach the volume.

Actual Behavior:

Terraform tries to destroy all EBS volumes, and I get this error:


3 error(s) occurred:

* aws_volume_attachment.influx_ebs_att[1] (destroy): 1 error(s) occurred:

* aws_volume_attachment.influx_ebs_att.1: Error waiting for Volume (vol-04c306280e9b6c953) to detach from Instance: i-003a4db9ccfb4af68
* aws_volume_attachment.influx_ebs_att[0] (destroy): 1 error(s) occurred:

* aws_volume_attachment.influx_ebs_att.0: Error waiting for Volume (vol-054725609c55a35d6) to detach from Instance: i-078c714d85eb77afe
* aws_volume_attachment.influx_ebs_att[2] (destroy): 1 error(s) occurred:

* aws_volume_attachment.influx_ebs_att.2: Error waiting for Volume (vol-0ccce3d93122eb233) to detach from Instance: i-0c380a9cae915d8a3

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

I'm using version 0.9.8 as well...

Here is the plan too:

"apply" is called, Terraform can't guarantee this is what will execute.

~ aws_elb.influxdata-elb
    instances.#: "" => "<computed>"

-/+ aws_instance.influxdata.0 (tainted)
    ami:                               "ami-916c4387" => "ami-916c4387"
    associate_public_ip_address:       "false" => "<computed>"
    availability_zone:                 "us-east-1a" => "<computed>"
    ebs_block_device.#:                "1" => "<computed>"
    ephemeral_block_device.#:          "0" => "<computed>"
    instance_state:                    "running" => "<computed>"
    instance_type:                     "r4.2xlarge" => "r4.2xlarge"
    ipv6_address_count:                "" => "<computed>"
    ipv6_addresses.#:                  "0" => "<computed>"
    key_name:                          "influx_east" => "influx_east"
    network_interface.#:               "0" => "<computed>"
    network_interface_id:              "eni-dae72b76" => "<computed>"
    placement_group:                   "" => "<computed>"
    primary_network_interface_id:      "eni-dae72b76" => "<computed>"
    private_dns:                       "ip-10-47-0-18.da.co.com" => "<computed>"
    private_ip:                        "10.47.0.18" => "<computed>"
    public_dns:                        "" => "<computed>"
    public_ip:                         "" => "<computed>"
    root_block_device.#:               "1" => "<computed>"
    security_groups.#:                 "0" => "<computed>"
    source_dest_check:                 "true" => "true"
    subnet_id:                         "subnet-875fadab" => "subnet-875fadab"
    tags.%:                            "6" => "6"
    tags.CMDBEnvironment:              "STREAMDATAPLATFORMAWS" => "ENVNPSTREAMDATAPLATFORMAWS"
    tags.Name:                         "influx-data-node-00" => "influx-data-node-00"
    tags.OwnerEid:                     "943" => "943"
    tenancy:                           "default" => "<computed>"
    volume_tags.%:                     "7" => "<computed>"
    vpc_security_group_ids.#:          "3" => "3"
    vpc_security_group_ids.1889494443: "sg-86a071f7" => "sg-86a071f7"
    vpc_security_group_ids.528573618:  "sg-07a57476" => "sg-07a57476"
    vpc_security_group_ids.787016340:  "sg-b5fd02c4" => "sg-b5fd02c4"

-/+ aws_volume_attachment.influx_ebs_att.0
    device_name:  "/dev/xvdg" => "/dev/xvdg"
    force_detach: "" => "<computed>"
    instance_id:  "i-078c714d85eb77afe" => "${element(aws_instance.influxdata.*.id, count.index)}" (forces new resource)
    skip_destroy: "" => "<computed>"
    volume_id:    "vol-054725609c55a35d6" => "vol-054725609c55a35d6"

-/+ aws_volume_attachment.influx_ebs_att.1
    device_name:  "/dev/xvdg" => "/dev/xvdg"
    force_detach: "" => "<computed>"
    instance_id:  "i-003a4db9ccfb4af68" => "${element(aws_instance.influxdata.*.id, count.index)}" (forces new resource)
    skip_destroy: "" => "<computed>"
    volume_id:    "vol-04c306280e9b6c953" => "vol-04c306280e9b6c953"

-/+ aws_volume_attachment.influx_ebs_att.2
    device_name:  "/dev/xvdg" => "/dev/xvdg"
    force_detach: "" => "<computed>"
    instance_id:  "i-0c380a9cae915d8a3" => "${element(aws_instance.influxdata.*.id, count.index)}" (forces new resource)
    skip_destroy: "" => "<computed>"
    volume_id:    "vol-0ccce3d93122eb233" => "vol-0ccce3d93122eb233"

+ local_file.inventory-meta
    content:  "[meta]\n${join(\"\\n\",aws_instance.influxmeta.*.private_ip)}\n\n[data]\n${join(\"\\n\",aws_instance.influxdata.*.private_ip)}\n"
    filename: "inventory"
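
In the plan above, each attachment's instance_id is re-interpolated through element() over the whole aws_instance.influxdata.*.id list, so tainting a single instance appears to force all three attachments to be recreated. A minimal workaround sketch, assuming the count stays at three and mirroring the per-index attachments used earlier in this thread (only index 0 shown):

resource "aws_volume_attachment" "influx_ebs_att_0" {
  device_name = "/dev/xvdg"
  volume_id   = "${aws_ebs_volume.influxdata_ebs.0.id}"
  instance_id = "${aws_instance.influxdata.0.id}"
}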

ghost commented Apr 8, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

ghost locked and limited conversation to collaborators Apr 8, 2020