[EMR] ValidationException: An instance group may only be modified when the cluster is running or waiting #9400

Closed
ghost opened this issue Jul 18, 2019 · 6 comments · Fixed by #10425

@ghost commented Jul 18, 2019

This issue was originally opened by @hdryx as hashicorp/terraform#22116. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

v0.12.3

Terraform Configuration Files

resource "aws_emr_cluster" "cluster" {
  name          = "${var.project}-${var.cluster_name}-${var.environment}"
  release_label = "${var.emr_release_label}"
  applications  = "${var.emr_application}"

  ec2_attributes {
    subnet_id                         = "${data.aws_subnet.private_1.id}"
    emr_managed_master_security_group = "${data.aws_security_group.sg_emr_master.id}"
    emr_managed_slave_security_group  = "${data.aws_security_group.sg_emr_slave.id}"
    instance_profile                  = "${data.aws_iam_instance_profile.iam_emr_instance_profile.arn}"


    # Because we are launching the EMR in a private subnet we should use a service_access_security_group
    # service_access_security_group = "${aws_security_group.emr_service_access.id}"
    service_access_security_group = "${data.aws_security_group.sg_emr_service_access.id}"
  }
  master_instance_group {
    instance_type = "${var.master_instance_type}"
    bid_price      = "${var.spot_core_bid_price}"
    
    ebs_config {
      size                 = "${var.ebs_config_size}"
      type                 = "${var.ebs_config_type}"
      volumes_per_instance = "${var.ebs_config_volume_per_instance}"
    }
  }


  core_instance_group {
    instance_type  = "${var.core_instance_type}"
    instance_count = "${var.core_instance_count}"
    bid_price      = "${var.spot_core_bid_price}"

    ebs_config {
      size                 = "${var.ebs_config_size}"
      type                 = "${var.ebs_config_type}"
      volumes_per_instance = "${var.ebs_config_volume_per_instance}"
    }
  }

  ebs_root_volume_size = "${var.ebs_root_volume_size}"

  tags = "${merge(var.resource_tagging, map("Name", "${var.project}-${var.environment}-emr-cluster"))}"

  lifecycle {
    create_before_destroy = true
  }

  log_uri = "s3://${data.aws_s3_bucket.logs.id}/emr"

  # Terminate cluster when steps are done
  keep_job_flow_alive_when_no_steps = "${var.keep_job_no_steps}"

  bootstrap_action {
    path = "s3://${data.aws_s3_bucket.sources.id}/src/shell/${var.bootstrap_file}"
    name = "${var.bootstrap_name}"
    args = "${var.bootstrap_args}"
  }

  # Configuration of the cluster
  configurations_json = "${var.configuration_file != "" ? file("config/${var.configuration_file}") : ""}"

  # Role for the cluster
  service_role = "${data.aws_iam_role.iam_emr_service_role.arn}"


  # Steps to be executed by the cluster
  dynamic "step" {
    # for_each = jsondecode(templatefile("${path.module}/steps/${var.step_file}", {
    for_each = jsondecode(templatefile("../03_emr/steps/${var.step_file}", {
      # General Variables
      s3_sources      = "${data.aws_s3_bucket.sources.id}"
    }))

    content {
      action_on_failure = step.value.action_on_failure
      name              = step.value.name
      hadoop_jar_step {
        jar  = step.value.hadoop_jar_step.jar
        args = step.value.hadoop_jar_step.args
      }
    }
  }

}


resource "aws_emr_instance_group" "task" {
  name           = "${var.project}-${var.cluster_name}-instance-${var.environment}"
  cluster_id     = "${aws_emr_cluster.cluster.id}"
  instance_count = "${var.spot_core_instance_count}"
  instance_type  = "${var.spot_core_instance_type}"
  bid_price      = "${var.spot_core_bid_price}"

  depends_on = ["aws_emr_cluster.cluster"]
}

...
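
For context, the dynamic "step" block above decodes a JSON template; a hypothetical example of such a step file follows (the file contents and args are illustrative, not from the original report; only the keys the content block actually reads are included, and ${s3_sources} is substituted by templatefile):

[
  {
    "action_on_failure": "CONTINUE",
    "name": "example-spark-step",
    "hadoop_jar_step": {
      "jar": "command-runner.jar",
      "args": ["spark-submit", "s3://${s3_sources}/src/python/job.py"]
    }
  }
]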

Crash Output

Error: error draining EMR Instance Group (ig-24CP9QNA1THDI): ValidationException: An instance group may only be modified when the cluster is running or waiting.
status code: 400, request id: f6b923cc-a935-11e9-97e8-993b41767f35

Expected Behavior

I expected the task instance group to be created and added to the EMR cluster.

Actual Behavior

The EMR cluster launches, but the task instance group is not created (see the error above).

@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Jul 18, 2019
@aeschright aeschright added the service/emr Issues and PRs that pertain to the emr service. label Aug 2, 2019
@klsnreddy commented

I am facing this same issue as well. Is there a workaround?

@rlvrs commented Sep 15, 2019

Terraform Version: 0.11.14
Terraform AWS provider Version: 2.25

I have the same problem. As I investigate further, I will update this answer with more details.
I have found two workarounds so far, but I am not happy with either of them; I am counting on you to help me reach a neater solution :)
In the meantime, they might help you better understand the problem.

If you need more details than below, please let me know.

In our scenario, we use an EMR cluster with instance groups. The cluster creation is defined in a Terraform module, so we try to be as DRY as possible.
Each time we want to run a Spark job, it creates an EMR cluster for itself, runs, and dies (in the case of a batch job); see the sketch of the relevant settings below.
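
To make that lifecycle concrete, here is a minimal sketch of the auto-terminate settings involved (keep_job_flow_alive_when_no_steps also appears in the full configuration above; all other arguments are omitted, and the termination_protection line is an assumption about a typical setup):

resource "aws_emr_cluster" "cluster" {
  # ... name, release_label, instance groups, etc. as in the full config above ...

  # When the last step finishes, the cluster shuts itself down
  keep_job_flow_alive_when_no_steps = false

  # Termination protection must be off for the cluster to self-terminate
  termination_protection = false
}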

After the job finishes, the EMR cluster itself terminates as well.
At this point, the Terraform state file still declares an aws_emr_instance_group resource, say ig-24CP9QNA1THDI. However, as you can see here, the aws_emr_instance_group should be deleted after termination:

Instance Groups are destroyed when the EMR Cluster is destroyed

The next Terraform plan will say that a new resource is required (-/+ module.emr_cluster_module.aws_emr_instance_group.task (new resource required)), because the instance group is still in the state file. However, the apply will fail with the above error:

Error: error draining EMR Instance Group (ig-24CP9QNA1THDI): ValidationException: An instance group may only be modified when the cluster is running or waiting.
status code: 400, request id: f6b923cc-a935-11e9-97e8-993b41767f35

This happens because the cluster is, obviously, terminated, so it is in neither the "running" nor the "waiting" state. I presume this error is due to the aws_emr_instance_group resource now being in the state file, which was not the case with the previous syntax (but this is just a hunch).
However, I was expecting it to create a new aws_emr_instance_group for the tasks if the previous one has already been destroyed, rather than throwing this error.

The following workarounds work for me, but as previously stated, I am looking for a better solution.
Workaround 1: Fall back to the previous syntax and roll back the changes here. This is not advisable, since the previous syntax will be deprecated, as seen here and here.
Workaround 2: Remove the instance group directly from the state file (see the command sketch below). This is even worse, but I include it for debugging purposes.
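
For Workaround 2, Terraform's built-in state rm command removes the stale resource from the state file without touching real infrastructure; the resource address below is the one from the plan output above, so substitute your own:

terraform state rm module.emr_cluster_module.aws_emr_instance_group.task

The next plan should then propose creating the instance group fresh instead of trying to drain the already-terminated one.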

This feature was introduced here and here. I only skimmed through the code, but I don't see this behavior being tested (I must say this was my first time looking at this code).

For more information on the issue:
https://www.terraform.io/docs/providers/aws/guides/version-3-upgrade.html
#8245
https://github.com/terraform-providers/terraform-provider-aws/pull/8459/files
https://github.com/terraform-providers/terraform-provider-aws/pull/8078/files

@joelthompson (Contributor) commented
I think this is a duplicate of #1355

@bflad bflad added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Oct 10, 2019
@bflad bflad added this to the v2.32.0 milestone Oct 10, 2019
bflad pushed a commit that referenced this issue Oct 10, 2019
In situations where Terraform needs to replace an aws_emr_cluster
resource that has aws_emr_instance_group resources associated with it,
Terraform tries to execute a destroy on the instance group, but this
fails: the notion of a "destroy" on an instance group is to set its
instance count to zero, and AWS doesn't let you modify the instance
count of an instance group on a terminated EMR cluster. This fixes the
issue by treating an instance group whose cluster has terminated as no
longer existing, so Terraform won't try to execute a "destroy" and
error out.

Fixes #1355
Fixes #9400
@bflad (Contributor) commented Oct 10, 2019

The fix for this has been merged and will be released in version 2.32.0 of the Terraform AWS Provider later today. Thanks to @joelthompson for the implementation. 🎉

@ghost (Author) commented Oct 10, 2019

This has been released in version 2.32.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
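
If you pin the provider version, picking up the fix is a one-line change; a minimal sketch, assuming a Terraform 0.12-style provider block (the region value is illustrative):

provider "aws" {
  version = "~> 2.32" # allows >= 2.32.0, < 3.0.0
  region  = "eu-west-1"
}

Then run terraform init to download the updated provider.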

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost (Author) commented Nov 9, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost locked and limited conversation to collaborators Nov 9, 2019