Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split aws_emr_cluster Resource instance_group Configuration Block Handling #8245

Closed
bflad opened this issue Apr 8, 2019 · 3 comments · Fixed by #8459
Closed

Split aws_emr_cluster Resource instance_group Configuration Block Handling #8245

bflad opened this issue Apr 8, 2019 · 3 comments · Fixed by #8459
Labels
service/emr Issues and PRs that pertain to the emr service. technical-debt Addresses areas of the codebase that need refactoring or redesign.
Milestone

Comments

@bflad
Copy link
Contributor

bflad commented Apr 8, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Background

When creating and managing Amazon EMR clusters, there are three node types to consider:

Each of these node types has unique management requirements. For example, only one master node may be created and almost all changes to it require cluster replacement. Core nodes can have only one configuration with some additional flexibility but some changes still require cluster replacement. Task nodes are completely optional and changes do not require cluster replacement.

For each of the node types, there are two available configuration options:

Operator selection of instance fleet or instance group depends on use case. For example, instance fleet can be used to automatically provision from a list of instance types while instance group is a static instance type.

Historically, the Terraform AWS Provider aws_emr_cluster resource has implemented a single, monolithic instance_group configuration block attribute which aligns directly with the InstanceGroups API parameter of the JobFlowInstancesConfig API type. Instance fleet management in the aws_emr_cluster resource has not been implemented. The aws_emr_instance_group resource is available to manage task nodes outside the aws_emr_cluster resource.

While the monolithic instance_group implementation is designed to offer most API options (e.g. EBS configuration) generically across all three instance group types, the flexibility comes at a price for the Terraform resource implementation and usability for Terraform operators:

  • Limiting master and core node type to a single configuration is done via code instead of schema. Future improvements to allow instance fleet usage in a cluster will require additional code to also prevent duplicate node type declarations across using instance fleets and instance groups in the same resource.
  • Limitations for master and core node types (if captured at all) is done via code instead of code. For example, master instance group configurations in Terraform are not prevented from specifying an autoscaling policy, which is only caught by API error during apply.
  • The instance_group configuration block uses a TypeSet schema attribute type along with a nested ebs_config configuration block. This type of schema is very complex to manage updates successfully and historically has presented operators with difficult to read/understand differences. Ignoring changes within the instance_group configuration is difficult as with TypeSet attributes in general in Terraform. There are many configuration updates which currently present as recreating the cluster unnecessarily that are not easy to fix.

Due to the above complexity, contributors and maintainers are hesitant to change the implementation to fix bugs or add enhancements.

Proposal

Split the monolithic instance_group configuration block into specific node type configuration blocks:

  • master_instance_group
  • core_instance_group
  • task_instance_group (To be determined in this proposal: these would still require complex update code due to the nested ebs_config configuration block. It may be best to require aws_emr_instance_group usage.)

Each of these will only contain relevant arguments and properly present cluster recreation. When instance fleet support is added to the resource, the schema can be augmented with ConflictsWith instead of additional code.

The existing instance_group attribute will be deprecated via the Terraform Deprecation Best Practices section on provider attribute removal. The top level legacy master_instance_type, core_instance_count, and core_instance_type arguments should also be deprecated and removed.

A full writeup of the migration process should be documented in a AWS Provider Version 3 Upgrade Guide, similar to the Version 2 Upgrade Guide.

master_instance_group

# Potential Terraform configuration, details may change during implementation
resource "aws_emr_cluster" "example" {
  # ... other configuration ...

  master_instance_group { # Type: TypeList, MaxItems: 1, ConflictsWith: instance_group, master_instance_type
    bid_price     = "" # Optional: true, ForceNew: true
    instance_type = "" # Required: true, ForceNew: true
    name          = "" # Optional: true, ForceNew: true

    ebs_config { ... } # matching existing schema, ForceNew: true for configuration block and nested arguments
  }
}

core_instance_group

# Potential Terraform configuration, details may change during implementation
resource "aws_emr_cluster" "example" {
  # ... other configuration ...

  core_instance_group { # Type: TypeList, MaxItems: 1, ConflictsWith: core_instance_count, core_instance_type, instance_group
    autoscaling_policy = "" # Optional: true
    bid_price          = "" # Optional: true, ForceNew: true
    instance_count     = 0 # Optional: true, Computed: true
    instance_type      = "" # Required: true, ForceNew: true
    name               = "" # Optional: true, ForceNew: true

    ebs_config { ... } # matching existing schema, ForceNew: true for configuration block and nested arguments
  }
}

References

@bflad bflad added proposal Proposes new design or functionality. service/emr Issues and PRs that pertain to the emr service. labels Apr 8, 2019
@bflad
Copy link
Contributor Author

bflad commented Apr 9, 2019

Removing the proposal label as the maintainers have discussed this and seems like a good path forward.

@bflad bflad added technical-debt Addresses areas of the codebase that need refactoring or redesign. and removed proposal Proposes new design or functionality. labels Apr 9, 2019
@bflad bflad changed the title Proposal: Split aws_emr_cluster Resource instance_group Configuration Block Handling Split aws_emr_cluster Resource instance_group Configuration Block Handling Apr 9, 2019
bflad added a commit that referenced this issue Apr 26, 2019
…_group configuration blocks, deprecate other instance group configuration methods

Reference: #8245

Output from acceptance testing:

```
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_AutoscalingPolicy (487.78s)
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_BidPrice (1006.42s)
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_InstanceCount (1004.90s)
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_InstanceType (869.47s)
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_Migration_CoreInstanceType (415.45s)
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_Migration_InstanceGroup (485.50s)
--- PASS: TestAccAWSEMRCluster_CoreInstanceGroup_Name (769.30s)
--- PASS: TestAccAWSEMRCluster_MasterInstanceGroup_BidPrice (849.98s)
--- PASS: TestAccAWSEMRCluster_MasterInstanceGroup_InstanceType (756.42s)
--- PASS: TestAccAWSEMRCluster_MasterInstanceGroup_Migration_InstanceGroup (414.36s)
--- PASS: TestAccAWSEMRCluster_MasterInstanceGroup_Migration_MasterInstanceType (423.02s)
--- PASS: TestAccAWSEMRCluster_MasterInstanceGroup_Name (735.58s)
```
@bflad bflad added this to the v2.11.0 milestone May 13, 2019
@bflad
Copy link
Contributor Author

bflad commented May 17, 2019

This has been released in version 2.11.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@ghost
Copy link

ghost commented Mar 30, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
service/emr Issues and PRs that pertain to the emr service. technical-debt Addresses areas of the codebase that need refactoring or redesign.
Projects
None yet
1 participant