hansohn/terraform-aws-emr

terraform-aws-emr

Terraform module to provision an EMR Cluster

📖 Usage

Welcome to the terraform-aws-emr repo!

:octocat: Examples

Please see the sample set of examples below for a better understanding of the implementation.
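As a starting point, here is a minimal sketch of a module call. It assumes the module is sourced directly from the GitHub repository named at the top of this page; the source address, IDs, role names, and instance types are placeholders rather than values taken from this README. Only `release_label` and `vpc_id` are marked as required in the Inputs table. Several other inputs are described as "(Required)" yet default to `null`, so whether the module needs them in practice depends on its implementation.

```hcl
module "emr" {
  # Assumed source address; adjust to wherever you consume this module from.
  source = "github.com/hansohn/terraform-aws-emr"

  # null-label naming inputs (example values)
  namespace = "eg"
  stage     = "dev"
  name      = "emr"

  # Inputs marked as required in the Inputs table (placeholder values)
  release_label = "emr-6.9.0"
  vpc_id        = "vpc-0123456789abcdef0"

  # Cluster settings (placeholder values)
  applications                    = ["Spark", "Hadoop"]
  service_role                    = "EMR_DefaultRole"
  ec2_attributes_instance_profile = "EMR_EC2_DefaultRole"
  ec2_attributes_subnet_id        = "subnet-0123456789abcdef0"
  subnet_type                     = "private"

  # Master instance group
  master_instance_group_instance_type   = "m5.xlarge"
  master_instance_group_ebs_config_size = 64
  master_instance_group_ebs_config_type = "gp2"

  # Core instance group
  core_instance_group_instance_type   = "m5.xlarge"
  core_instance_group_instance_count  = 2
  core_instance_group_ebs_config_size = 64
  core_instance_group_ebs_config_type = "gp2"
}
```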

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| aws | >= 4.0 |

Providers

| Name | Version |
|------|---------|
| aws | >= 4.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| emr_cluster | ./modules/emr-cluster | n/a |
| emr_security_configuration | ./modules/emr-security-configuration | n/a |
| label_cloudwatch_rule | cloudposse/label/null | 0.25.0 |
| label_cloudwatch_target | cloudposse/label/null | 0.25.0 |
| label_core | cloudposse/label/null | 0.25.0 |
| label_master | cloudposse/label/null | 0.25.0 |
| label_master_managed | cloudposse/label/null | 0.25.0 |
| label_notebook_instance | cloudposse/label/null | 0.25.0 |
| label_notebook_master | cloudposse/label/null | 0.25.0 |
| label_service_managed | cloudposse/label/null | 0.25.0 |
| label_slave | cloudposse/label/null | 0.25.0 |
| label_slave_managed | cloudposse/label/null | 0.25.0 |
| this | cloudposse/label/null | 0.25.0 |

Resources

| Name | Type |
|------|------|
| aws_s3_object.emr_bootstrap_script | resource |
| aws_security_group.managed_master | resource |
| aws_security_group.managed_service_access | resource |
| aws_security_group.managed_slave | resource |
| aws_security_group.master | resource |
| aws_security_group.notebook_instance | resource |
| aws_security_group.notebook_master | resource |
| aws_security_group.slave | resource |
| aws_security_group_rule.managed_master_egress | resource |
| aws_security_group_rule.managed_master_ingress_managed_service | resource |
| aws_security_group_rule.managed_master_ingress_managed_slave | resource |
| aws_security_group_rule.managed_master_ingress_self | resource |
| aws_security_group_rule.managed_service_access_egress_managed_master | resource |
| aws_security_group_rule.managed_service_access_egress_managed_slave | resource |
| aws_security_group_rule.managed_service_access_ingress | resource |
| aws_security_group_rule.managed_slave_egress_master | resource |
| aws_security_group_rule.managed_slave_egress_s3 | resource |
| aws_security_group_rule.managed_slave_egress_self | resource |
| aws_security_group_rule.managed_slave_egress_service | resource |
| aws_security_group_rule.managed_slave_ingress_managed_master | resource |
| aws_security_group_rule.managed_slave_ingress_managed_service | resource |
| aws_security_group_rule.managed_slave_ingress_self | resource |
| aws_security_group_rule.master_egress_cidr_blocks | resource |
| aws_security_group_rule.master_egress_security_groups | resource |
| aws_security_group_rule.master_ingress_cidr_blocks | resource |
| aws_security_group_rule.master_ingress_security_groups | resource |
| aws_security_group_rule.notebook_instance_egress_https | resource |
| aws_security_group_rule.notebook_instance_egress_livy | resource |
| aws_security_group_rule.notebook_master_ingress_livy | resource |
| aws_security_group_rule.slave_egress_cidr_blocks | resource |
| aws_security_group_rule.slave_egress_security_groups | resource |
| aws_security_group_rule.slave_ingress_cidr_blocks | resource |
| aws_security_group_rule.slave_ingress_security_groups | resource |
| aws_caller_identity.current | data source |
| aws_prefix_list.s3 | data source |
| aws_region.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| additional_info | (Optional) A JSON string for selecting additional features such as adding proxy information. Note: The provider currently has no API to retrieve the value of this argument after EMR cluster creation, so Terraform cannot detect drift from the actual EMR cluster if its value is changed outside Terraform. | `string` | `null` | no |
| additional_tag_map | Additional key-value pairs to add to each map in `tags_as_list_of_maps`. Not added to `tags` or `id`.<br>This is for some rare cases where resources want additional configuration of tags<br>and therefore take a list of maps with tag key, value, and additional configuration. | `map(string)` | `{}` | no |
| applications | (Optional) A list of applications for the cluster. | `set(string)` | `null` | no |
| attributes | ID element. Additional attributes (e.g. `workers` or `cluster`) to add to `id`,<br>in the order they appear in the list. New attributes are appended to the<br>end of the list. The elements of the list are joined by the `delimiter`<br>and treated as a single ID element. | `list(string)` | `[]` | no |
| autoscaling_role | (Optional) An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group. | `string` | `null` | no |
| bootstrap_action | (Optional) Ordered list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. | <pre>list(object({<br>  name = string<br>  path = string<br>  args = list(string)<br>}))</pre> | `[]` | no |
| bootstrap_s3_bucket | (Required) The name of the bucket to put the file in. Alternatively, an S3 access point ARN can be specified. | `string` | `null` | no |
| bootstrap_s3_key | (Required) The name of the object once it is in the bucket. | `string` | `null` | no |
| bootstrap_s3_kms_key_id | (Optional) Specifies the AWS KMS Key ARN to use for object encryption. This value is a fully qualified ARN of the KMS Key. | `string` | `null` | no |
| bootstrap_s3_server_side_encryption | (Optional) Specifies server-side encryption of the object in S3. Valid values are 'AES256' and 'aws:kms'. | `string` | `"AES256"` | no |
| cluster_name | The name of the job flow. | `string` | `null` | no |
| configurations | (Optional) List of configurations supplied for the EMR cluster you are creating. | `string` | `null` | no |
| configurations_json | (Optional) A JSON string for supplying list of configurations for the EMR cluster. | `string` | `null` | no |
| context | Single object for setting entire context at once.<br>See description of individual variables for details.<br>Leave string and numeric variables as `null` to use default value.<br>Individual variable settings (non-null) override settings in context object,<br>except for attributes, tags, and additional_tag_map, which are merged. | `any` | <pre>{<br>  "additional_tag_map": {},<br>  "attributes": [],<br>  "delimiter": null,<br>  "descriptor_formats": {},<br>  "enabled": true,<br>  "environment": null,<br>  "id_length_limit": null,<br>  "label_key_case": null,<br>  "label_order": [],<br>  "label_value_case": null,<br>  "labels_as_tags": [<br>    "unset"<br>  ],<br>  "name": null,<br>  "namespace": null,<br>  "regex_replace_chars": null,<br>  "stage": null,<br>  "tags": {},<br>  "tenant": null<br>}</pre> | no |
| core_instance_autoscaling_max_capacity | (Required) The max capacity of the scalable target. | `number` | `1` | no |
| core_instance_autoscaling_min_capacity | (Required) The min capacity of the scalable target. | `number` | `1` | no |
| core_instance_group_autoscaling_policy | (Optional) String containing the EMR Auto Scaling Policy JSON. | `string` | `null` | no |
| core_instance_group_bid_price | (Optional) Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances. | `string` | `null` | no |
| core_instance_group_ebs_config_iops | (Optional) The number of I/O operations per second (IOPS) that the volume supports. | `number` | `null` | no |
| core_instance_group_ebs_config_size | (Required) The volume size, in gibibytes (GiB). | `number` | `null` | no |
| core_instance_group_ebs_config_type | (Required) The volume type. Valid options are gp2, io1, standard and st1. | `string` | `null` | no |
| core_instance_group_ebs_config_volumes_per_instance | (Optional) The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1). | `number` | `1` | no |
| core_instance_group_instance_count | (Optional) Target number of instances for the instance group. Must be at least 1. Defaults to 1. | `number` | `1` | no |
| core_instance_group_instance_type | (Required) EC2 instance type for all instances in the instance group. | `string` | `null` | no |
| custom_ami_id | (Optional) A custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). | `string` | `null` | no |
| delimiter | Delimiter to be used between ID elements.<br>Defaults to `-` (hyphen). Set to `""` to use no delimiter at all. | `string` | `null` | no |
| descriptor_formats | Describe additional descriptors to be output in the `descriptors` output map.<br>Map of maps. Keys are names of descriptors. Values are maps of the form<br>`{<br> format = string<br> labels = list(string)<br>}`<br>(Type is `any` so the map values can later be enhanced to provide additional options.)<br>`format` is a Terraform format string to be passed to the `format()` function.<br>`labels` is a list of labels, in order, to pass to `format()` function.<br>Label values will be normalized before being passed to `format()` so they will be<br>identical to how they appear in `id`.<br>Default is `{}` (`descriptors` output will be empty). | `any` | `{}` | no |
| ebs_root_volume_size | (Optional) Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later. | `number` | `null` | no |
| ec2_attributes_additional_master_security_groups | (Optional) String containing a comma-separated list of additional Amazon EC2 security group IDs for the master node. | `string` | `null` | no |
| ec2_attributes_additional_slave_security_groups | (Optional) String containing a comma-separated list of additional Amazon EC2 security group IDs for the slave nodes. | `string` | `null` | no |
| ec2_attributes_emr_managed_master_security_group | (Optional) Identifier of the Amazon EC2 EMR-Managed security group for the master node. | `string` | `null` | no |
| ec2_attributes_emr_managed_slave_security_group | (Optional) Identifier of the Amazon EC2 EMR-Managed security group for the slave nodes. | `string` | `null` | no |
| ec2_attributes_instance_profile | (Required) Instance profile for the cluster's EC2 instances, which assume this role. | `string` | `null` | no |
| ec2_attributes_key_name | (Optional) Amazon EC2 key pair that can be used to SSH to the master node as the user called hadoop. | `string` | `null` | no |
| ec2_attributes_service_access_security_group | (Optional) Identifier of the Amazon EC2 service-access security group. Required when the cluster runs on a private subnet. | `string` | `null` | no |
| ec2_attributes_subnet_id | (Optional) VPC subnet ID where you want the job flow to launch. Cannot specify the cc1.4xlarge instance type for nodes of a job flow launched in an Amazon VPC. | `string` | `null` | no |
| enabled | Set to false to prevent the module from creating any resources. | `bool` | `null` | no |
| environment | ID element. Usually used for region e.g. 'uw2', 'us-west-2', OR role 'prod', 'staging', 'dev', 'UAT'. | `string` | `null` | no |
| id_length_limit | Limit `id` to this many characters (minimum 6).<br>Set to `0` for unlimited length.<br>Set to `null` to keep the existing setting, which defaults to `0`.<br>Does not affect `id_full`. | `number` | `null` | no |
| keep_job_flow_alive_when_no_steps | (Optional) Whether to keep the cluster running when it has no steps or when all steps are complete (default is on). | `bool` | `true` | no |
| kerberos_attributes_ad_domain_join_password | (Optional) The Active Directory password for ad_domain_join_user. Terraform cannot perform drift detection of this configuration. | `string` | `null` | no |
| kerberos_attributes_ad_domain_join_user | (Optional) Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. Terraform cannot perform drift detection of this configuration. | `string` | `null` | no |
| kerberos_attributes_cross_realm_trust_principal_password | (Optional) Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. Terraform cannot perform drift detection of this configuration. | `string` | `null` | no |
| kerberos_attributes_kdc_admin_password | (Required) The password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. Terraform cannot perform drift detection of this configuration. | `string` | `null` | no |
| kerberos_attributes_realm | (Required) The name of the Kerberos realm to which all nodes in a cluster belong. For example, EC2.INTERNAL. | `string` | `null` | no |
| label_key_case | Controls the letter case of the `tags` keys (label names) for tags generated by this module.<br>Does not affect keys of tags passed in via the `tags` input.<br>Possible values: `lower`, `title`, `upper`.<br>Default value: `title`. | `string` | `null` | no |
| label_order | The order in which the labels (ID elements) appear in the `id`.<br>Defaults to ["namespace", "environment", "stage", "name", "attributes"].<br>You can omit any of the 6 labels ("tenant" is the 6th), but at least one must be present. | `list(string)` | `null` | no |
| label_value_case | Controls the letter case of ID elements (labels) as included in `id`,<br>set as tag values, and output by this module individually.<br>Does not affect values of tags passed in via the `tags` input.<br>Possible values: `lower`, `title`, `upper` and `none` (no transformation).<br>Set this to `title` and set `delimiter` to `""` to yield Pascal Case IDs.<br>Default value: `lower`. | `string` | `null` | no |
| labels_as_tags | Set of labels (ID elements) to include as tags in the `tags` output.<br>Default is to include all labels.<br>Tags with empty values will not be included in the `tags` output.<br>Set to `[]` to suppress all generated tags.<br>Notes:<br>The value of the `name` tag, if included, will be the `id`, not the `name`.<br>Unlike other null-label inputs, the initial setting of `labels_as_tags` cannot be<br>changed in later chained modules. Attempts to change it will be silently ignored. | `set(string)` | <pre>[<br>  "default"<br>]</pre> | no |
| log_uri | (Optional) S3 bucket to write the log files of the job flow. If a value is not provided, logs are not created. | `string` | `null` | no |
| master_allowed_cidr_blocks | List of CIDR blocks to be allowed to access the master instances. | `list(string)` | `[]` | no |
| master_allowed_security_groups | List of security groups to be allowed to connect to the master instances. | `list(string)` | `[]` | no |
| master_instance_group_bid_price | (Optional) Bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances. | `string` | `null` | no |
| master_instance_group_ebs_config_iops | (Optional) The number of I/O operations per second (IOPS) that the volume supports. | `number` | `null` | no |
| master_instance_group_ebs_config_size | (Required) The volume size, in gibibytes (GiB). | `number` | `null` | no |
| master_instance_group_ebs_config_type | (Required) The volume type. Valid options are gp2, io1, standard and st1. | `string` | `null` | no |
| master_instance_group_ebs_config_volumes_per_instance | (Optional) The number of EBS volumes with this configuration to attach to each EC2 instance in the instance group (default is 1). | `number` | `1` | no |
| master_instance_group_instance_count | (Optional) Target number of instances for the instance group. Must be 1 or 3. Defaults to 1. Launching with multiple master nodes is only supported in EMR version 5.23.0+, and requires this resource's core_instance_group to be configured. Public (Internet accessible) instances must be created in VPC subnets that have map public IP on launch enabled. Termination protection is automatically enabled when launched with multiple master nodes and Terraform must have the `termination_protection = false` configuration applied before destroying this resource. | `number` | `null` | no |
| master_instance_group_instance_type | (Required) EC2 instance type for all instances in the instance group. | `string` | `null` | no |
| name | ID element. Usually the component or solution name, e.g. 'app' or 'jenkins'.<br>This is the only ID element not also included as a `tag`.<br>The "name" tag is set to the full `id` string. There is no tag with the value of the `name` input. | `string` | `null` | no |
| namespace | ID element. Usually an abbreviation of your organization name, e.g. 'eg' or 'cp', to help ensure generated IDs are globally unique. | `string` | `null` | no |
| regex_replace_chars | Terraform regular expression (regex) string.<br>Characters matching the regex will be removed from the ID elements.<br>If not set, `"/[^a-zA-Z0-9-]/"` is used to remove all characters other than hyphens, letters and digits. | `string` | `null` | no |
| release_label | (Required) The release label for the Amazon EMR release. | `string` | n/a | yes |
| scale_down_behavior | (Optional) The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. | `string` | `null` | no |
| security_configuration | (Optional) The security configuration name to attach to the EMR cluster. Only valid for EMR clusters with release_label 4.8.0 or greater. | `string` | `null` | no |
| service_role | (Required) IAM role that will be assumed by the Amazon EMR service to access AWS resources. | `string` | `null` | no |
| slave_allowed_cidr_blocks | List of CIDR blocks to be allowed to access the slave instances. | `list(string)` | `[]` | no |
| slave_allowed_security_groups | List of security groups to be allowed to connect to the slave instances. | `list(string)` | `[]` | no |
| sns_topic_arn | (Required) The Amazon Resource Name (ARN) of the SNS target. | `string` | `null` | no |
| stage | ID element. Usually used to indicate role, e.g. 'prod', 'staging', 'source', 'build', 'test', 'deploy', 'release'. | `string` | `null` | no |
| step | (Optional) List of steps to run when creating the cluster. | `list(any)` | `[]` | no |
| step_concurrency_level | (Optional) The number of steps that can be executed concurrently. You can specify a maximum of 256 steps. | `number` | `1` | no |
| subnet_type | The type of subnet the EMR cluster is provisioned in. Used to determine if service-related security groups are required. Defaults to 'private'. | `string` | `"private"` | no |
| tags | Additional tags (e.g. `{'BusinessUnit': 'XYZ'}`).<br>Neither the tag keys nor the tag values will be modified by this module. | `map(string)` | `{}` | no |
| tenant | ID element _(Rarely used, not included by default)_. A customer identifier, indicating who this instance of a resource is for. | `string` | `null` | no |
| termination_protection | (Optional) Switch on/off termination protection (default is false, except when using multiple master nodes). Before attempting to destroy the resource when termination protection is enabled, this configuration must be applied with its value set to false. | `bool` | `false` | no |
| visible_to_all_users | (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. | `bool` | `true` | no |
| vpc_id | The VPC ID to create the security groups in. | `string` | n/a | yes |
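Some of the inputs above take structured values or change behavior when set. The following sketch illustrates a few of them under stated assumptions: the module source address, bucket name, script path, action name, CIDR range, and bid price are hypothetical placeholders, and the comments paraphrase the descriptions in the table rather than the module's internals.

```hcl
module "emr" {
  source = "github.com/hansohn/terraform-aws-emr" # assumed source address

  release_label = "emr-6.9.0"             # placeholder
  vpc_id        = "vpc-0123456789abcdef0" # placeholder

  # bootstrap_action is an ordered list of objects with name, path, and args.
  bootstrap_action = [
    {
      name = "install-deps"                             # hypothetical action name
      path = "s3://example-bucket/bootstrap/install.sh" # hypothetical script location
      args = ["--verbose"]
    }
  ]

  # Setting a bid price declares the core instance group as Spot;
  # leaving it null (the default) uses On-Demand Instances.
  core_instance_group_bid_price = "0.10"

  # CIDR blocks allowed to access the master instances.
  master_allowed_cidr_blocks = ["10.0.0.0/16"]

  # When enabled, this must be set back to false and applied
  # before the cluster can be destroyed.
  termination_protection = true
}
```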

Outputs

| Name | Description |
|------|-------------|
| applications | The applications installed on this cluster. |
| arn | The ARN of the cluster. |
| bootstrap_action | A list of bootstrap actions that will be run before Hadoop is started on the cluster nodes. |
| configurations | The list of Configurations supplied to the EMR cluster. |
| core_instance_group_0_id | Core node type Instance Group ID, if using Instance Group for this node type. |
| ec2_attributes | Provides information about the EC2 instances in a cluster grouped by category: key name, subnet ID, IAM instance profile, and so on. |
| id | The ID of the EMR Cluster |
| log_uri | The path to the Amazon S3 location where logs for this cluster are stored. |
| managed_master_security_group_id | EMR managed_master security group ID |
| managed_service_access_security_group_id | EMR managed_service_access security group ID |
| managed_slave_security_group_id | EMR managed_slave security group ID |
| master_instance_group_0_id | Master node type Instance Group ID, if using Instance Group for this node type. |
| master_public_dns | The public DNS name of the master EC2 instance. |
| master_security_group_id | EMR master security group ID |
| name | The name of the cluster. |
| notebook_instance_security_group_id | Notebook instance security group ID |
| notebook_master_security_group_id | Notebook master security group ID |
| release_label | The release label for the Amazon EMR release. |
| service_role | The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf. |
| slave_security_group_id | EMR slave security group ID |
| visible_to_all_users | Indicates whether the job flow is visible to all IAM users of the AWS account associated with the job flow. |
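
The outputs listed here can be referenced from the calling configuration. A short sketch, assuming the module was instantiated in a block named `emr` (the block name and the names of the re-exported outputs are illustrative choices, not something this README prescribes):

```hcl
# Re-export selected module outputs from the root configuration.
# Assumes a `module "emr"` block exists in the same configuration.
output "emr_cluster_id" {
  description = "ID of the EMR cluster created by the module"
  value       = module.emr.id
}

output "emr_master_public_dns" {
  description = "Public DNS name of the master EC2 instance"
  value       = module.emr.master_public_dns
}

output "emr_master_security_group_id" {
  description = "Security group ID attached to the master node"
  value       = module.emr.master_security_group_id
}
```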