A modular Terraform solution for creating an AWS EMR cluster, including:
- S3 (Cluster Logs)
- RDS (External Hive Metastore Database)
- IAM (Roles with AWS Managed Policies)
- EMR
Before applying this configuration, create an EC2 key pair named `EMR key pair` and download it to your local machine.
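If you prefer the command line, the key pair can also be created with the AWS CLI, as in this sketch (it assumes your CLI is already configured for the target account and region):

```sh
# Create the key pair and save the private key locally.
aws ec2 create-key-pair \
  --key-name "EMR key pair" \
  --query "KeyMaterial" \
  --output text > emr-key-pair.pem
chmod 400 emr-key-pair.pem
```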
The repository is organized as follows:

```
├── modules
│   ├── emr
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── templates
│   │   │   └── configurations.json.tpl
│   │   └── variables.tf
│   ├── rds-metastore
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── s3
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
├── README.md
├── main.tf
├── outputs.tf
└── variables.tf
```
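The root `main.tf` wires these modules together. A minimal sketch, assuming illustrative module labels and input names (the repository's actual wiring may differ):

```hcl
provider "aws" {
  region = var.region # default: us-west-2
}

module "s3" {
  source = "./modules/s3"
  bucket = var.bucket
}

module "rds_metastore" {
  source      = "./modules/rds-metastore"
  db_name     = var.db_name
  db_username = var.db_username
  db_password = var.db_password
}

module "emr" {
  source            = "./modules/emr"
  cluster_name      = var.cluster_name
  log_bucket        = module.s3.bucket_id
  metastore_address = module.rds_metastore.db_instance_address
}
```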
This repository is purpose-built to stand up the AWS infrastructure required by Delta's framework for writing benchmarks that measure Delta's performance (https://github.com/delta-io/delta/tree/master/benchmarks):
Create an external Hive Metastore in a MySQL database using Amazon RDS with the following specifications (see the sketch after this list):
- MySQL 8.0.15 on a `db.m5.large` instance.
- General-purpose SSD (`gp2`) storage, with no storage autoscaling.
- A non-empty admin password (default: `adminadmin`).
- The same region, VPC, and subnet as those in which you will run the EMR cluster (default region: `us-west-2`).
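A minimal sketch of such an instance, assuming variable names modeled on the defaults above (the actual inputs in `modules/rds-metastore` may differ):

```hcl
resource "aws_db_instance" "hive_metastore" {
  identifier        = "hive-metastore"
  engine            = "mysql"
  engine_version    = "8.0.15"
  instance_class    = "db.m5.large"
  allocated_storage = 50    # size is an assumption; the metastore itself is small
  storage_type      = "gp2"
  # No max_allocated_storage is set, so storage autoscaling stays disabled.

  db_name  = var.db_name     # default: mydb
  username = var.db_username # default: admin
  password = var.db_password # default: adminadmin

  # Place the instance in the same VPC/subnets as the EMR cluster.
  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = var.vpc_security_group_ids

  skip_final_snapshot = true # convenient for benchmark teardown
}
```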
Create an EMR cluster that connects to the external Hive Metastore with the following specifications (see the sketch after this list):
- EMR release `6.5.0`, which ships with Apache Spark 3.1.
- Master: one `i3.2xlarge` instance.
- Workers: 1 x `i3.2xlarge` (or 16 workers if you are running more than just the 1GB benchmark).
- A `hive-site` configuration that connects to the external Hive Metastore.
- The same region, VPC, and subnet as the RDS metastore (default region: `us-west-2`).
- No autoscaling, and default EBS storage.
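A condensed sketch of the cluster resource; the `var.*` names and the inline `hive-site` properties are illustrative (the repository renders the real configuration from `templates/configurations.json.tpl`):

```hcl
resource "aws_emr_cluster" "cluster" {
  name          = var.cluster_name # default: emr-cluster
  release_label = "emr-6.5.0"      # ships with Apache Spark 3.1
  applications  = ["Spark", "Hive"]
  log_uri       = "s3://${var.bucket}/logs/"
  service_role  = var.emr_service_role

  ec2_attributes {
    subnet_id        = var.subnet_id
    key_name         = "EMR key pair"
    instance_profile = var.emr_instance_profile
  }

  master_instance_group {
    instance_type = "i3.2xlarge"
  }

  core_instance_group {
    instance_type  = "i3.2xlarge"
    instance_count = 1 # raise to 16 for benchmarks beyond 1GB
  }

  # Point Hive at the external RDS metastore.
  configurations_json = jsonencode([
    {
      Classification = "hive-site"
      Properties = {
        "javax.jdo.option.ConnectionURL"        = "jdbc:mysql://${var.metastore_address}:3306/${var.db_name}?createDatabaseIfNotExist=true"
        "javax.jdo.option.ConnectionDriverName" = "org.mariadb.jdbc.Driver"
        "javax.jdo.option.ConnectionUserName"   = var.db_username
        "javax.jdo.option.ConnectionPassword"   = var.db_password
      }
    }
  ])
}
```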
Create a new S3 bucket (default name: `benchmarksbucket1919`) in the same region as your EMR cluster (default region: `us-west-2`). S3 bucket names are globally unique, so you may need to override the default, as sketched below.
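A minimal sketch of the bucket resource (the `force_destroy` flag is an assumption that simplifies teardown; the bucket inherits the provider's region):

```hcl
resource "aws_s3_bucket" "emr_cluster" {
  bucket        = var.bucket # default: benchmarksbucket1919
  force_destroy = true       # allow `terraform destroy` to remove a non-empty bucket
}
```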
The root module exposes the following input variables:
- `region` (default: `us-west-2`)
- `environment` (default: `terraform-aws-emr`)
- `cluster_name` (default: `emr-cluster`)
- `bucket` (default: `benchmarksbucket1919`)
- MySQL database name (default: `mydb`)
- MySQL database username (default: `admin`)
- MySQL database password (default: `adminadmin`)
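In the root `variables.tf` these look roughly like the sketch below; the three MySQL variable names are assumptions, since only their defaults appear above:

```hcl
variable "region" {
  default = "us-west-2"
}

variable "environment" {
  default = "terraform-aws-emr"
}

variable "cluster_name" {
  default = "emr-cluster"
}

variable "bucket" {
  default = "benchmarksbucket1919"
}

variable "db_name" { # assumed name
  default = "mydb"
}

variable "db_username" { # assumed name
  default = "admin"
}

variable "db_password" { # assumed name
  default   = "adminadmin"
  sensitive = true # keep the password out of plan/apply output
}
```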
After an apply, the root module reports the following outputs:
- `emr_cluster_master_public_dns`
- `rds_mysql_hive_metastore_db_instance_address`
- `rds_mysql_hive_metastore_db_instance_name`
- `s3_bucket_emr_cluster_id`
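In the root `outputs.tf` these are forwarded from the modules, roughly as in this sketch (the module labels and attribute names are assumptions based on the repository layout):

```hcl
output "emr_cluster_master_public_dns" {
  value = module.emr.master_public_dns
}

output "rds_mysql_hive_metastore_db_instance_address" {
  value = module.rds_metastore.db_instance_address
}

output "rds_mysql_hive_metastore_db_instance_name" {
  value = module.rds_metastore.db_instance_name
}

output "s3_bucket_emr_cluster_id" {
  value = module.s3.bucket_id
}
```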
Before executing your first plan, you need to initialize Terraform:

```sh
terraform init
```

Terraform lets you "plan", which shows what it would change without actually making any changes:

```sh
terraform plan
```

Finally, after initializing and planning, you can apply your changes:

```sh
terraform apply
```

When you want Terraform to clean up everything it created, destroy the infrastructure:

```sh
terraform destroy
```