This repository holds the resources used in the EMR on EC2 Cluster project.
- Overview: This project demonstrates how to set up and utilize Amazon EMR (Elastic MapReduce) for big data processing and analytics tasks.
Included in the documentation file:
- VPC creation: Demonstration of creating an Amazon VPC.
- S3 bucket creation: Demonstration of creating an Amazon S3 bucket.
- IAM role creation: Demonstration of creating an IAM role in the AWS Management Console.
- EMR cluster creation: Demonstration of creating an Amazon EMR on EC2 cluster.
- EMR Studio creation: Demonstration of creating an Amazon EMR Studio.
- EMR Workspace creation: Demonstration of creating an Amazon EMR Studio Workspace.
- Spark job execution: Demonstration of running a Spark job in an Amazon EMR Studio notebook.
- Resource cleanup: Demonstration of cleaning up the resources created during the demonstration.
Documentation.pdf: Detailed documentation of the entire Amazon EMR demonstration.
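
The documentation walks through cluster creation in the AWS Management Console. For readers who prefer scripting, the same step could be sketched with boto3 roughly as follows. This is a minimal sketch, not the method used in the demonstration: the helper names `build_cluster_request` and `create_cluster`, the release label, the instance type, the region, and the use of the default EMR role names are all assumptions and should be replaced with the values from your own VPC, S3 bucket, and IAM role.

```python
def build_cluster_request(name, log_uri, subnet_id,
                          release_label="emr-6.15.0",   # assumed release, adjust as needed
                          instance_type="m5.xlarge",    # assumed instance type
                          core_count=2):
    """Build keyword arguments for the boto3 EMR run_job_flow call (hypothetical helper)."""
    return {
        "Name": name,
        "LogUri": log_uri,                      # S3 location for cluster logs
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],    # the demonstration runs a Spark job
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "Ec2SubnetId": subnet_id,           # subnet inside the VPC you created
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed: the default EMR roles. The demonstration's IAM role may differ.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request, region="us-east-1"):
    """Send the request to EMR and return the new cluster ID.

    boto3 is imported here so the request can be built and inspected
    without AWS credentials or the SDK installed.
    """
    import boto3
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Building the request separately from sending it makes it easy to review the configuration before any billable resources are created. A call such as `create_cluster(build_cluster_request("emr-demo", "s3://<your-bucket>/logs/", "<your-subnet-id>"))` would then start the cluster.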
- Description: This repository contains the dataset and code files used in the Amazon EMR demonstration project as listed below:
- dataset_en_dev.json: Dataset file used in the demonstration.
- reviews.py: Python script used in the demonstration.
- reviews.ipynb: Jupyter notebook used in the demonstration.
- Clone this repository to your local machine.
- Explore the project folders and files to understand each demonstration.
- Follow the instructions provided in the transcripts and documentation to replicate the demonstrations in your own AWS environment.