Amazon Elastic MapReduce (EMR) Demonstration

This repository contains the resources used in the Amazon EMR on EC2 cluster project.


Project Descriptions

1. Amazon EMR Demonstration

  • Overview: This project demonstrates how to set up and use Amazon EMR (Elastic MapReduce) for big data processing and analytics tasks.

Included in the documentation file:

  • VPC creation: Creating an Amazon VPC.
  • S3 bucket creation: Creating an Amazon Simple Storage Service (S3) bucket.
  • IAM role creation: Creating an IAM role in the AWS Management Console.
  • EMR cluster creation: Creating an Amazon EMR on EC2 cluster.
  • EMR Studio creation: Creating an Amazon EMR Studio.
  • EMR Workspace creation: Creating an Amazon EMR Studio Workspace.
  • Spark job execution: Running a Spark job from an Amazon EMR Studio notebook.
  • Resource cleanup: Cleaning up the resources created during the demonstration.

Documentation.pdf: Detailed documentation of the entire Amazon EMR demonstration.
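
The documentation walks through these steps in the AWS Management Console. For readers who prefer to script the cluster-creation step, the following is a minimal sketch of how the request might be assembled in Python. It is not taken from this repository: the function name `build_cluster_request`, the release label, the instance type, the node count, and the default role names are all assumptions chosen for illustration, and the bucket and subnet are expected to be the ones created in the earlier steps.

```python
def build_cluster_request(name, log_bucket, subnet_id,
                          release_label="emr-6.15.0",   # assumed release, adjust as needed
                          instance_type="m5.xlarge",    # assumed instance type
                          core_count=2):                # assumed number of core nodes
    """Build keyword arguments for the EMR RunJobFlow API (hypothetical helper)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Spark is the only application the demonstration is known to need.
        "Applications": [{"Name": "Spark"}],
        # Cluster logs go to the S3 bucket created earlier.
        "LogUri": f"s3://{log_bucket}/logs/",
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
            # Subnet inside the VPC created earlier.
            "Ec2SubnetId": subnet_id,
            # Keep the cluster running so a notebook can attach to it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed default role names; substitute the IAM roles you created.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and AWS credentials configured, the result could be submitted with `boto3.client("emr").run_job_flow(**request)`. Because `KeepJobFlowAliveWhenNoSteps` is set, the cluster keeps running (and billing) until it is terminated during resource cleanup.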

2. Dataset and Code Files

  • Description: This repository contains the dataset and code files used in the Amazon EMR demonstration project as listed below:
  • dataset_en_dev.json: Dataset file used in the demonstration.
  • reviews.py: Python script used in the demonstration.
  • reviews.ipynb: Jupyter notebook used in the demonstration.
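
Before uploading the dataset to S3, it can be useful to sanity-check it locally. The sketch below is not the repository's `reviews.py`, and it uses only the Python standard library rather than Spark. It assumes the dataset is stored as JSON Lines (one JSON object per line). The helper name `count_by_field` and the sample records are made up for illustration, and the field to count is left as a parameter because the actual schema is not described here.

```python
import json
from collections import Counter


def count_by_field(lines, field):
    """Count records per value of `field` in an iterable of JSON Lines strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Records that lack the field are grouped under None.
        counts[record.get(field)] += 1
    return counts


# Made-up sample records, only to show the expected input shape.
sample = [
    '{"stars": 5, "language": "en"}',
    '{"stars": 1, "language": "en"}',
    '{"stars": 5, "language": "en"}',
]
sample_counts = count_by_field(sample, "stars")
```

To run it against the real file, pass an open file handle instead of the sample list, for example `count_by_field(open("dataset_en_dev.json", encoding="utf-8"), "stars")`, replacing `"stars"` with a field that actually exists in the dataset.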

Usage

  1. Clone this repository to your local machine.
  2. Explore the project folders and files to understand each demonstration.
  3. Follow the instructions provided in the transcripts and documentation to replicate the demonstrations in your own AWS environment.