Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.

kevinndungu-source/Amazon_EMR_Project_Resources

Amazon Elastic MapReduce (EMR) Demonstration

This repository holds the resources used in the EMR on EC2 cluster project.


Project Descriptions

1. Amazon EMR Demonstration

  • Overview: This project demonstrates how to set up and utilize Amazon EMR (Elastic MapReduce) for big data processing and analytics tasks.

Included in the documentation file:

  • VPC creation: Creating an Amazon VPC.
  • S3 bucket creation: Creating an Amazon S3 (Simple Storage Service) bucket.
  • IAM role creation: Creating an IAM role in the AWS Management Console.
  • EMR cluster creation: Creating an Amazon EMR on EC2 cluster.
  • EMR Studio creation: Creating an Amazon EMR Studio.
  • EMR Workspace creation: Creating an Amazon EMR Studio Workspace.
  • Spark job execution: Running a Spark job from an Amazon EMR Studio notebook.
  • Resource cleanup: Cleaning up the resources created during the demonstration.

Documentation.pdf: Detailed documentation of the entire Amazon EMR demonstration.
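
The documentation walks through these steps in the AWS Management Console. For readers who prefer scripting, the cluster creation step could be approximated with boto3 along the following lines. This is a minimal sketch, not the procedure from Documentation.pdf: the helper names `build_cluster_request` and `create_cluster`, the release label, the instance type and count, the region, and the default role names are all assumptions to be replaced with the values from your own VPC, S3 bucket, and IAM setup.

```python
def build_cluster_request(name, subnet_id, log_uri,
                          release_label="emr-6.15.0",   # assumed release label
                          instance_type="m5.xlarge",    # assumed instance type
                          core_count=2):                # assumed core node count
    """Build keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,  # e.g. a prefix in the S3 bucket created earlier
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "Ec2SubnetId": subnet_id,  # a subnet in the VPC created earlier
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster up for notebooks
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
        },
        # Assumed default role names; substitute the roles you actually created.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request, region="us-east-1"):  # region is an assumption
    """Submit the request to Amazon EMR and return the new cluster ID."""
    import boto3  # imported here so the request can be built without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the API call makes it possible to inspect or review the configuration before anything is launched, which matters because a running cluster incurs charges until it is terminated in the cleanup step.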

2. Dataset and Code Files

  • Description: This repository contains the dataset and code files used in the Amazon EMR demonstration project as listed below:
  • dataset_en_dev.json: Dataset file used in the demonstration.
  • reviews.py: Python script used in the demonstration.
  • reviews.ipynb: Jupyter notebook used in the demonstration.
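
Before uploading the dataset to S3, it can help to check locally what it contains. The sketch below is not part of the repository and does not reproduce reviews.py. It assumes only that dataset_en_dev.json holds either a single JSON array or one JSON object per line (the layout is not stated here), and the helper names `load_records` and `summarize` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load records from a JSON array file or a JSON Lines file."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as one record.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def summarize(records):
    """Return the record count and how often each top-level field appears."""
    fields = Counter()
    for record in records:
        if isinstance(record, dict):
            fields.update(record.keys())
    return {"count": len(records), "fields": dict(fields)}
```

Calling `summarize(load_records("dataset_en_dev.json"))` from the cloned repository should report the number of records and the field names, which is useful to know before writing Spark transformations against them.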

Usage

  1. Clone this repository to your local machine.
  2. Explore the project folders and files to understand each demonstration.
  3. Follow the instructions provided in the transcripts and documentation to replicate the demonstrations in your own AWS environment.
