Covid19-Data-analysis-Map-Reduce-

Covid19 data analysis with Hadoop MapReduce and Spark

This project analyzes COVID-19 data using MapReduce on Hadoop and Spark.

The Hadoop programs are written in Java and the Spark programs in Python.

The tasks were executed using Hadoop and Spark running in Docker.

To get the data, run: wget http://bmidb.cs.stonybrook.edu/publicdata/cse532-s20/covid19_full_data.csv

Copy the data into HDFS with: hdfs dfs -put covid19_full_data.csv /InputData/

To run Covid19_1.java on Hadoop, use: hadoop jar Covid19.jar Covid19_1 /cse532/input/covid19_full_data.csv [true/false] /cse532/output/

The true/false argument controls whether the cumulative worldwide COVID-19 case counts are included (true) or excluded (false).
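The effect of the flag can be sketched in plain Python. This is only an illustration, not the code in Covid19_1.java: it assumes the CSV has `date`, `location`, `new_cases` and `new_deaths` columns and that the worldwide totals appear as rows whose location is `World`. The sample rows and the function names are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Assumed column layout and made-up sample rows (not taken from the real file).
SAMPLE = """date,location,new_cases,new_deaths
2020-03-01,Italy,10,1
2020-03-01,World,30,2
2020-03-02,Italy,5,0
2020-03-02,World,12,1
"""


def map_row(row, include_world):
    """Map step: emit (location, new_cases), skipping 'World' rows if asked."""
    location = row["location"]
    if not include_world and location == "World":
        return []
    return [(location, int(row["new_cases"]))]


def reduce_counts(pairs):
    """Reduce step: sum the emitted case counts per location."""
    totals = defaultdict(int)
    for location, cases in pairs:
        totals[location] += cases
    return dict(totals)


def total_cases(csv_text, include_world):
    """Run the map and reduce steps over CSV text held in memory."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = [pair for row in reader for pair in map_row(row, include_world)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(total_cases(SAMPLE, include_world=True))
    print(total_cases(SAMPLE, include_world=False))
```

Filtering in the map step keeps the excluded rows from ever reaching the reducer, which is the usual place to drop records in a MapReduce job.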

Covid19_3.java uses the Hadoop distributed cache. To run it, use: hadoop jar Covid19_3.jar /covid19_full_data.csv /
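The distributed cache ships a small read-only file to every node, so each mapper can load it once and look values up locally instead of joining in the reduce phase. The plain-Python sketch below imitates that pattern. The README does not say which file Covid19_3.java caches, so the population table, the column names, the sample rows and the per-million calculation are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample input; the column layout is an assumption.
CASES = """date,location,new_cases,new_deaths
2020-03-01,Italy,10,1
2020-03-02,Italy,5,0
2020-03-01,Unknownland,7,0
"""

# Hypothetical side file standing in for whatever is placed in the cache.
POPULATION = """location,population
Italy,60000000
"""


def load_side_table(text):
    """Load the small lookup table once, as a mapper would during setup."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["location"]: int(row["population"]) for row in reader}


def cases_per_million(cases_text, side_table):
    """Sum cases per location, then scale by the cached population."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(cases_text)):
        # Map-side lookup: drop locations missing from the side table.
        if row["location"] in side_table:
            totals[row["location"]] += int(row["new_cases"])
    return {
        location: total * 1_000_000 / side_table[location]
        for location, total in totals.items()
    }


if __name__ == "__main__":
    table = load_side_table(POPULATION)
    print(cases_per_million(CASES, table))
```

This approach only works when the side table is small enough to fit in each mapper's memory; a large second dataset would need a reduce-side join instead.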

To run the Spark program, use: spark-submit /Covid19_3.py /covid19_full_data.csv

