gpt3ch/ams_final_project: Implementation of kNN for local execution and Hadoop (using PySpark).

In Part 1, I ran the code locally on the dataset 'millerkeith2018data2.csv' from https://openei.org/datasets/dataset/miller-keith-2018-windplantdata. It finished in 1.92 seconds.

In Part 2, I rewrote the code in PySpark and ran it on an Amazon EMR cluster (a screenshot of the EMR specs is enclosed with the submission). Thanks to parallelization, this run finished in 0.36 seconds, an 81% reduction in execution time. For this dataset and setup, this shows that spreading the computation across more than one server speeds it up and returns results faster.

Screenshots of both the local run and the EMR run are enclosed with the corresponding Python files.
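The 81% figure can be checked with a few lines of Python using the two timings above. The speedup factor printed at the end is derived here from the same two numbers; it is not reported elsewhere in this README.

```python
# Timings reported above for the same kNN query.
local_seconds = 1.92  # Part 1, single machine
emr_seconds = 0.36    # Part 2, PySpark on EMR

# Fraction of the local run time that the EMR run saved.
reduction = (local_seconds - emr_seconds) / local_seconds
# How many times faster the EMR run was.
speedup = local_seconds / emr_seconds

print(f"reduction: {reduction:.0%}")  # reduction: 81%
print(f"speedup: {speedup:.2f}x")     # speedup: 5.33x
```

Both numbers describe the same ratio from two angles: an 81% reduction means the EMR run took roughly one fifth of the local time.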

############# Running part 1 #############

python hw3_part1.py -h
usage: hw3_part1.py [-h] [--x X] [--dataset DATASET] [--k K]

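optional arguments:
  -h, --help         show this help message and exit
  --x X              Latitude and longitude for point of form [lat,long] ex: [-111.931, 37.58523]
  --dataset DATASET  Dataset location, ex : millerkeith2018data1.csv
  --k K              Number of points to find nearest to above point, ex : 20

Note: although the help text says [lat,long], the --x example lists longitude first: -111.931 is outside the valid latitude range (-90 to 90), so it can only be a longitude.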

Sample cmd: python hw3_part1.py --x "[-111.931, 37.58523]" --dataset "millerkeith2018data1.csv" --k 20
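The local part boils down to a brute-force nearest-neighbour search: measure the distance from the query point to every row and keep the k smallest. Below is a minimal standard-library sketch of that idea with a command line shaped like the usage above. It is not the contents of hw3_part1.py (which uses pandas and numpy); parse_point, k_nearest and load_points are hypothetical helpers, the column names passed to load_points are placeholders, and plain Euclidean distance on the raw coordinates is an assumption.

```python
import argparse
import ast
import csv
import heapq
import math


def parse_args(argv=None):
    # Illustrative re-creation of the documented flags, not the author's parser.
    parser = argparse.ArgumentParser(description="Brute-force kNN over a CSV of coordinates")
    parser.add_argument("--x", type=str, help="Query point, e.g. '[-111.931, 37.58523]'")
    parser.add_argument("--dataset", type=str, help="Path to the CSV dataset")
    parser.add_argument("--k", type=int, default=20, help="Number of nearest points to return")
    return parser.parse_args(argv)


def parse_point(text):
    # literal_eval safely turns "[-111.931, 37.58523]" into a two-element list.
    # The order given on the command line is kept as-is.
    a, b = ast.literal_eval(text)
    return float(a), float(b)


def load_points(path, col_a, col_b):
    # col_a / col_b are hypothetical: pass the dataset's real coordinate column names.
    with open(path, newline="") as f:
        return [(float(row[col_a]), float(row[col_b])) for row in csv.DictReader(f)]


def k_nearest(points, query, k):
    # Scan every point once; nsmallest keeps only the k closest seen so far.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))
```

heapq.nsmallest avoids sorting the whole dataset, which matters little at this size but keeps memory proportional to k. Euclidean distance on degrees is only an approximation because a degree of longitude shrinks with latitude; a haversine (great-circle) distance would be more accurate over larger areas.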

############# Running part 2 #############

python hw3_part2.py -h
usage: hw3_part2.py [-h] [--x X] [--dataset DATASET] [--k K]

optional arguments:
  -h, --help         show this help message and exit
  --x X              Latitude and longitude for point of form [lat,long] ex: [-111.931, 37.58523]
  --dataset DATASET  Dataset location, ex : millerkeith2018data1.csv
  --k K              Number of points to find nearest to above point, ex : 20

Sample cmd: python hw3_part2.py --x "[-111.931, 37.58523]" --dataset "millerkeith2018data1.csv" --k 20
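Distributing kNN works because the global k nearest points are always contained in the union of each partition's own k nearest points. A common way to exploit this, and roughly what PySpark's RDD.takeOrdered(num, key=...) does internally, is to compute a local top-k per partition and then merge the partial answers. The sketch below emulates that pattern with plain Python lists standing in for Spark partitions so it runs without a cluster. It is an assumption about the approach, not a copy of hw3_part2.py; partition, local_top_k and distributed_knn are hypothetical names, and Euclidean distance is again assumed.

```python
import heapq
import math
from functools import reduce


def partition(points, n_parts):
    # Round-robin split standing in for Spark partitions (illustrative only).
    return [points[i::n_parts] for i in range(n_parts)]


def local_top_k(part, query, k):
    # "Map" side: each partition keeps only its own k closest points.
    return heapq.nsmallest(k, part, key=lambda p: math.dist(p, query))


def distributed_knn(points, query, k, n_parts=4):
    partials = [local_top_k(part, query, k) for part in partition(points, n_parts)]

    def merge(a, b):
        # "Reduce" side: combine two partial answers and keep the overall k best.
        return heapq.nsmallest(k, a + b, key=lambda p: math.dist(p, query))

    return reduce(merge, partials, [])
```

Each partition sends at most k points to the merge step, so the data that has to be combined stays small no matter how large the dataset is. With a SparkContext named sc, the same idea can be written along the lines of sc.parallelize(points).takeOrdered(k, key=lambda p: math.dist(p, query)).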

Python packages required: PySpark, pandas, numpy

Files in the repository:
hw3_part1.py
hw3_part1.png
hw3_part2.py
hw3_part2.png
millerkeith2018data1.csv
millerkeith2018data2.csv
readme.md
