Analysis of the gun violence dataset on Kaggle (https://www.kaggle.com/jameslko/gun-violence-data), to figure out:
- Which states are most violent
- Yearly trend of gun-related violence
- Total injuries and deaths to date
- Types of guns used
- Gender and age breakdown of participants in gun-related crimes
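As a rough illustration of the first two questions, here is a minimal Spark sketch. It is not the code in com.data.GunViolenceAnalyzer: the object name GunViolenceSketch and both helper functions are hypothetical, and the column names state, date, n_killed and n_injured are assumptions about the CSV header that should be checked against the downloaded file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc, sum, year}

// Hypothetical sketch, not the project's actual analyzer.
object GunViolenceSketch {

  // Total killed and injured per state, most affected states first.
  // Assumes columns "state", "n_killed" and "n_injured" exist.
  def casualtiesByState(incidents: DataFrame): DataFrame =
    incidents
      .groupBy("state")
      .agg(
        sum(col("n_killed").cast("long")).as("killed"),
        sum(col("n_injured").cast("long")).as("injured"))
      .withColumn("casualties", col("killed") + col("injured"))
      .orderBy(desc("casualties"))

  // Number of incidents per calendar year.
  // Assumes a "date" column formatted as yyyy-MM-dd.
  def incidentsByYear(incidents: DataFrame): DataFrame =
    incidents
      .withColumn("year", year(col("date")))
      .groupBy("year")
      .count()
      .orderBy("year")

  def main(args: Array[String]): Unit = {
    // The master is expected to be supplied by spark-submit.
    val spark = SparkSession.builder().appName("gun-violence-sketch").getOrCreate()

    // args(0) is the path to the CSV; every column is read as a string.
    val incidents = spark.read.option("header", "true").csv(args(0))

    casualtiesByState(incidents).show(10)
    incidentsByYear(incidents).show()

    spark.stop()
  }
}
```

Reading every column as a string and casting only the counts keeps the sketch independent of schema inference. The real file may need extra CSV options if free-text fields contain embedded newlines.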
Requirements: Gradle, Scala 2.11, Spark 2.2.1
To build and run locally, go to the gun-violence-analysis directory and run:
gradle build
gradle copyDeps
spark-submit --master "local[3]" --driver-class-path "build/libs/libext/*" --class com.data.GunViolenceAnalyzer build/libs/gun-violence-analysis-0.1.jar path-to-csv
To run on a YARN cluster:
export HADOOP_CONF_DIR=XXX
# --deploy-mode can be client for client mode
spark-submit \
--class com.data.GunViolenceAnalyzer \
--master yarn \
--deploy-mode cluster \
--executor-memory 128M \
--num-executors 3 \
build/libs/gun-violence-analysis-0.1.jar path-to-csv
Results are stored as CSV files in the locations specified in config.yaml.