Clustering Coefficient Analysis. Final project for BigData Adv.
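The README does not say which variant of the clustering coefficient the tasks compute. For reference, these are the standard definitions for an undirected graph. Treating LiveJournal's directed edges as undirected is an assumption here. T_i is the number of links among the neighbours of node i (equivalently, triangles through i), k_i is its degree, and n is the number of nodes. C_i is commonly set to 0 when k_i < 2.

```latex
C_i = \frac{2\,T_i}{k_i\,(k_i - 1)}, \qquad
\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i, \qquad
C_{\mathrm{global}} = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}
```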
LiveJournal Dataset
http://snap.stanford.edu/data/soc-LiveJournal1.txt.gz
$ wget http://snap.stanford.edu/data/soc-LiveJournal1.txt.gz
$ gunzip soc-LiveJournal1.txt.gz
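The unpacked file is a SNAP edge list. Header lines start with "#", and every other line holds a tab-separated "FromNodeId ToNodeId" pair for one directed edge. The following is a minimal Java sketch for loading a small sample of it into an undirected adjacency map, for example to sanity-check node and edge counts before running the cluster jobs.

EdgeListLoader is a hypothetical helper and is not part of the project's jars. Symmetrising edges and dropping self-loops are assumptions about how the graph is meant to be treated. Loading the full file this way would need a very large heap, so it is meant for samples.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class EdgeListLoader {
    // Hypothetical helper: parses a SNAP edge list into an undirected adjacency map.
    static Map<Long, Set<Long>> load(Reader in) {
        Map<Long, Set<Long>> adj = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // '#' lines are SNAP header comments; skip them and blank lines
                if (line.isEmpty() || line.startsWith("#")) continue;
                String[] parts = line.split("\\s+");
                if (parts.length < 2) continue; // tolerate malformed lines
                long u = Long.parseLong(parts[0]);
                long v = Long.parseLong(parts[1]);
                if (u == v) continue; // drop self-loops (assumption)
                // store both directions: directed edges treated as undirected (assumption)
                adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
                adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return adj;
    }

    // Each undirected edge appears in two neighbour sets, so halve the degree sum.
    static long undirectedEdgeCount(Map<Long, Set<Long>> adj) {
        long degreeSum = 0;
        for (Set<Long> nbrs : adj.values()) degreeSum += nbrs.size();
        return degreeSum / 2;
    }

    public static void main(String[] args) throws IOException {
        Map<Long, Set<Long>> adj = load(Files.newBufferedReader(Paths.get(args[0])));
        System.out.println("nodes with at least one edge: " + adj.size());
        System.out.println("undirected edges: " + undirectedEdgeCount(adj));
    }
}
```

Usage on a sample (Java 11+ can run a single source file directly):
$ head -n 100000 soc-LiveJournal1.txt > sample.txt
$ java EdgeListLoader.java sample.txt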
- Google Cloud Platform - Dataproc
- HDFS
- Hadoop MapReduce, Spark
$ hadoop jar task1.jar bigdata.Task1 soc-LiveJournal1.txt task1_result
$ hadoop jar task2.jar bigdata.Task2 task1_result task2_result
$ spark-submit --num-executors 10 --conf "spark.default.parallelism=32" --class bigdata.Task3 task3.jar task1_result task3_result
$ spark-submit --num-executors 10 --class bigdata.Task4 task4.jar task2_result task3_result task4_result
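The README does not describe what Task1 through Task4 compute internally. To check distributed results on a small sample, a single-machine reference can help. The sketch below computes each node's local clustering coefficient by counting the links among its neighbours and dividing by k(k-1)/2. It then averages these values over all nodes.

This is not the project's MapReduce/Spark implementation. LocalClustering and its methods are hypothetical names. Giving nodes of degree below 2 a coefficient of 0, and including them in the average, is an assumption; some tools exclude them instead.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LocalClustering {
    // Adds an undirected edge u-v to the adjacency map, ignoring self-loops.
    static void addEdge(Map<Long, Set<Long>> adj, long u, long v) {
        if (u == v) return;
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    // Local coefficient of v: links among v's neighbours / (k * (k - 1) / 2).
    // Degree < 2 returns 0 by convention (assumption).
    static double local(Map<Long, Set<Long>> adj, long v) {
        Set<Long> nbrs = adj.getOrDefault(v, Set.of());
        int k = nbrs.size();
        if (k < 2) return 0.0;
        long links = 0;
        for (long a : nbrs) {
            for (long b : adj.get(a)) {
                // a < b counts each neighbour pair exactly once
                if (a < b && nbrs.contains(b)) links++;
            }
        }
        return links / ((double) k * (k - 1) / 2.0); // double avoids int overflow for hubs
    }

    // Mean of the local coefficients over every node in the map.
    static double average(Map<Long, Set<Long>> adj) {
        if (adj.isEmpty()) return 0.0;
        double sum = 0.0;
        for (long v : adj.keySet()) sum += local(adj, v);
        return sum / adj.size();
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> adj = new HashMap<>();
        // toy graph: triangle 1-2-3 plus node 4 hanging off node 3
        addEdge(adj, 1, 2);
        addEdge(adj, 2, 3);
        addEdge(adj, 1, 3);
        addEdge(adj, 3, 4);
        for (long v : adj.keySet()) {
            System.out.printf("C(%d) = %.4f%n", v, local(adj, v));
        }
        System.out.printf("average = %.4f%n", average(adj));
    }
}
```

On the toy graph, nodes 1 and 2 get 1.0, node 3 gets 1/3 (only one of its three neighbour pairs is linked), node 4 gets 0, and the average is 7/12 (about 0.5833). Comparing a sketch like this against the cluster output on the same small sample is one way to catch off-by-one or double-counting errors.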