[College Project] Big Data project using Spark for the Big Data class at UFRJ
Install the following dependencies in order to build and run the project
Self-organizing Map
- Location: src/som
- Language: Scala
This program reads the Bitcoin valuation CSV file and calculates the variation from one day to the next. It also uses a DataFrame, so the variation of the currency for a given day can be retrieved by calling the method getVariation() with the date as a parameter.
- Location: src/variation
- Language: Java
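As a rough illustration of the day-to-day variation described above, here is a minimal sketch in plain Java. It is not the project's implementation: it uses a TreeMap as a stand-in for the Spark DataFrame, assumes the CSV has no header and two columns (`yyyy-MM-dd,price`), and assumes "variation" means the absolute price difference from the previous available day. The class name `VariationSketch` and the sample prices are hypothetical; only the method name getVariation() comes from the description.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the real program; plain collections instead of Spark.
class VariationSketch {
    // Prices keyed by date; TreeMap keeps them in chronological order.
    private final TreeMap<LocalDate, Double> prices = new TreeMap<>();

    // Parses lines of the assumed form "yyyy-MM-dd,price" (no header row).
    VariationSketch(Iterable<String> csvLines) {
        for (String line : csvLines) {
            String[] fields = line.trim().split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            prices.put(LocalDate.parse(fields[0].trim()),
                       Double.parseDouble(fields[1].trim()));
        }
    }

    // Returns price(date) minus the price of the previous available day,
    // or NaN when the date or its predecessor is not in the data.
    double getVariation(String date) {
        LocalDate day = LocalDate.parse(date);
        Double current = prices.get(day);
        Map.Entry<LocalDate, Double> previous = prices.lowerEntry(day);
        if (current == null || previous == null) {
            return Double.NaN;
        }
        return current - previous.getValue();
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        VariationSketch sketch = new VariationSketch(List.of(
                "2017-01-01,1000.0",
                "2017-01-02,1020.5",
                "2017-01-03,990.0"));
        System.out.println(sketch.getVariation("2017-01-02")); // prints 20.5
    }
}
```

Looking up the previous entry with `lowerEntry` rather than subtracting one calendar day means gaps in the data (missing days) still yield a variation against the last known price; whether the real program behaves this way is not stated in this README.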
This program retrieves news from a few websites and saves it to disk.
- Location: tools/news-finder
- Language: JavaScript
This program retrieves the Bitcoin price history and saves it as a CSV file.
- Location: tools/bitcoin-market-price-downloader
- Language: JavaScript
cd src/som/
sbt package
cd src/variation/
mvn package
To create a runnable jar that bundles its dependencies, use Maven. Add this to your pom.xml file:
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>fully.qualified.MainClass</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
Then run:
cd src/variation
mvn clean compile assembly:single
cd tools/news-finder/
npm install
cd tools/bitcoin-market-price-downloader/
npm install
cd src/som/
<your-spark-folder>/bin/spark-submit target/scala<version>/som_project_<version>.jar
cd src/variation
<your-spark-folder>/bin/spark-submit VBBigData-1.0-SNAPSHOT.jar variation "date(yyyy-mm-dd)"
Download all the news from all the available sites
cd tools/news-finder/
./run.sh
If you can't execute the script, change its permissions (on Linux)
chmod u+x ./run.sh
Download specific sites using the command
node index.js -s <site>
You can also choose the keyword and the first and last pages to fetch
node index.js -s <site> -k <keyword> -f <from-page> -t <to-page>
Need help? Use the -h parameter
node index.js -h
Just run it using
cd tools/bitcoin-market-price-downloader/
node index.js