This project aims to predict the number of downloads for a given song, primarily from the attributes provided by the Million Song Dataset. More details about the project can be found at http://janvitek.org/pdpmr/f17/project.html
- Java (v1.8+)
- Spark (v2.2.0 compiled with Scala 2.11.X and hadoop-2.7)
- Scala (v2.11.X)
- A .Renviron file must exist in order to generate the report in R
Build the project by running `make build`. Run the project locally by running `make run`.
- Update the SCALA_HOME and SPARK_HOME variables in the Makefile to match your configuration.
- The model can be loaded into memory by running the `make run` command. It assumes that a partition /mnt/pdpmr of size 100 MB is available, or else uses /tmp as the scratch directory for loading the model.
- The project has both Java and Scala files. `sources/bishwajeet_rashmi/src/main/java` contains the Java files. `sources/bishwajeet_rashmi/src/main/scala` contains the Scala files.
- The RandomForest model is built using RFEngine.
- The loading/deployment of the model is part of the Model class.
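
The scratch-directory behaviour mentioned above (use /mnt/pdpmr when it is available, otherwise /tmp) could look roughly like the sketch below. This is only an illustration under assumptions, not the project's actual code: the class name `ScratchDir` and the method `choose` are hypothetical, and the sketch only checks that the directory exists and is writable, not that it has 100 MB free.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project sources.
class ScratchDir {

    // Returns `preferred` if it is an existing, writable directory;
    // otherwise returns `fallback` unchanged.
    static Path choose(Path preferred, Path fallback) {
        if (Files.isDirectory(preferred) && Files.isWritable(preferred)) {
            return preferred;
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Paths taken from the README; the selection logic itself is an assumption.
        Path scratch = choose(Paths.get("/mnt/pdpmr"), Paths.get("/tmp"));
        System.out.println("Scratch directory: " + scratch);
    }
}
```

On a machine without the /mnt/pdpmr partition, this sketch falls back to /tmp, so no extra setup is needed for a local run.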