A recommendation system designed to run on big data across a multi-cluster Spark architecture

Rkbp-099/Distributed-Recommedation-System

Distributed-Recommedation-System

  • Built a distributed recommendation system.
  • Learned basic content-based and collaborative filtering techniques.
  • Used alternating least squares (ALS) matrix factorization, as implemented in Spark, for a scalable model that can handle big data and be parallelized across a cluster.
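
To make the alternating updates concrete, here is a minimal pure-Python sketch of ALS on a handful of ratings. It is not Spark's implementation and not this repository's code: the function names (`solve`, `update_side`, `objective`, `als`), the toy ratings, and the hyperparameters are all assumptions chosen for illustration. Each half-step fixes one side's factors and solves a small ridge regression for every user (or item) on the other side, which is the part that can be done in parallel on a cluster.

```python
import random

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def update_side(fixed, ratings_by_key, rank, reg):
    """Recompute one side's factors while the other side stays fixed."""
    updated = {}
    for key, observed in ratings_by_key.items():
        # Normal equations: (sum v v^T + reg * I) x = sum rating * v
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in observed:
            v = fixed[other]
            for i in range(rank):
                b[i] += rating * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        updated[key] = solve(a, b)
    return updated

def objective(ratings, users, items, reg):
    """Squared error on observed ratings plus L2 penalty on all factors."""
    err = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
    norm = sum(dot(f, f) for f in users.values()) + sum(dot(f, f) for f in items.values())
    return err + reg * norm

def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    users = {u: [rng.random() for _ in range(rank)] for u in by_user}
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    history = [objective(ratings, users, items, reg)]
    for _ in range(iterations):
        users = update_side(items, by_user, rank, reg)  # items fixed
        items = update_side(users, by_item, rank, reg)  # users fixed
        history.append(objective(ratings, users, items, reg))
    return users, items, history

# Hypothetical (user, item, rating) triples, for illustration only.
ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0),
           ("u2", "i3", 1.0), ("u3", "i2", 2.0), ("u3", "i3", 5.0)]
users, items, history = als(ratings)
print("objective per iteration:", [round(h, 4) for h in history])
print("predicted u1 -> i3:", round(dot(users["u1"], items["i3"]), 3))
```

Because each half-step exactly minimizes the regularized objective over one side, the printed objective should never increase from one iteration to the next. For real workloads the Spark route is the `ALS` estimator in `pyspark.ml.recommendation`, which distributes these per-user and per-item solves.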

Dataset

This project uses the Amazon Review Dataset (2014). The dataset includes more and newer reviews, along with product metadata such as color, product type, technical details, and product images taken by users. It is also divided into smaller subsets for training: the k-core subsets, in which each remaining user and item has at least k reviews, and a ratings-only subset, which contains no metadata or review text, only (item, user, rating, timestamp) tuples. The full dataset contains 233.1 million reviews (34 GB) across categories such as Musical Instruments, Books, Amazon Instant Video, and Digital Music.
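
The k-core idea can be illustrated with a short sketch over ratings-only tuples. The dataset ships its k-core subsets precomputed, so this is only an illustration of the definition, assuming "k-core" means every remaining user and item has at least k ratings. The function name `k_core` and the sample rows are hypothetical.

```python
from collections import Counter

def k_core(ratings, k):
    """Repeatedly drop rows whose user or item has fewer than k ratings.

    `ratings` is a list of (item, user, rating, timestamp) tuples.
    Removing one row can push another user or item below k, so the
    filter is repeated until nothing changes.
    """
    kept = list(ratings)
    while True:
        item_counts = Counter(item for item, _, _, _ in kept)
        user_counts = Counter(user for _, user, _, _ in kept)
        filtered = [row for row in kept
                    if item_counts[row[0]] >= k and user_counts[row[1]] >= k]
        if len(filtered) == len(kept):
            return filtered
        kept = filtered

# Hypothetical sample rows, for illustration only.
rows = [
    ("A", "u1", 5.0, 1), ("A", "u2", 4.0, 2),
    ("B", "u1", 3.0, 3), ("B", "u2", 2.0, 4),
    ("C", "u3", 1.0, 5), ("A", "u3", 1.0, 6),
]
core = k_core(rows, k=2)
print(core)
```

In this sample, item C has a single rating and is dropped first; that leaves user u3 with only one rating, so u3's remaining row is dropped on the next pass.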

Results

| Category                   | RMSE  |
|----------------------------|-------|
| Musical Instruments        | 0.97  |
| Amazon Instant Video       | 0.91  |
| Digital Music              | 0.85  |
| Tools and Home Improvement | 1.017 |

The code has not been tested on a multi-cluster system. The results above were produced by experiments on a single-cluster machine.
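
For reference, RMSE (root mean squared error) is the square root of the average squared difference between predicted and actual ratings, so it is expressed in rating units. The sketch below shows the calculation on made-up numbers; the function name `rmse` and the sample values are assumptions, and it says nothing about how the reported figures were split or evaluated.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical held-out ratings and model predictions.
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.0, 3.0, 5.0, 3.0]
print(rmse(actual, predicted))  # sqrt((1 + 0 + 1 + 4) / 4) = sqrt(1.5)
```

Lower is better, and a perfect model scores 0.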
