Distributed-Recommedation-System

A recommendation system that can run on big data across a multi-node Spark cluster architecture.

Worked on building a distributed recommendation system.
Learned basic content-based and collaborative filtering techniques.
Used the Alternating Least Squares (ALS) matrix factorization implemented in Spark to obtain a scalable model that can handle big data and be parallelized across a cluster.
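As a rough illustration of what ALS does, here is a pure-Python sketch that factorizes a small set of (user, item, rating) triples into rank-`k` user and item factor vectors. It is not the project's code and not Spark's implementation (in PySpark, ALS is exposed as `pyspark.ml.recommendation.ALS`). The sketch alternates between holding the item factors fixed while solving a small regularized least-squares problem per user, and then doing the reverse. The function names (`als`, `update`, `solve`, `predict`), the toy ratings, and the choice to scale the regularization weight by each user's or item's rating count are assumptions made for this example.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, grouped, rank, reg):
    """One ALS half-step: with the other side's factors held fixed, solve the
    ridge regression (V^T V + reg * n * I) x = V^T r separately for every key."""
    solved = {}
    for key, pairs in grouped.items():
        a = [[reg * len(pairs) if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += rating * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        solved[key] = solve(a, b)
    return solved


def als(ratings, rank=2, reg=0.01, iterations=20, seed=0):
    """Factorize (user, item, rating) triples into user and item factor vectors."""
    by_user, by_item = {}, {}
    for user, item, rating in ratings:
        by_user.setdefault(user, []).append((item, rating))
        by_item.setdefault(item, []).append((user, rating))
    rng = random.Random(seed)
    items = {item: [rng.random() for _ in range(rank)] for item in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, reg)  # fix items, solve users
        items = update(users, by_item, rank, reg)  # fix users, solve items
    return users, items


def predict(users, items, user, item):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(users[user], items[item]))


if __name__ == "__main__":
    toy = [
        ("u0", "i0", 5.0), ("u0", "i1", 4.0), ("u0", "i2", 1.0),
        ("u1", "i0", 4.0), ("u1", "i1", 4.0), ("u1", "i2", 1.0),
        ("u2", "i0", 1.0), ("u2", "i1", 1.0), ("u2", "i2", 5.0),
    ]
    users, items = als(toy, rank=2)
    for user, item, rating in toy:
        print(user, item, rating, round(predict(users, items, user, item), 2))
```

Once one side is held fixed, every user's (or item's) least-squares problem is independent of the others, so each half-step parallelizes naturally. That independence is what lets Spark distribute ALS across the machines of a cluster.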

Dataset

For this project, the Amazon Review Dataset (2014) was used. This release includes more, and more recent, reviews than earlier releases, along with product metadata such as color, product type, and technical details, and product images taken by users. It is also provided in smaller subsets suitable for training: k-core subsets, in which every remaining user and item has at least k reviews, and a ratings-only subset that contains no metadata or review text, only (item, user, rating, timestamp) tuples. The full dataset contains 233.1 million reviews (34 GB), grouped into categories such as Musical Instruments, Books, Amazon Instant Video, and Digital Music.
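A k-core subset can be derived from the ratings-only (item, user, rating, timestamp) tuples by repeated pruning. Dropping a user with too few reviews can push an item below the threshold, and vice versa, so the filter has to loop until nothing changes. The sketch below assumes the tuples are already loaded into a Python list; the function name `k_core` is hypothetical and not taken from the project's code.

```python
from collections import Counter


def k_core(reviews, k):
    """Repeatedly drop (item, user, rating, timestamp) tuples whose user or item
    has fewer than k reviews, until every remaining user and item has at least k."""
    current = list(reviews)
    while True:
        user_counts = Counter(user for _, user, _, _ in current)
        item_counts = Counter(item for item, _, _, _ in current)
        kept = [t for t in current if user_counts[t[1]] >= k and item_counts[t[0]] >= k]
        if len(kept) == len(current):  # nothing removed: the core is stable
            return kept
        current = kept


# Example: removing i3 (1 review) leaves u3 with 1 review, so u3 is pruned next.
reviews = [
    ("i1", "u1", 5, 1), ("i1", "u2", 4, 2),
    ("i2", "u1", 3, 3), ("i2", "u2", 2, 4),
    ("i3", "u3", 1, 5), ("i1", "u3", 5, 6),
]
print(k_core(reviews, 2))
```

Recounting on every pass is simple but repeats work; on the full 233.1-million-review dataset the same loop would more plausibly be written as Spark group-by-and-filter steps.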

Results

Category	RMSE
Musical Instruments	0.97
Amazon Instant Video	0.91
Digital Music	0.85
Tools and Home Improvement	1.017

The code has not been tested on a multi-node cluster. The results above were produced by experiments on a single machine.
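RMSE is the square root of the mean squared difference between predicted and actual ratings. On Amazon's 1 to 5 star scale, an RMSE of 0.85 therefore means predictions are off by roughly 0.85 stars in the root-mean-square sense. A minimal pure-Python version is sketched below; the function name `rmse` is an assumption. In PySpark the same metric is commonly computed with `pyspark.ml.evaluation.RegressionEvaluator(metricName="rmse")`, though this README does not say which evaluator the project uses.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Example: one prediction is off by a full star, the other is exact.
print(rmse([4.0, 3.0], [5.0, 3.0]))  # ≈ 0.7071
```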

Rkbp-099/Distributed-Recommedation-System