MovieBuddy

Code Repository: https://github.com/sumanthvrao/MovieBuddy

Report: https://github.com/sumanthvrao/MovieBuddy/Report.pdf


How amazing would it be if you could watch your favorite movie with someone who has interests similar to yours! We compared different recommendation system models (content-based filtering, collaborative filtering and a Restricted Boltzmann Machine) to find common movie interests among a group of people.

Dataset Link (movielens-100k-dataset.zip)

Description

The full MovieLens dataset (MovieLens 20M) offers about 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Our aim is to bring together users with similar movie interests. To do this, we make use of users' movie ratings and their demographic information. We account for a variety of factors (location, interests and age, to name a few) before suggesting a MovieBuddy to you.
We work with the smaller MovieLens 100K dataset, which contains:

943 users, 1,682 movies and 100,000 ratings.
Each user has rated at least 20 movies.
Simple demographic information about users.
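The dataset properties listed above can be checked directly from the raw ratings file. The sketch below assumes the ml-100k `u.data` layout (one rating per line: tab-separated `user_id`, `item_id`, `rating`, `timestamp`); the function name `summarize_ratings` is hypothetical and not part of the repository, and the sample lines are made up for illustration.

```python
from collections import Counter


def summarize_ratings(lines):
    """Summarize ratings given in the ml-100k u.data layout.

    Each line is assumed to be: user_id <TAB> item_id <TAB> rating <TAB> timestamp.
    """
    per_user = Counter()  # number of ratings per user
    movies = set()
    n_ratings = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user_id, item_id, _rating, _timestamp = line.split("\t")
        per_user[user_id] += 1
        movies.add(item_id)
        n_ratings += 1
    return {
        "users": len(per_user),
        "movies": len(movies),
        "ratings": n_ratings,
        "min_ratings_per_user": min(per_user.values()) if per_user else 0,
    }


# Made-up sample lines in the same format (not taken from the real file).
sample = ["1\t10\t4\t881250949", "1\t20\t3\t881250950", "2\t10\t5\t881250951"]
print(summarize_ratings(sample))
# {'users': 2, 'movies': 2, 'ratings': 3, 'min_ratings_per_user': 1}
```

On the real file, `with open("ml-100k/u.data") as f: print(summarize_ratings(f))` should report 943 users, 1,682 movies and 100,000 ratings, with a per-user minimum of 20 or more.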

Approach 1 - Content Based Filtering

Content-based filtering, also referred to as cognitive filtering, recommends items by comparing the content of the items themselves, so the movies the model recommends for a given movie are the same for every user. Because the system considers only the content of the movies, content-based filtering avoids the cold-start problem that hampers other recommendation techniques.

Content-based recommendations rely on the characteristics of the items themselves, so the main challenge is identifying which characteristics to consider. The original MovieLens dataset contains only limited information about each movie: its title, year of release, movie ID, IMDb URL and list of genres. This data alone was insufficient to produce valuable recommendations, so we used the TMDb (The Movie Database) API to obtain more details for each movie, such as the names of its lead actors and its director. We then created a hybrid feature for each movie, consisting of the movie's title, year of release, list of genres, director, primary actor and secondary actor.
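A hybrid feature string like the one described above could be assembled as follows. This is a sketch under assumptions: `credits` is taken to be the JSON returned by TMDb's movie credits endpoint (`/movie/{movie_id}/credits`), whose `cast` entries carry `name` and a billing `order`, and whose `crew` entries carry `name` and `job`. The helper `build_hybrid_feature` and the choice to collapse multi-word names into single tokens are our own illustration, not necessarily what the notebooks do.

```python
def build_hybrid_feature(title, year, genres, credits):
    """Join title, year, genres, director and the top-2 actors into one string.

    `credits` is assumed to be the JSON dict returned by TMDb's
    /movie/{movie_id}/credits endpoint ("cast" and "crew" lists).
    """

    def squash(name):
        # Hypothetical choice: collapse a multi-word name into one lowercase
        # token so "Tom Hanks" and "Tom Cruise" do not share the token "tom".
        return name.replace(" ", "").lower()

    directors = [c["name"] for c in credits.get("crew", []) if c.get("job") == "Director"]
    # TMDb's "order" field is the billing order; 0 is the top-billed actor.
    cast = sorted(credits.get("cast", []), key=lambda c: c.get("order", 0))
    actors = [c["name"] for c in cast[:2]]  # primary and secondary actor

    parts = [title, str(year)]
    parts += [squash(g) for g in genres]
    parts += [squash(d) for d in directors[:1]]  # first listed director only
    parts += [squash(a) for a in actors]
    return " ".join(parts)


# Hand-written example dict, trimmed to the fields used above.
credits = {
    "cast": [{"name": "Tim Allen", "order": 1}, {"name": "Tom Hanks", "order": 0}],
    "crew": [{"name": "John Lasseter", "job": "Director"}],
}
print(build_hybrid_feature("Toy Story", 1995, ["Animation", "Comedy"], credits))
# Toy Story 1995 animation comedy johnlasseter tomhanks timallen
```

Keeping each person's name as one token means two movies only match on a person when they share the full name, which is usually what a "same director" or "same actor" signal should capture.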

scikit-learn's CountVectorizer identified 9,105 distinct features, where each feature is a word extracted from the hybrid features of all the movies; each movie is therefore represented as a vector of 9,105 word counts. We then computed the pairwise cosine similarity between these vectors to compare each movie with every other movie in the dataset. Based on this similarity matrix, we recommend the 15 most similar movies for any given movie.
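The vectorize-then-compare pipeline above can be sketched without any dependencies. The tokenizer below mimics CountVectorizer's defaults (lowercasing and the token pattern `(?u)\b\w\w+\b`), and `recommend` returns the k most similar movies, excluding the movie itself. The function names and the toy documents are hypothetical; with scikit-learn, the equivalent is `CountVectorizer().fit_transform(docs)` followed by `sklearn.metrics.pairwise.cosine_similarity`.

```python
import math
import re
from collections import Counter

# CountVectorizer's default token pattern: words of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_vectors(docs):
    """Lowercase and tokenize each document into a sparse word-count vector."""
    return [Counter(TOKEN_RE.findall(doc.lower())) for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(docs):
    """Pairwise cosine similarity of every movie against every other movie."""
    vectors = count_vectors(docs)
    return [[cosine(u, v) for v in vectors] for u in vectors]


def recommend(sim, movie_idx, k=15):
    """Indices of the k movies most similar to movie_idx, excluding itself."""
    scored = [(s, j) for j, s in enumerate(sim[movie_idx]) if j != movie_idx]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by index
    return [j for _, j in scored[:k]]


docs = [
    "Toy Story 1995 animation comedy johnlasseter tomhanks timallen",
    "Toy Story 2 1999 animation comedy johnlasseter tomhanks timallen",
    "Heat 1995 crime drama michaelmann alpacino robertdeniro",
]
sim = similarity_matrix(docs)
print(recommend(sim, 0, k=2))  # [1, 2]
```

Note that the full similarity matrix grows quadratically with the number of movies; for 1,682 movies that is about 2.8 million entries, which is still manageable in memory.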

Approach 2 - Collaborative Filtering

Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences from many users. The collaborative filtering model predicts how much a user will like each movie, and recommends movies accordingly, by considering either user-user similarity or movie-movie similarity.
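To make the user-user idea concrete, the sketch below predicts a rating from the most similar users who rated the movie, adjusting each neighbor's rating by that neighbor's own mean, in the spirit of the "KNN with means" variants in the table below. It is a simplified, hypothetical illustration rather than Surprise's implementation: similarity here is plain cosine over co-rated movies, only positively similar neighbors are used, and predictions are not clipped to the 1 to 5 rating scale.

```python
import math


def mean(values):
    return sum(values) / len(values)


def user_similarity(ru, rv):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[m] * rv[m] for m in common)
    norm_u = math.sqrt(sum(ru[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(rv[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, movie, k=20):
    """Mean-centered user-user prediction from the k most similar raters of `movie`.

    `ratings` maps user -> {movie: rating}.
    """
    mu_u = mean(list(ratings[user].values()))
    neighbors = []
    for other, their_ratings in ratings.items():
        if other != user and movie in their_ratings:
            s = user_similarity(ratings[user], their_ratings)
            if s > 0:
                neighbors.append((s, other))
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = neighbors[:k]
    if not neighbors:
        return mu_u  # nobody comparable rated it: fall back to the user's mean
    # Each neighbor contributes its deviation from its own mean, weighted by similarity.
    num = sum(s * (ratings[v][movie] - mean(list(ratings[v].values()))) for s, v in neighbors)
    den = sum(s for s, _ in neighbors)
    return mu_u + num / den


ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
print(round(predict(ratings, "alice", "m4"), 2))  # 4.63
```

Movie-movie (item-based) filtering follows the same pattern with the roles of users and movies swapped.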

We used the Surprise (Simple Python RecommendatIon System Engine) library for collaborative filtering. The table below reports the mean results of several of its algorithms over 5-fold cross-validation.

Algorithm | Mean RMSE | Mean MAE | Mean fit time (s) | Mean test time (s)
--- | --- | --- | --- | ---
SVD | 0.9358 | 0.7375 | 11.38 | 0.45
KNN Basic (pearson baseline) | 1.0005 | 0.7917 | 5.65 | 10.91
KNN Basic (MSD) | 0.9790 | 0.7731 | 1.23 | 8.47
KNN Basic (cosine) | 1.0174 | 0.8045 | 4.41 | 9.26
KNN with means (pearson baseline) | 0.9382 | 0.7310 | 4.50 | 8.74
KNN with means (MSD) | 0.9502 | 0.7486 | 1.34 | 9.71
KNN with means (cosine) | 0.9556 | 0.7546 | 4.00 | 8.55

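We chose SVD as our collaborative filtering algorithm because, averaged over the 5 folds, it had by far the lowest test time (0.45, against 8.47 to 10.91 for the KNN variants) and the lowest RMSE (0.9358); its MAE (0.7375) was second only to KNN with means (pearson baseline) at 0.731.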
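Surprise's SVD algorithm is a biased matrix factorization trained by stochastic gradient descent: a rating is predicted as μ + b_u + b_i + q_i·p_u (global mean, user bias, item bias, and the dot product of the item and user factor vectors). The dependency-free sketch below illustrates that model; the function names and hyperparameter values are illustrative assumptions, not the settings used in our experiments.

```python
import math
import random


def train_svd(triples, n_factors=10, n_epochs=100, lr=0.01, reg=0.02, seed=0):
    """Biased matrix factorization trained by SGD.

    `triples` is a list of (user, item, rating). Hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating
    bu, bi, P, Q = {}, {}, {}, {}
    for u, i, _ in triples:
        if u not in P:
            bu[u] = 0.0
            P[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if i not in Q:
            bi[i] = 0.0
            Q[i] = [rng.gauss(0, 0.1) for _ in range(n_factors)]

    order = list(triples)
    for _ in range(n_epochs):
        rng.shuffle(order)
        for u, i, r in order:
            pu, qi = P[u], Q[i]
            err = r - (mu + bu[u] + bi[i] + sum(a * b for a, b in zip(pu, qi)))
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Update both factor vectors from their values before this step.
            P[u] = [p + lr * (err * q - reg * p) for p, q in zip(pu, qi)]
            Q[i] = [q + lr * (err * p - reg * q) for p, q in zip(pu, qi)]
    return mu, bu, bi, P, Q


def predict(model, u, i):
    """mu + b_u + b_i + q_i . p_u; unknown users or items fall back to the biases."""
    mu, bu, bi, P, Q = model
    est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
    if u in P and i in Q:
        est += sum(a * b for a, b in zip(P[u], Q[i]))
    return est


def rmse(model, triples):
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in triples) / len(triples))


triples = [("a", "x", 5), ("a", "y", 3), ("a", "z", 4), ("b", "x", 4),
           ("b", "y", 2), ("c", "x", 1), ("c", "z", 2), ("c", "y", 5)]
model = train_svd(triples)
print(round(rmse(model, triples), 3))
```

Prediction is a single dot product per (user, movie) pair, which is consistent with SVD's much lower test time in the table compared with the KNN variants, which must search for neighbors at prediction time. In Surprise itself, the model is the `SVD` class and can be evaluated with `surprise.model_selection.cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5)`.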

Approach 3 - Restricted Boltzmann Machine

The fundamental idea is to use a separate RBM for each user, with weights shared across users. Every RBM has the same number of hidden units, but a user's RBM has active softmax visible units only for the movies that user has rated. If two users have rated the same movie, their two RBMs must use the same weights between the softmax unit for that movie and the hidden units. Ratings are encoded in binary form: for every movie a user has rated, that movie's softmax visible unit has K binary states, one for each rating value from 1 to K, and only the state matching the actual rating is active. RBMs have been shown to slightly outperform carefully tuned SVD models. In our case, the RBM was a two-layer undirected neural network (one visible layer and one hidden layer).
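The weight sharing and softmax visible units described above can be sketched as follows. Each movie owns one weight block `W[movie][k][j]` (rating state k to hidden unit j) that every user's RBM reuses, and a user's RBM only reads and updates the blocks of the movies that user rated. Training with one step of contrastive divergence (CD-1) per user, the class name `SharedRBM`, the initialization and the learning rate are our assumptions for illustration, not a description of the notebook.

```python
import math
import random

K = 5  # ratings take values 1..K


def one_hot(rating, k=K):
    """Binary encoding of a rating: k states, only the observed one active."""
    return [1.0 if r == rating else 0.0 for r in range(1, k + 1)]


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)


class SharedRBM:
    """Per-user RBMs that share one weight block per movie.

    W[movie][k][j] links rating state k of `movie` to hidden unit j. Every
    user who rated `movie` reuses the same block; hidden biases are shared too.
    """

    def __init__(self, movies, n_hidden, k=K, seed=0):
        self.k, self.n_hidden = k, n_hidden
        self.rng = random.Random(seed)
        self.W = {m: [[self.rng.gauss(0, 0.01) for _ in range(n_hidden)]
                      for _ in range(k)] for m in movies}
        self.bv = {m: [0.0] * k for m in movies}  # softmax (visible) biases
        self.bh = [0.0] * n_hidden                # hidden biases

    def hidden_probs(self, visible):
        """p(h_j = 1 | V); `visible` maps each rated movie to a length-k vector."""
        acts = list(self.bh)
        for movie, vk in visible.items():
            for k, v in enumerate(vk):
                for j in range(self.n_hidden):
                    acts[j] += v * self.W[movie][k][j]
        return [sigmoid(a) for a in acts]

    def visible_probs(self, movie, h):
        """Softmax over the k rating states of one movie, given hidden values h."""
        logits = [self.bv[movie][k] + sum(hj * w for hj, w in zip(h, self.W[movie][k]))
                  for k in range(self.k)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def cd1_step(self, user_ratings, lr=0.1):
        """One contrastive-divergence update using only this user's rated movies."""
        v0 = {m: one_hot(r, self.k) for m, r in user_ratings.items()}
        h0 = self.hidden_probs(v0)
        h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = {m: self.visible_probs(m, h_sample) for m in v0}  # reconstruction
        h1 = self.hidden_probs(v1)
        for m in v0:
            for k in range(self.k):
                for j in range(self.n_hidden):
                    self.W[m][k][j] += lr * (v0[m][k] * h0[j] - v1[m][k] * h1[j])
                self.bv[m][k] += lr * (v0[m][k] - v1[m][k])
        for j in range(self.n_hidden):
            self.bh[j] += lr * (h0[j] - h1[j])

    def expected_rating(self, user_ratings, movie):
        """Predict a rating as the expectation of the movie's softmax distribution."""
        v = {m: one_hot(r, self.k) for m, r in user_ratings.items()}
        p = self.visible_probs(movie, self.hidden_probs(v))
        return sum((k + 1) * pk for k, pk in enumerate(p))


rbm = SharedRBM(movies=["m1", "m2", "m3"], n_hidden=4)
for _ in range(200):
    rbm.cd1_step({"m1": 5, "m2": 5})
print(round(rbm.expected_rating({"m2": 5}, "m1"), 2))
```

Because `cd1_step` only iterates over the movies a user rated, the weights of unrated movies (here `m3`) are left untouched, which is exactly the "active softmax units only for rated items" behavior described above.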

Authors

Suraj Aralihalli
Sumedh Pb
Sumanth V Rao

