
sumedhpb/We_R_Pythons

 
 


Code Repository: https://github.com/sumanthvrao/MovieBuddy

Report: https://github.com/sumanthvrao/MovieBuddy/Report.pdf

MovieBuddy

How amazing would it be if you could watch your favorite movie with someone who shares your interests! We compared different recommendation system models (content-based filtering, collaborative filtering, and a Restricted Boltzmann Machine) to find common movie interests among a group of people.

Dataset Link (movielens-100k-dataset.zip)

Python 3, Jupyter notebook 3, Collaborative filtering, Content Based filtering, Surprise Library, TensorFlow

Description

MovieLens offers a dataset of about 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Our aim is to bring together users with similar movie interests. To do this, we use each user's movie ratings and profile information. We account for a variety of factors (location, interests, and age, to name a few) before suggesting a MovieBuddy to you.
The subset we use (MovieLens 100K) contains:

  • 943 users, 1,682 movies, and 100,000 ratings.
  • Each user has rated at least 20 movies.
  • Simple demographic information about the users.

Approach 1 - Content Based Filtering

Content-based filtering, also referred to as cognitive filtering, recommends items by comparing the content of the items themselves, which means the recommendations for a given movie are the same for every user. Content-based filtering avoids the cold-start problem that hampers other recommendation techniques, because the system considers only the content of the movies to make recommendations.

Content-based recommendations rely on the characteristics of the item itself. The major challenge is identifying which characteristics to consider. The original MovieLens dataset contains limited information about each movie: the movie title, year of release, movie ID, IMDb URL, and list of genres. This data alone was insufficient to produce valuable recommendations for a movie. We used the TMDb (The Movie Database) API to extract more details for each movie, such as the names of the lead actors and the director. We then created a hybrid feature for each movie comprising the name of the movie, year of release, list of genres, name of the director, name of the primary actor, and name of the secondary actor.

The CountVectorizer module identified 9,105 distinct features across the dataset, where each feature is a word extracted from the hybrid feature set of all the movies. We then calculated the cosine similarity of the resulting matrix with itself to compare each movie with every other movie in the dataset. Based on this similarity matrix, we recommend 15 movies for every given movie.
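The pipeline above (word counts per movie, then pairwise cosine similarity) can be sketched as follows. This is a minimal, dependency-free illustration and not the project's code: the project used CountVectorizer, while here `bag_of_words`, `cosine`, `recommend`, and the three sample feature strings are hypothetical stand-ins chosen for the example.

```python
import math
import re
from collections import Counter

# Illustrative hybrid features (title, year, genres, director, lead actors).
# These strings are assumptions for the example, not the project's data.
movies = {
    "Toy Story": "Toy Story 1995 Animation Comedy John Lasseter Tom Hanks Tim Allen",
    "Apollo 13": "Apollo 13 1995 Drama Ron Howard Tom Hanks Bill Paxton",
    "GoldenEye": "GoldenEye 1995 Action Thriller Martin Campbell Pierce Brosnan Sean Bean",
}


def bag_of_words(text):
    """Count lowercase word tokens, roughly what a count vectorizer does."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, features, top_n=15):
    """Return up to top_n (movie, similarity) pairs, most similar first."""
    vectors = {name: bag_of_words(text) for name, text in features.items()}
    target = vectors[title]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name, score in recommend("Toy Story", movies):
        print(f"{name}: {score:.3f}")
```

Because the score depends only on the movies' features, the ranking for a given movie is identical for every user, which matches the behaviour described for this approach.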

Approach 2 - Collaborative Filtering

Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences from many users. The collaborative filtering model attempts to recommend movies and predict how much a user will like each one by considering either user-user similarity or movie-movie similarity.
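As a rough illustration of the user-user variant, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings and picks the most similar user. The ratings, the user names, and the helpers `cosine_sim`, `predict`, and `most_similar_user` are all hypothetical; this is not how the Surprise library implements its algorithms.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "D": 5},
    "carol": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def cosine_sim(u, v):
    """Cosine similarity over the movies both users have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for the movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine_sim(ratings[user], theirs)
        num += sim * theirs[movie]
        den += sim
    return num / den if den else None  # None when nobody similar rated it


def most_similar_user(user, ratings):
    """Return the other user whose ratings are closest to this user's."""
    others = [name for name in ratings if name != user]
    return max(others, key=lambda name: cosine_sim(ratings[user], ratings[name]))


if __name__ == "__main__":
    print(most_similar_user("alice", ratings))
    print(predict("alice", "D", ratings))
```

In this toy data, alice's tastes track bob's much more closely than carol's, so the predicted rating for movie D is pulled toward bob's rating.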

The Surprise (Simple Python RecommendatIon System Engine) library was used for collaborative filtering. The results of running different collaborative filtering algorithms are documented in the table below.

Algorithm                         | Mean RMSE | Mean MAE | Mean fit time (s) | Mean test time (s)
SVD                               | 0.9358    | 0.7375   | 11.38             | 0.45
KNN Basic (pearson baseline)      | 1.0005    | 0.7917   | 5.65              | 10.91
KNN Basic (MSD)                   | 0.979     | 0.7731   | 1.23              | 8.47
KNN Basic (cosine)                | 1.0174    | 0.8045   | 4.41              | 9.26
KNN with means (pearson baseline) | 0.9382    | 0.731    | 4.5               | 8.74
KNN with means (MSD)              | 0.9502    | 0.7486   | 1.34              | 9.71
KNN with means (cosine)           | 0.9556    | 0.7546   | 4                 | 8.55

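We chose SVD as our collaborative filtering algorithm because it had the shortest mean test time (0.45) and the lowest mean RMSE (0.9358) across the 5 folds. Its mean MAE (0.7375) was second only to KNN with means using the Pearson baseline (0.731).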
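For reference, the two error metrics in the table can be computed per fold and then averaged, as in the sketch below. The fold contents are made-up (true rating, predicted rating) pairs, and the `rmse` and `mae` helpers are written here for illustration rather than taken from Surprise.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true, predicted) rating pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, predicted) rating pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


# Hypothetical held-out predictions for two folds (the project used five).
folds = [
    [(4, 3.5), (2, 2.5)],
    [(5, 4.0), (3, 3.0)],
]

mean_rmse = sum(rmse(fold) for fold in folds) / len(folds)
mean_mae = sum(mae(fold) for fold in folds) / len(folds)

if __name__ == "__main__":
    print(f"mean RMSE = {mean_rmse:.4f}, mean MAE = {mean_mae:.4f}")
```

RMSE squares each error before averaging, so it penalizes large misses more heavily than MAE. This is why two algorithms can rank differently on the two metrics.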

Approach 3 - Restricted Boltzmann Machine

The fundamental idea here is to use a separate RBM for each user, with weights shared between users who rate the same movies. Every RBM has the same number of hidden units, but each RBM has active softmax visible units only for the items rated by that user. If two users have rated the same movie, their two RBMs must use the same weights between the softmax unit for that movie and the hidden units. To keep the visible units binary, each movie a user has rated is represented by k nodes, one per rating value from 1 to k, and a node is active only when it matches the rating the user gave. RBMs have been reported to slightly outperform carefully tuned SVD models. In our case, the RBM was a two-layer undirected neural network.
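The binary encoding of ratings described above can be sketched as a one-hot mapping per rated movie. This covers only the input encoding, not RBM training. The function name `encode_user` and the sample movie IDs are hypothetical, and k = 5 is assumed because MovieLens ratings range from 1 to 5.

```python
K = 5  # assumed number of rating levels (1..K)


def encode_user(user_ratings, movie_ids, k=K):
    """Map each rated movie to a one-hot list of k binary visible units.

    Movies the user has not rated are skipped, so they contribute no
    active visible units to that user's RBM.
    """
    visible = {}
    for movie_id in movie_ids:
        rating = user_ratings.get(movie_id)
        if rating is None:
            continue
        visible[movie_id] = [1 if level == rating else 0 for level in range(1, k + 1)]
    return visible


if __name__ == "__main__":
    # Hypothetical user who rated movie 10 with 4 stars and movie 12 with 1 star.
    print(encode_user({10: 4, 12: 1}, movie_ids=[10, 11, 12]))
```

Two users who both rated movie 10 would each get a five-unit group for it, and those groups would connect to the hidden layer through the same shared weights.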

Authors

About

Data Analytics Project for the year 2018

