Movie Recommendations Project

W2022 MDST Project: Building a Movie Recommender System

Introduction

What do you do when you want to watch a movie, but don't know what to watch? Like... really don't know what to watch? What drives your decision-making for watching a particular movie?

Did a friend suggest it to you?
Do you Google watchlists?
Maybe you take BuzzFeed quizzes for recommendations?

If you have ever felt unsatisfied with movie recommendation engines, or just want to learn more about how they work, then this is the project for you!

Description

The goal of this project is to build a functional recommender system and understand how and why it recommends the movies it does. The two main kinds of recommender systems we plan to explore are content-based filtering and collaborative filtering (more information can be found here). These systems will serve as the engines behind an online quiz (similar to BuzzFeed quizzes) that gives about 10 movie recommendations.
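To make the content-based idea concrete, here is a minimal sketch. It assumes each movie is described by a binary genre vector (MovieLens does provide genre flags per movie, but the toy catalog, titles, genre order, and function names below are hypothetical). It builds a user profile by summing the genre vectors of the movies a user liked, then ranks the remaining movies by cosine similarity to that profile. This is one way a quiz engine could produce its top picks, not the project's actual implementation.

```python
from math import sqrt
from typing import Dict, List, Sequence

# Hypothetical toy catalog. Each movie is a binary genre vector in the
# (assumed) order [Action, Comedy, Drama, Romance].
MOVIE_GENRES: Dict[str, List[int]] = {
    "Movie A": [1, 0, 0, 0],
    "Movie B": [1, 1, 0, 0],
    "Movie C": [0, 0, 1, 1],
    "Movie D": [0, 1, 0, 1],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_based(liked: List[str], catalog: Dict[str, List[int]], k: int = 10) -> List[str]:
    """Rank unseen movies by similarity to the summed genre vectors of liked movies."""
    n_genres = len(next(iter(catalog.values())))
    profile = [0.0] * n_genres
    for title in liked:
        for g, flag in enumerate(catalog[title]):
            profile[g] += flag
    scores = {t: cosine(profile, vec) for t, vec in catalog.items() if t not in liked}
    # sorted() is stable (even with reverse=True), so ties keep catalog order
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, content_based(["Movie A"], MOVIE_GENRES, k=2) returns ["Movie B", "Movie C"]: Movie B shares the Action genre, while Movies C and D tie at zero similarity and keep their catalog order.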

Here are some of the relevant data science buzzwords and jargon for this project!

Regression
(Un)supervised Learning
K-Nearest Neighbors
Matrix Factorization (Asymmetric SVD)
Naive Bayes
Recommender System
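Item-based k-nearest neighbors is one common way to do collaborative filtering: predict a user's rating for a movie from their ratings of the k movies most similar to it. The sketch below assumes ratings are stored as a dict of user -> {movie: rating}; the function names and toy data are hypothetical. It uses plain cosine similarity over co-rating users. A mean-centered ("adjusted") cosine is a common refinement, since plain cosine on all-positive 1 to 5 ratings tends to be high for every pair.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {movie: rating}

# Hypothetical toy ratings on a 1-5 scale
RATINGS: Ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def item_similarity(ratings: Ratings, a: str, b: str) -> float:
    """Plain cosine similarity between items a and b over users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm = sqrt(sum(ratings[u][a] ** 2 for u in common)) * sqrt(
        sum(ratings[u][b] ** 2 for u in common)
    )
    return dot / norm if norm else 0.0


def predict_knn(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Predict user's rating of item from their ratings of the k most similar items."""
    rated = ratings[user]
    neighbors = sorted(
        ((item_similarity(ratings, item, j), j) for j in rated if j != item),
        reverse=True,
    )[:k]
    # Similarity-weighted average over neighbors with positive similarity
    num = sum(s * rated[j] for s, j in neighbors if s > 0)
    den = sum(s for s, _ in neighbors if s > 0)
    return num / den if den else None
```

Here predict_knn(RATINGS, "alice", "m4") comes out at roughly 4.5: bob and carol rate m4 much like m1 and m2, which alice rated highly. Returning None when no neighbor has positive similarity leaves the fallback (for example, a global average) to the caller.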

Goals

Design a functional recommender system from scratch and gain insight into how recommender systems work
Give MDST members the opportunity to work with recommender systems, which are widely used in industry
Build a user interface in the form of a website
Have fun and learn something! 😃

Stretch Goals

Augment movie preferences with Bayesian conditional probability scores
Test recommender systems on larger datasets
Incorporate non-rating data alongside ratings to boost prediction performance
Predict genre from movie preferences by analyzing latent factors
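As one possible shape for the Bayesian stretch goal, the sketch below estimates the probability that a user likes movie B given that they liked movie A, using co-occurrence counts. It assumes "like" means a rating of 4 or higher and treats unrated movies as not liked; both choices, the add-one (Laplace) smoothing, and the function name are assumptions for illustration rather than project decisions.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {movie: rating}

# Hypothetical toy ratings on a 1-5 scale
RATINGS: Ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def p_like_given_like(ratings: Ratings, a: str, b: str,
                      threshold: float = 4.0, alpha: float = 1.0) -> float:
    """Smoothed estimate of P(user likes b | user likes a).

    "Likes" means rating >= threshold (an assumption); unrated counts as not liked.
    Add-alpha smoothing over the two outcomes (like / not like) keeps the
    estimate away from 0 and 1 when few users liked a.
    """
    likers_a = [u for u, r in ratings.items() if r.get(a, 0.0) >= threshold]
    both = sum(1 for u in likers_a if ratings[u].get(b, 0.0) >= threshold)
    return (both + alpha) / (len(likers_a) + 2 * alpha)
```

With the toy data, both users who liked m1 also liked m2, so the estimate is (2 + 1) / (2 + 2) = 0.75; neither liked m3, giving 0.25. If nobody liked movie A, the estimate falls back to the uninformative 0.5.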

A Look at the Data

The data comes from the MovieLens dataset published by GroupLens. The dataset can also be obtained through TensorFlow. Our main focus will be the MovieLens 100K dataset.
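As a starting point for exploring the data: in the 100K release, the ratings file u.data is a tab-separated list of user id, item id, rating, and timestamp (Unix seconds). The sketch below parses it with only the standard library and groups ratings by user. The function names are hypothetical, and the path data/ml-100k/u.data is an assumption about where the extracted files end up, so adjust it to match your setup.

```python
import csv
from collections import defaultdict
from typing import Dict, List, NamedTuple, TextIO


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int
    timestamp: int  # Unix seconds


def load_ratings(f: TextIO) -> List[Rating]:
    """Parse tab-separated lines: user id, item id, rating, timestamp."""
    reader = csv.reader(f, delimiter="\t")
    return [Rating(*(int(x) for x in row)) for row in reader if row]


def to_user_dict(ratings: List[Rating]) -> Dict[int, Dict[int, float]]:
    """Group ratings into user_id -> {item_id: rating}."""
    users: Dict[int, Dict[int, float]] = defaultdict(dict)
    for r in ratings:
        users[r.user_id][r.item_id] = float(r.rating)
    return dict(users)
```

For example, open the file with open("data/ml-100k/u.data") and pass the handle to load_ratings; to_user_dict then gives a per-user view that is convenient for neighborhood-style methods.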

Project Roadmap

Week of 1/30: Learn Our Data

Kickoff!
Introductions
Exploratory Data Analysis

Week of 2/6: Methodology

Data cleaning
More EDA
Basic modeling
Introduction to algorithms (kNN, Matrix Factorization)
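The matrix factorization idea can be sketched with stochastic gradient descent. Note that this is plain biased matrix factorization, where each rating is modeled as global mean + user bias + item bias + the dot product of a user latent vector and an item latent vector. It is a simpler relative of the asymmetric SVD listed above, not an implementation of it. The function name, hyperparameters, and toy data are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, float]  # (user, item, rating)

# Hypothetical toy ratings
TRIPLES: List[Triple] = [
    ("alice", "m1", 5), ("alice", "m2", 4), ("alice", "m3", 1),
    ("bob", "m1", 4), ("bob", "m2", 5), ("bob", "m3", 2), ("bob", "m4", 5),
    ("carol", "m1", 1), ("carol", "m2", 2), ("carol", "m3", 5), ("carol", "m4", 1),
]


def train_mf(triples: Sequence[Triple], n_factors: int = 2, lr: float = 0.01,
             reg: float = 0.02, epochs: int = 200,
             seed: int = 0) -> Callable[[str, str], float]:
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i with SGD; return predict(user, item)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})  # sorted for reproducible init
    items = sorted({i for _, i, _ in triples})
    mu = sum(r for _, _, r in triples) / len(triples)
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for i in items}

    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i])))
            # Gradient steps on squared error with L2 regularization
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

    def predict(user: str, item: str) -> float:
        # Unknown users/items fall back to the global mean plus any known bias
        pred = mu + bu.get(user, 0.0) + bi.get(item, 0.0)
        if user in P and item in Q:
            pred += sum(p * q for p, q in zip(P[user], Q[item]))
        return pred

    return predict
```

Calling predict = train_mf(TRIPLES) and then predict("alice", "m4") estimates a rating alice never gave. The latent vectors are also what the stretch goal about analyzing latent factors would inspect.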

Weeks of 2/13-3/13: Build Models

Sub-teams!
In-depth analysis of algorithms
Development of algorithm specification
Building, training, and testing models
Create visualizations

2/26-3/5: Spring Break!

Week of 3/20-3/27: Refine Models

Evaluate and run models
Preliminary results
Create visualizations
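For the evaluation step, a common metric for rating prediction is root mean squared error (RMSE) on held-out ratings. Here is a minimal sketch of a random train/test split and an RMSE helper. It assumes ratings are (user, item, rating) triples and that a model is any function predict(user, item) returning a float; these names are hypothetical.

```python
import random
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[int, int, float]  # (user, item, rating)


def train_test_split(triples: Sequence[Triple], test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[List[Triple], List[Triple]]:
    """Shuffle a copy of the ratings and hold out test_frac of them."""
    data = list(triples)
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_frac)
    return data[n_test:], data[:n_test]


def rmse(predict: Callable[[int, int], float], triples: Sequence[Triple]) -> float:
    """Root mean squared error of predict(user, item) on held-out ratings."""
    errors = [(r - predict(u, i)) ** 2 for u, i, r in triples]
    return sqrt(sum(errors) / len(errors))
```

A useful first baseline is to predict the training-set mean rating for everyone and compute rmse on the test split. Any model worth keeping should beat that number.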

Week of 3/27-4/3: Develop Quiz Application

Plan out application design
Flesh out a basic API to interact with the webpage
Test it!

Week of 4/10: Finishing Touches

Complete the write-up
Final Presentations!

Setup

Getting set up to contribute to this project takes just a few commands.

Virtual Environment

We are going to initialize a Python virtual environment with all the required packages. A virtual environment isolates this project's dependencies from the rest of your computer, so installing packages here won't clutter your system-wide Python installation or clash with other projects.

First, create a Python 3.8 virtual environment. The command below uses whichever interpreter python3 points to, so make sure that is Python 3.8. On Linux/macOS, run:

python3 -m venv venv

Now that you have created a virtual environment, you need to activate it. The exact command depends on your system; on Linux/macOS, run

source ./venv/bin/activate

Now your computer will know to use the Python installation in the virtual environment rather than your default installation.
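If you are ever unsure whether the virtual environment is active, Python can tell you: inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base installation. The helper names below are hypothetical; the version check simply mirrors the Python 3.8 requirement stated above.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def python_version_ok(required: tuple = (3, 8)) -> bool:
    """True if the running interpreter's (major, minor) version equals `required`."""
    return sys.version_info[:2] == required
```

The same check works as a one-liner from the shell: python -c "import sys; print(sys.prefix != sys.base_prefix)" prints True when the venv is active.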

After the virtual environment has been activated, we can install the required dependencies into this environment using

pip install -r requirements.txt

If you also want to install development dependencies such as code formatters and Jupyter Notebook, run

pip install -r requirements-dev.txt

Obtaining the Data

Getting the MovieLens dataset used by this project is just as easy. With your virtual environment activated, run

python setup.py

That's all! You'll find the extracted dataset in the data folder. If you'd like more control over where you want to download and extract the dataset, use the download and extract options:

python setup.py --download <custom_filepath> --extract <custom_filepath> <custom_extraction_dir>

All download options can be viewed using

python setup.py --help
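For the curious, a download-and-extract step like this boils down to fetching a zip archive and unpacking it. This is a hypothetical sketch using only the standard library, not the project's setup.py. The ML_100K_URL value is an assumption (verify it on the GroupLens site before relying on it), and the function names are made up for illustration.

```python
import os
import urllib.request
import zipfile
from typing import List

# Assumed download location for MovieLens 100K; verify on the GroupLens site.
ML_100K_URL = "https://files.grouplens.org/datasets/movielens/ml-100k.zip"


def download(url: str, dest_path: str) -> str:
    """Fetch url to dest_path, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract(zip_path: str, out_dir: str) -> List[str]:
    """Unpack every member of zip_path into out_dir and return the member names."""
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

Calling download(ML_100K_URL, "data/ml-100k.zip") followed by extract("data/ml-100k.zip", "data") would place the files under the data folder. Keeping download and extraction separate makes it possible to re-extract without re-downloading.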

Known Issues

M1 Mac users may have trouble installing SciPy through pip because of problems with BLAS (Basic Linear Algebra Subprograms) support. There are two options:
- Remove seaborn==0.11.2 from the dependencies and use matplotlib for visualization instead
- Manually install OpenBLAS and compile SciPy from source (not recommended; we likely cannot help you debug any issues with this)

Relevant Links

MDST Calendar

Dataset:

Movie Lens | Group Lens
via TensorFlow

Resources:

MichiganDataScienceTeam/W22-Movie-Recommendations

Folders and files

Latest commit

History

Repository files navigation

Movie Recommendations Project

Table of Contents

Introduction

Description

Goals

Stretch Goals

A Look at the Data

Project Roadmap

2/26-3/5: Spring Break!

Setup

Virtual Environment

Obtaining the Data

Known Issues

Relevant Links

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages