This repository contains a Movie Recommendation System built with Python, leveraging machine learning techniques to suggest movies based on user preferences. The model employs cosine similarity on a matrix of movie ratings, enabling recommendations for similar movies based on a selected title.
The system uses the MovieLens 20M Dataset for training, which contains about 20 million movie ratings. Our approach builds a movie-user matrix from these ratings and applies item-based collaborative filtering to it, so that movies rated similarly by the same users are recommended together.
- Cosine Similarity Calculation: Computes similarity scores between movies based on user ratings.
- Genre One-Hot Encoding: Extracts and utilizes genre data for better filtering.
- Easy-to-Use Function: Just input a movie name, and get the top recommendations instantly.
- Data Preprocessing: Handles missing values, duplicates, and applies filtering to retain only movies with a significant number of ratings.
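The preprocessing and genre-encoding features above can be sketched on a toy table. This is a minimal illustration, not the repository's actual code: the helper names `clean_ratings` and `one_hot_genres`, the `min_ratings` threshold, and the sample values are assumptions; the column names follow the MovieLens CSV layout, where genres are pipe-separated.

```python
import pandas as pd

# Toy stand-ins for the MovieLens rating and movie tables (values are made up).
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3, 3],
    "movieId": [10, 20, 10, 20, 10, 30, 30],
    "rating":  [4.0, 3.5, 5.0, None, 4.5, 2.0, 2.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Adventure", "Comedy", "Action|Comedy"],
})


def clean_ratings(df, min_ratings):
    """Hypothetical helper: drop missing ratings and duplicate rows,
    then keep only movies with at least `min_ratings` ratings."""
    df = df.dropna(subset=["rating"]).drop_duplicates()
    counts = df.groupby("movieId")["rating"].transform("count")
    return df[counts >= min_ratings]


def one_hot_genres(df):
    """Hypothetical helper: expand the pipe-separated genres column
    into one 0/1 column per genre."""
    dummies = df["genres"].str.get_dummies(sep="|")
    return pd.concat([df[["movieId", "title"]], dummies], axis=1)


cleaned = clean_ratings(ratings, min_ratings=2)
genre_table = one_hot_genres(movies)
print(cleaned)
print(genre_table)
```

On real data the threshold would be much larger than 2; the right value depends on how sparse the retained matrix is allowed to be.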
- Clone the Repository:

      git clone https://github.com/yourusername/movierecommendation.git
      cd movierecommendation
- Install Dependencies:
  Make sure you have Python installed, then install the required packages.

      pip install -r requirements.txt
- Download Dataset:
  The dataset is downloaded automatically through the Kaggle API when the script runs. Make sure the Kaggle API is configured before running it.
- Load and Run: After installation, you can run the script directly to test the recommendation function.

      # Import the main function
      from movie_recommendation import get_recommendations_by_name

      # Get recommendations; similarity_matrix and movie_rating are assumed
      # to have been built beforehand by the script's preprocessing steps
      recommendations = get_recommendations_by_name("Thor", similarity_matrix, movie_rating, top_n=5)
      print(recommendations)
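To give a feel for what a title-based lookup like this might do internally, here is a minimal sketch. It is not the repository's `get_recommendations_by_name`: the function `recommend_by_title`, its arguments, and the toy similarity values are all assumptions made for illustration.

```python
import numpy as np


def recommend_by_title(title, similarity, titles, top_n=5):
    """Hypothetical lookup: return up to top_n (title, score) pairs
    for the movies most similar to `title`."""
    if title not in titles:
        raise KeyError(f"Unknown title: {title}")
    idx = titles.index(title)
    scores = similarity[idx]
    # Sort indices by descending score and skip the query movie itself.
    order = [i for i in np.argsort(-scores, kind="stable") if i != idx]
    return [(titles[i], float(scores[i])) for i in order[:top_n]]


# Toy data: row i of `similarity` holds the scores of titles[i] against every movie.
titles = ["Movie A", "Movie B", "Movie C"]
similarity = np.array([
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.0],
    [0.4, 0.0, 1.0],
])

result = recommend_by_title("Movie A", similarity, titles, top_n=2)
print(result)
```

Excluding the query index matters because every movie has similarity 1.0 with itself and would otherwise always rank first.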
- Data Preprocessing: Cleans the dataset, handling missing values and duplicates.
- Matrix Generation: Creates a sparse matrix using `csr_matrix` to optimize memory.
- Similarity Calculation: Applies cosine similarity to find the closest matches for each movie in the dataset.
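The matrix and similarity steps above can be sketched as follows. This is a small illustration under stated assumptions rather than the project's exact pipeline: the toy ratings are made up, and filling unrated cells with 0 before building the sparse matrix is one common choice that the repository may or may not share. `csr_matrix` comes from SciPy and `cosine_similarity` from scikit-learn.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Toy ratings using the MovieLens column names (values are made up).
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 10, 30],
    "rating":  [4.0, 4.0, 5.0, 5.0, 3.0, 2.0],
})

# Rows are movies, columns are users; unrated cells become 0 (an assumption).
movie_user = ratings.pivot_table(
    index="movieId", columns="userId", values="rating"
).fillna(0)

# Store the mostly-zero matrix in compressed sparse row form to save memory.
sparse_ratings = csr_matrix(movie_user.values)

# Pairwise cosine similarity between movie rows; the result is a dense
# (n_movies x n_movies) array by default.
similarity = cosine_similarity(sparse_ratings)
print(pd.DataFrame(similarity, index=movie_user.index, columns=movie_user.index))
```

Movies 10 and 20 are rated alike by users 1 and 2, so they score higher with each other than either does with movie 30, which shares at most one rater with them.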
This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to submit issues, fork the repository, and make a pull request with your ideas for improvement.