A Neo4j Recommender System

This repository contains the implementation of a Recommender System in Neo4j.

Dataset

The recommendation data come from a subset of the tables in the MovieLens 25M Dataset: ratings.csv, movies.csv, tags.csv, genome-scores.csv and genome-tags.csv. Place these files in the data folder.
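Before running anything else, it can help to verify that the five tables are where the code expects them. The following is a minimal sketch, not part of the repository: it assumes the data folder sits in the current working directory, and the helper name missing_files is hypothetical.

```python
from pathlib import Path

# The five MovieLens 25M tables named above.
REQUIRED_FILES = [
    "ratings.csv",
    "movies.csv",
    "tags.csv",
    "genome-scores.csv",
    "genome-tags.csv",
]


def missing_files(data_dir="data"):
    """Return the required CSV files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is launched from the repository root.
    missing = missing_files()
    if missing:
        print("Missing from the data folder:", ", ".join(missing))
    else:
        print("All required tables are in place.")
```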

Population script

The script populate_db.py populates an existing Neo4j graph with data from these tables. An example instance of the graph is shown in the figure.

The script generates some pickle files in the data folder: serialized dictionaries that map the original dataset ids to the UUIDs used in the Neo4j database.
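To illustrate what such a mapping file looks like, here is a small sketch of building, saving and reloading an id-to-UUID dictionary with the standard library. The function names and the file name in the usage note are hypothetical; the actual structure produced by populate_db.py may differ.

```python
import pickle
import uuid


def build_id_map(original_ids):
    """Assign a fresh random UUID string to every original dataset id."""
    return {original_id: str(uuid.uuid4()) for original_id in original_ids}


def save_id_map(id_map, path):
    """Serialize the mapping to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(id_map, handle)


def load_id_map(path):
    """Read a mapping back from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, save_id_map(build_id_map([1, 2, 3]), "data/movie_ids.pkl") would write a three-entry mapping, and load_id_map on the same path would return it unchanged. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.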

NB: a Neo4j database must be running on your machine, since the script connects to localhost. The script asks whether you want to delete the existing data from the current database, because running it twice would otherwise duplicate all the data.
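The confirm-then-clear step can be sketched as follows. This is an illustration rather than the code in populate_db.py: it assumes the official neo4j Python driver, the default Bolt port 7687, and placeholder credentials, all of which you would need to adapt.

```python
# Cypher statement that removes every node together with its relationships.
DELETE_ALL = "MATCH (n) DETACH DELETE n"


def confirm(answer):
    """Interpret a yes/no answer typed by the user."""
    return answer.strip().lower() in {"y", "yes"}


def clear_database(session):
    """Delete every node and relationship through the given session."""
    session.run(DELETE_ALL)


def main():
    # Imported here so the helpers above work without the driver installed.
    from neo4j import GraphDatabase

    # Assumed URI and credentials; adjust them to your local setup.
    driver = GraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password")
    )
    try:
        with driver.session() as session:
            if confirm(input("Delete all existing data first? [y/N] ")):
                clear_database(session)
    finally:
        driver.close()
```

Calling main() runs the prompt against the local database. On a graph as large as a fully loaded MovieLens 25M import, deleting everything in a single transaction may need a lot of memory, so deleting in batches could be preferable.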

Recommendation

The file datasetanalysis.ipynb contains some statistics on the dataset that help understand performance.

The file queries.ipynb runs the queries that implement the following workflow and measures their performance.

Given a User, find their top k Genres
Given a User, find their top k Categories
Given a Genre, find its top k Users
Given a Category, find its top k Users
Given a User, find similar Users
Given a User, recommend Movies based on similar Users
Given a Movie, find similar Movies
Given a User, recommend Movies based on similarity with the ones they have rated
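To make the intent of some of these steps concrete, here is a toy, in-memory sketch in plain Python. It does not reproduce the Cypher queries in queries.ipynb. The data are made up, "top" is assumed to mean highest mean rating, and user similarity is assumed to be the Jaccard index over the sets of rated movies; the notebook may define both differently.

```python
from collections import defaultdict

# Toy data, invented for illustration: user -> {movie: rating}, movie -> genres.
RATINGS = {
    "u1": {"m1": 5.0, "m2": 4.0, "m3": 1.0},
    "u2": {"m1": 4.5, "m2": 4.0, "m4": 5.0},
    "u3": {"m3": 5.0, "m4": 2.0},
}
GENRES = {
    "m1": ["Drama"],
    "m2": ["Drama", "Crime"],
    "m3": ["Comedy"],
    "m4": ["Crime"],
}


def top_k_genres(user, k):
    """Rank genres by the user's mean rating of the movies in each genre."""
    by_genre = defaultdict(list)
    for movie, rating in RATINGS[user].items():
        for genre in GENRES[movie]:
            by_genre[genre].append(rating)
    means = {g: sum(r) / len(r) for g, r in by_genre.items()}
    return sorted(means, key=lambda g: (-means[g], g))[:k]


def jaccard(a, b):
    """Jaccard similarity between the sets of movies two users rated."""
    set_a, set_b = set(RATINGS[a]), set(RATINGS[b])
    return len(set_a & set_b) / len(set_a | set_b)


def similar_users(user, k):
    """Return the k users whose rated-movie sets overlap most with the user's."""
    others = [u for u in RATINGS if u != user]
    return sorted(others, key=lambda u: (-jaccard(user, u), u))[:k]


def recommend(user, k_users, k_movies):
    """Suggest unseen movies, weighting neighbours' ratings by similarity."""
    scores = defaultdict(float)
    for neighbour in similar_users(user, k_users):
        weight = jaccard(user, neighbour)
        for movie, rating in RATINGS[neighbour].items():
            if movie not in RATINGS[user]:
                scores[movie] += weight * rating
    return sorted(scores, key=lambda m: (-scores[m], m))[:k_movies]
```

With this toy data, top_k_genres("u1", 2) gives ["Drama", "Crime"], similar_users("u1", 1) gives ["u2"], and recommend("u1", 2, 5) gives ["m4"], the only movie u1 has not rated.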

The file gds_recommendation.py contains functions used for the recommendation, mostly thin wrappers around functions of the Neo4j Graph Data Science (GDS) library.
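As an idea of what such a wrapper could look like, the sketch below calls the GDS Node Similarity procedure in stream mode through a driver session. It is an assumption-laden illustration, not the code in gds_recommendation.py: the function name, the id node property and the graph name in the usage note are hypothetical, and the named graph must already have been projected into the GDS graph catalog.

```python
# Stream-mode Node Similarity over an already projected in-memory graph.
# Assumption: nodes carry an `id` property that identifies them.
NODE_SIMILARITY_QUERY = """
CALL gds.nodeSimilarity.stream($graph_name, {topK: $top_k})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).id AS source,
       gds.util.asNode(node2).id AS target,
       similarity
ORDER BY similarity DESC
"""


def stream_node_similarity(session, graph_name, top_k=10):
    """Return (source, target, similarity) triples for a projected graph.

    `session` is expected to behave like a neo4j driver session, that is,
    to expose run(query, **parameters) and yield records indexable by key.
    """
    result = session.run(
        NODE_SIMILARITY_QUERY, graph_name=graph_name, top_k=top_k
    )
    return [(r["source"], r["target"], r["similarity"]) for r in result]
```

A call such as stream_node_similarity(session, "user-movie", top_k=5) would return at most five similar nodes per source node. Node Similarity uses the Jaccard metric by default.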

Relazione.pdf contains a deeper discussion of the project (in Italian), and Neo4j Recommender System.pdf contains a summary presentation of it (in English).

Running

To run the code in the repository, create a virtual environment and install the dependencies with the following commands.

virtualenv venv 
source ./venv/bin/activate
pip install -r requirements.txt

Non-Enterprise editions of Neo4j do not allow more than one active database at a time: if you don't want to use the default database neo4j, you can create a new one and activate it by following this procedure.

NB: it is advisable to execute the script populate_db.py on a machine with at least 8 GB of RAM.

aleceress/movielens_rs
