JanKoci/Recommender-systems

Recommender Systems for web articles

This repository was created as part of my bachelor's thesis on recommender systems. The main objective of the thesis was to create several models that recommend web articles from the developers.redhat.com domain. Four models were created in total. Some use traditional methods, such as Singular Value Decomposition (SVD) or collaborative filtering with the Alternating Least Squares (ALS) method, while others take less common approaches based on Doc2Vec and on a method inspired by SkipGram with negative sampling. Besides the source files of all implemented recommender models, this repository also includes all required datasets and the LaTeX sources of the thesis text.

Downloading pretrained Doc2Vec model

Note: to work with the pre-trained Doc2Vec model (used by the load_pretrained method of the Doc2VecModel class), you must first download the pre-trained model from:

The model was retrieved from the following GitHub repository:

After downloading the model, unzip it and place the enwiki_dbow directory into the data/processed folder. The pre-trained binary model should then be available at data/processed/enwiki_dbow/doc2vec.bin.
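The unpack-and-place step above can also be scripted. The following is a minimal sketch, not part of this repository: the function name install_pretrained is hypothetical, and it assumes the downloaded archive contains the enwiki_dbow directory at its top level and is in a format that Python's shutil.unpack_archive recognizes from the file extension.

```python
import shutil
from pathlib import Path

# Location (relative to the repository root) where this README expects the model.
MODEL_RELATIVE_PATH = Path("data") / "processed" / "enwiki_dbow" / "doc2vec.bin"


def install_pretrained(archive, repo_root="."):
    """Unpack the downloaded archive into data/processed and return the model path.

    Hypothetical helper: assumes the archive holds a top-level enwiki_dbow
    directory that contains doc2vec.bin.
    """
    target_dir = Path(repo_root) / "data" / "processed"
    target_dir.mkdir(parents=True, exist_ok=True)

    # unpack_archive chooses the format (zip, tar, ...) from the file extension.
    shutil.unpack_archive(str(archive), str(target_dir))

    model_path = Path(repo_root) / MODEL_RELATIVE_PATH
    if not model_path.is_file():
        # The archive layout did not match the assumption stated above.
        raise FileNotFoundError(f"expected pre-trained model at {model_path}")
    return model_path
```

Calling install_pretrained with the path of the downloaded archive from the repository root returns the path of doc2vec.bin, or raises FileNotFoundError if the archive layout differs from the assumed one.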

Performing experiments

To review the experiments performed with our models, refer to the experiments.ipynb file. This IPython notebook explains how to use the models and describes how they are evaluated. It also shows some of the experiments carried out in the course of this thesis and lets you experiment further with the models and study their capabilities.

Directory structure

Besides the implemented recommenders, this repository also includes all required datasets and the LaTeX source files used to build the PDF of the thesis. The repository is organized as follows:

  • data: directory containing our datasets
    • json: contains metadata of articles in JSON
    • processed: contains processed datasets and other needed structures
    • tests: contains test files for testing the UserMappings class
  • doc_: contains the PDF file of this thesis
  • src_doc: contains the LaTeX source files of this thesis
  • src: contains all models and other source files
    • abstract: module with the RecommenderAbstract class
    • my_sparse: module with the IncrementalSparseMatrix class
    • user_mappings: module with the UserMappings class
    • als_model.py: implements the ALS recommender
    • data_utils.py: implements functions and classes working with data
    • doc2vec_class.py: implements the Doc2VecInput and Doc2VecClass classes
    • doc2vec_model.py: implements the Doc2VecModel recommender
    • evaluation.py: implements the Evaluator class used for evaluating the models
    • experiments.ipynb: IPython notebook showing the performed experiments
    • helpers.py: implements helper functions
    • optimal_parameters.py: contains optimal parameters of the recommenders
    • optimization.py: implements the Optimizer class used for finding optimal hyperparameters of our recommenders
    • skip_gram_recommender.py: implements the SkipGramModel and SkipGramRecommender classes
    • svd_model.py: implements the SVDModel class
  • requirements.txt: contains a list of all required libraries