Final project for the Text Mining and Search exam. Creating models to predict overall rating of the review.

dtoniolo/Amazon-Reviews

Wish Upon a Star

Overview

This repository hosts the code for the final project for the Text Mining and Search course. The task we aim to solve is a binary classification of Amazon product reviews from the Clothing, Shoes and Jewellery category into positive and negative using the text of the review as the input data.
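As a rough illustration of the task setup, the sketch below turns raw review records into (text, label) pairs. It is not taken from the notebook: the field names `reviewText` and `overall`, the JSON-lines layout, and the threshold of 4 stars for a positive label are all assumptions made for this example, and the helpers `label_review` and `load_examples` are hypothetical.

```python
import json


def label_review(overall, threshold=4.0):
    """Map a star rating to a binary label: 1 = positive, 0 = negative.

    The threshold is an assumption for illustration, not the project's rule.
    """
    return 1 if overall >= threshold else 0


def load_examples(lines, text_field="reviewText", rating_field="overall"):
    """Parse JSON-lines records into (text, label) pairs.

    Field names are assumed; records missing either field are skipped.
    """
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        text = record.get(text_field)
        rating = record.get(rating_field)
        if not text or rating is None:
            continue
        examples.append((text, label_review(float(rating))))
    return examples


# Two made-up records, used only to show the shape of the output.
sample = [
    '{"overall": 5.0, "reviewText": "Great fit, very comfortable."}',
    '{"overall": 2.0, "reviewText": "Fell apart after a week."}',
]
print(load_examples(sample))
```

The resulting pairs are the kind of input a text classifier would be trained on; where to draw the line between positive and negative is a design choice that should follow the notebook.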

How to Use

To explore our work, please refer to the Main Notebook.ipynb file. This notebook holds all the code, from the NLP processing to the classification models, presented end to end.

This notebook is set up for execution on Google Colab. You may need or wish to modify a couple of lines of code:

  1. The first code cell downloads the dataset. If you already have it in another location, you can skip that cell. Be sure to specify the correct file path (3rd cell).
  2. By default, the results are saved in a Google Drive folder. You will need to change the results folder path to match your own Drive structure. Please note that the Results folder must exist before the notebook runs.
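The two checks above can be sketched as small helpers that run before the rest of the notebook. This is only an illustration: `resolve_dataset` and `ensure_results_dir` are hypothetical names that do not come from the notebook, and the demo uses a temporary directory in place of a real Drive path.

```python
import tempfile
from pathlib import Path


def resolve_dataset(path):
    """Return the dataset path, failing early if the file is missing."""
    dataset = Path(path).expanduser()
    if not dataset.is_file():
        raise FileNotFoundError(f"Dataset not found at {dataset}")
    return dataset


def ensure_results_dir(path):
    """Create the results folder (and any parents) if it does not exist yet."""
    results_dir = Path(path).expanduser()
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir


# Demo in a temporary directory; on Colab you would pass your own Drive path.
with tempfile.TemporaryDirectory() as tmp:
    results = ensure_results_dir(Path(tmp) / "Results")
    print(results.is_dir())
```

Creating the folder up front avoids a failure late in the run, when the notebook first tries to write its results.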

Other Repo Files

The Experiments directory holds the code that was used to decide which NLP pipeline and which classifiers to use. It was structured for reusability and modularity. The code in this folder was then refactored and included in the Main Notebook.ipynb with the goal of making the notebook as understandable and self-contained as possible.

The utils and nlp modules act as libraries for the other main files.

Data

The data can be found at this link.

About Us

Davide Toniolo

  • Current Studies: Master's Degree student in Data Science, University of Milano Bicocca
  • Background: Bachelor's Degree in Physics, University of Milano Bicocca

Lorenzo Camaione

  • Current Studies: Master's Degree student in Data Science, University of Milano Bicocca
  • Background: Bachelor's Degree in Computer Science, University of L'Aquila

Acknowledgements

Thanks to @malborroni for this awesome readme template.
