okamiRvS/Sentiment-Analysis-for-Amazon-Reviews

Sentiment Analysis for Amazon-Reviews

  • Umberto Cocca - 807191

Introduction

In recent years, a growing body of research has broadened the understanding of textual resources, leading to the growth of online services that have changed the face of shopping. E-commerce applications such as Amazon acquire an enormous amount of data through their transactions and users. A substantial part of it is user-generated content: customers evaluate the products they purchased and share their experience through numerical ratings and/or reviews.

Network Analysis

The goal of this project was to extract insights that may prove helpful for business purposes. In particular, the question I want to answer with network analysis is: which are the most recommended books? The answer can help decide how to sort products, for example on a website, so that users first see the ones they are most likely looking for.
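As a rough illustration of the question above, the sketch below ranks books by how many other books point to them (their in-degree). The README does not say which measure the project uses, so in-degree is an assumption here, and the edge list and the name `rank_by_in_degree` are hypothetical. It uses only the standard library rather than iGraph to stay self-contained.

```python
from collections import Counter

# Hypothetical edges: (source_book, recommended_book).
# Real edges would come from the Amazon dataset, which is not shown here.
edges = [
    ("A", "B"),
    ("C", "B"),
    ("A", "C"),
    ("D", "B"),
    ("D", "C"),
    ("B", "E"),
]


def rank_by_in_degree(edge_list):
    """Return (book, in_degree) pairs, highest in-degree first, ties by name."""
    in_degree = Counter(target for _, target in edge_list)
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_in_degree(edges)
for book, score in ranking:
    print(book, score)
```

Books that never appear as a target do not show up in the ranking; other centrality measures could be swapped in if they fit the data better.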

Sentiment Analysis

Sentiment Analysis is used to interpret natural language and identify subjective information that denotes opinions, emotions and feelings, determining the corresponding polarity (positive, negative or neutral) and finally summarizing this data so that it can be of value to a company. In this way, decisions can be based on meaningful data rather than on intuitions that are not always correct. Sentiment Analysis is important because companies want their brand to be perceived positively. In this regard, the focus can be on positive or negative comments, as well as on customer feedback, to evaluate both strengths and points to improve. To apply Sentiment Analysis in this project, the textual parts of the reviews are systematically analyzed to extract an opinion. A preliminary pre-processing phase prepares the dataset, and then ASUM (Aspect Sentiment Unification Model) is used to extract sets of topics that refer to positive and negative sentiments from documents made of sentences.

Existing Software and Tools used

For the preprocessing and the network analysis I used Python, because of the large number of open-source tools and libraries available. In particular, the following libraries were used:

Python

  • Pandas: to load and manipulate the dataset;
  • iGraph: a collection of network analysis tools with an emphasis on efficiency, portability and ease of use;
  • NLTK: to split every review into a list of sentences;
  • re: to perform a partial cleaning of the data, for example deleting words made up of invalid characters.
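The preprocessing steps listed above could look roughly like the following sketch. It is not the project's actual code: the token pattern, the helper names and the sample review are all assumptions, and a naive regex split stands in for NLTK's sentence tokenizer so that the example runs without extra downloads.

```python
import re

# Assumed notion of a "valid" word: lowercase letters, optionally with one
# apostrophe part (e.g. "couldn't"). The project's real rule is not documented.
TOKEN_RE = re.compile(r"^[a-z]+(?:'[a-z]+)?$")


def split_sentences(review):
    """Naive stand-in for NLTK sentence splitting: break after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [part for part in parts if part]


def clean_sentence(sentence):
    """Lowercase, strip surrounding punctuation, drop tokens with invalid characters."""
    tokens = [tok.strip(".,!?;:\"()") for tok in sentence.lower().split()]
    return [tok for tok in tokens if TOKEN_RE.match(tok)]


def preprocess(review):
    """Turn one review into a list of sentences, each a list of clean tokens."""
    cleaned = (clean_sentence(s) for s in split_sentences(review))
    return [tokens for tokens in cleaned if tokens]


sentences = preprocess("Great book! I couldn't put it down. 5/5 ***")
print(sentences)
```

Sentences that end up empty after cleaning are dropped, so a review made only of symbols yields an empty list.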

ASUM
The ad-hoc input for the Java version of ASUM was built with Python. The program input consists of two mandatory files and an optional set of files:

  • BagOfSentences.txt (mandatory)
    This file represents the documents in the corpus as lists of word indexes. For each document, the first line is the number of sentences; the lines that follow list the indexes that refer to the position of each word in the WordList file.
  • WordList.txt (mandatory)
    The file maps words with indexes. It is assumed that the first word has index 0, the second has index 1 and so on.
  • SentiWords-0.txt, SentiWords-1.txt, ... (optional)
    These files contain words called "semi-sentimental". The file numbering should start from 0 and increase until the number of sentiments searched for is reached. The ASUM model can use this a priori information to guide the sampling process. For example, if a given word is known to be positive because it belongs to a lexicon of positive words, then its probability of being positive is known. For this project two sentiments were searched for, one positive and one negative.
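The two mandatory files described above can be produced along these lines. This is a minimal sketch rather than the project's own script: the function name `build_asum_input` and the sample documents are hypothetical, and it assumes one line per sentence with space-separated indexes, which the description does not spell out.

```python
def build_asum_input(documents):
    """Build the lines of WordList.txt and BagOfSentences.txt.

    `documents` is a list of documents; each document is a list of
    sentences; each sentence is a list of already-cleaned tokens.
    """
    word_index = {}   # word -> index, assigned in order of first appearance
    bag_lines = []
    for doc in documents:
        # First line of each document: its number of sentences.
        bag_lines.append(str(len(doc)))
        for sentence in doc:
            indexes = []
            for word in sentence:
                if word not in word_index:
                    word_index[word] = len(word_index)
                indexes.append(str(word_index[word]))
            # Assumption: one sentence per line, indexes separated by spaces.
            bag_lines.append(" ".join(indexes))
    # Dicts keep insertion order, so line i of the word list is the word with index i.
    word_list = list(word_index)
    return word_list, bag_lines


docs = [
    [["great", "book"], ["loved", "it"]],
    [["bad", "book"]],
]
word_list, bag_lines = build_asum_input(docs)
print("\n".join(word_list))
print("\n".join(bag_lines))
```

Writing `word_list` to WordList.txt and `bag_lines` to BagOfSentences.txt, one entry per line, would give files with the layout described above, with the first word at index 0.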
