Sentiment Analysis

Welcome to the Sentiment Analysis wiki page.

On this page, you will find out how the main emotion is extracted from raw text, how the subjectivity score is produced, and how to perform sentiment analysis for other languages. There are also some ideas for future work.

Emotion Analysis

In Natural Language Processing, Emotion Analysis is the process in which we try to extract the emotions that the writer feels while writing a text.

Emotion Analysis (and Subjectivity Analysis) for the Greek language is based on a sentiment lexicon. A sentiment lexicon is usually a file that lists words together with the emotion they express, depending on their Part of Speech (POS) tag in a sentence.

Extracting the main emotion for a text is as easy as lemmatizing the text, matching the words to their emotions based on their POS tag and keeping a score for each of the potential emotions. The emotion with the highest score is considered the main emotion of the text.
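
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual script: the toy English lexicon, the `main_emotion` function and the `(lemma, pos)` input format are all assumptions, and the tokens are expected to come already lemmatized and POS-tagged from an upstream pipeline.

```python
from collections import Counter

# Hypothetical toy lexicon: (lemma, POS tag) -> emotion.
# A real lexicon would be loaded from a file.
LEXICON = {
    ("happy", "ADJ"): "happiness",
    ("joy", "NOUN"): "happiness",
    ("fear", "NOUN"): "fear",
    ("fear", "VERB"): "fear",
    ("angry", "ADJ"): "anger",
}


def main_emotion(tokens, lexicon=LEXICON):
    """Return the highest-scoring emotion, or None if no token matches.

    `tokens` is an iterable of (lemma, pos) pairs, assumed to be produced
    by a lemmatizer and a POS tagger.
    """
    scores = Counter()
    for lemma, pos in tokens:
        # A word only counts if both its lemma and its POS tag match.
        emotion = lexicon.get((lemma.lower(), pos))
        if emotion is not None:
            scores[emotion] += 1
    if not scores:
        return None
    return scores.most_common(1)[0][0]


tokens = [("I", "PRON"), ("be", "AUX"), ("happy", "ADJ"),
          ("joy", "NOUN"), ("fear", "NOUN")]
print(main_emotion(tokens))
```

Here every matched word adds one point to its emotion. A lexicon that stores weights per word could add those weights instead.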

Subjectivity Analysis

In Natural Language Processing, Subjectivity Analysis is the process in which we try to identify whether the writer's opinion in the text is subjective or objective.

As you may understand, Subjectivity Analysis is a really difficult task even for humans, because it is sometimes unclear whether an opinion is based on the writer's personal view of the world or is an objective truth we all believe in. The value of this feature is not to reach 100% accuracy, which is impossible even for humans, but to perform well in easy tasks. For example, we expect a scientific text to be predicted as more objective than a text talking about God.

Subjectivity Analysis for the Greek language is also based on a lexicon. How subjective or objective a word is depends on its POS tag.
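
A subjectivity score can be sketched in the same way. The example below assumes that each lexicon entry carries a number between 0 (objective) and 1 (subjective) and that the score of a text is the mean over its matched tokens. Both the scale and the averaging are assumptions made for illustration, as are the lexicon entries and the `subjectivity_score` function. The actual script may compute the score differently.

```python
# Hypothetical toy lexicon: (lemma, POS tag) -> subjectivity in [0, 1].
SUBJECTIVITY_LEXICON = {
    ("wonderful", "ADJ"): 1.0,
    ("believe", "VERB"): 0.8,
    ("measure", "VERB"): 0.1,
    ("temperature", "NOUN"): 0.0,
}


def subjectivity_score(tokens, lexicon=SUBJECTIVITY_LEXICON):
    """Return the mean subjectivity of the matched tokens, or None.

    `tokens` is an iterable of (lemma, pos) pairs. Tokens that are not in
    the lexicon are ignored.
    """
    matched = []
    for lemma, pos in tokens:
        key = (lemma.lower(), pos)
        if key in lexicon:
            matched.append(lexicon[key])
    if not matched:
        return None
    return sum(matched) / len(matched)


scientific = [("we", "PRON"), ("measure", "VERB"), ("temperature", "NOUN")]
opinion = [("I", "PRON"), ("believe", "VERB"), ("wonderful", "ADJ")]
print(subjectivity_score(scientific), subjectivity_score(opinion))
```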

Note: The script for subjectivity and emotion analysis can be found here.

Issues to resolve

There are some well-known issues in sentiment analysis that we need to mention.

The lexicon approach fails if used as is.
```
I am NOT happy.
```
"I", "am" and "NOT" are words that are not related to any particular emotion. "Happy", as you would expect, is related to the emotion of happiness. So the classifier would say that this text's main emotion is happiness, which is clearly not true.

Solution: Emotion analysis using a lexicon should be combined with a dependency (DEP) analysis.
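
One possible way to combine the two is to skip every emotion word that has a negation attached to it in the dependency tree. The sketch below is an illustration under stated assumptions: the `Token` structure, the `neg` dependency label and the head indices are hypothetical, and real parsers label and attach negation differently depending on their annotation scheme.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str
    pos: str
    dep: str   # dependency label (hypothetical scheme)
    head: int  # index of the head token in the sentence


# Hypothetical toy lexicon: (lemma, POS tag) -> emotion.
LEXICON = {("happy", "ADJ"): "happiness"}
# Assumed label(s) that mark a negation dependent.
NEGATION_DEPS = {"neg"}


def main_emotion_with_negation(tokens, lexicon=LEXICON):
    """Like plain lexicon matching, but ignores negated emotion words."""
    # Indices of tokens that have a negation dependent.
    negated = {t.head for t in tokens if t.dep in NEGATION_DEPS}
    scores = Counter()
    for i, t in enumerate(tokens):
        emotion = lexicon.get((t.lemma.lower(), t.pos))
        if emotion is None or i in negated:
            continue
        scores[emotion] += 1
    return scores.most_common(1)[0][0] if scores else None


# "I am NOT happy." with "happy" (index 3) as the head of "not".
sentence = [
    Token("I", "PRON", "nsubj", 3),
    Token("be", "AUX", "cop", 3),
    Token("not", "PART", "neg", 3),
    Token("happy", "ADJ", "ROOT", 3),
]
print(main_emotion_with_negation(sentence))
```

With this rule the example sentence no longer scores as happiness. Dropping the negated word is the simplest choice. Mapping it to an opposite emotion would need extra information that a plain lexicon does not contain.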

Sentiment Analysis is dependent on the POS Tagger and the Lemmatizer.

Failures of the POS tagger may result in different emotions, and failures of the Lemmatizer may reduce the number of matched tokens.

Solution: The POS tagger and the Lemmatizer should be as accurate as possible before starting a Sentiment Analysis effort.

Irony.

The word speaks for itself. Irony detection is a very difficult task for humans and even more difficult for computers. Lexicon approaches fail to detect irony and thus lead to poor results in datasets where irony is common (e.g. Twitter posts).