This repository contains the analysis performed on the tweets of three politicians from Rio de Janeiro.
The paper has been published and can be downloaded at: https://periodicos.uff.br/anaisdoser/article/view/29333
The text analysis followed these steps:
Collect the users' tweets through the social network API (twitteR package).
- Convert the text to lowercase
- Tokenize
- Remove punctuation
- Remove stopwords
- Stem the tokens (so they can be joined with the sentiment lexicon)
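The preprocessing steps above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code in this repository (which relies on R). The stopword list and the `toy_stem` suffix stripper are hypothetical stand-ins for a full Portuguese stopword list and a real Portuguese stemmer.

```python
import string

# Tiny illustrative stopword list (assumption); a real run needs a full Portuguese list.
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "e", "em", "para", "que", "um", "uma"}

# Toy suffixes standing in for a real Portuguese stemmer (assumption).
SUFFIXES = ("mente", "ções", "ção", "es", "s")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def toy_stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Lowercase, tokenize, remove punctuation, remove stopwords, stem."""
    text = tweet.lower()                                  # to lowercase
    tokens = text.split()                                 # whitespace tokenization
    tokens = [t.translate(PUNCT_TABLE) for t in tokens]   # remove punctuation
    tokens = [t for t in tokens if t]                     # drop tokens left empty
    tokens = [t for t in tokens if t not in STOPWORDS]    # remove stopwords
    return [toy_stem(t) for t in tokens]                  # stem


print(preprocess("A Saúde e a Educação para todos!"))
```

The order matters: stopwords are removed before stemming so that the stopword list can be written with full words.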
Sentiment analysis uses the sentiLex_lem_PT02 dictionary.
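A dictionary-based sentiment score can be sketched as follows. This is an illustration in Python, not the repository's code. `TOY_LEXICON` is a hypothetical handful of entries, not taken from sentiLex_lem_PT02, and it assumes each term maps to a polarity of -1, 0, or 1. Summing polarities is one simple way to score a tweet and may differ from what the paper does.

```python
# Hypothetical lexicon entries (assumption): term -> polarity.
TOY_LEXICON = {"bom": 1, "ótimo": 1, "ruim": -1, "péssimo": -1}


def sentiment_score(tokens, lexicon=TOY_LEXICON):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)


def sentiment_label(score):
    """Map a numeric score to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tokens = ["governo", "bom", "ótimo"]
print(sentiment_score(tokens), sentiment_label(sentiment_score(tokens)))
```

The lookup only works when tokens and lexicon keys are in the same form, which is why the tokens are normalized before the join.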
Identify the important terms for each politician. This technique (TF-IDF) considers both the term frequency and the inverse document frequency.
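As a minimal sketch of the idea, the Python snippet below treats all tweets of one politician as a single document and scores each term as tf × idf, with tf the relative frequency of the term in the document and idf the natural log of (number of documents / number of documents containing the term). This unsmoothed variant is an assumption, and the politician names and tokens are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a document name to its list of tokens.

    Returns {name: {term: tf * idf}}.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical documents, one per politician.
docs = {
    "politician_a": ["saude", "saude", "rio"],
    "politician_b": ["educa", "rio"],
    "politician_c": ["seguranca", "rio"],
}
scores = tf_idf(docs)
print(max(scores["politician_a"], key=scores["politician_a"].get))
```

A term used by every politician (here `rio`) gets an idf of zero, so only terms that set one politician apart from the others receive a high score.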
We used the LDA (Latent Dirichlet Allocation) algorithm to build a topic model and assign each tweet to a group.
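To make the grouping step concrete, here is a small sketch of LDA fitted by collapsed Gibbs sampling, written in Python with the standard library only. It is not the implementation used in this repository. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of iterations, and the rule "group = most frequent topic in the tweet" are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the dominant topic per document."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per document
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # total words per topic
    assignments = []                                       # topic of each word occurrence

    # Random initial topic for every word occurrence.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(n_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic (up to a constant).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Group of a document = its most frequent topic (ties go to the lowest index).
    return [row.index(max(row)) for row in doc_topic]


# Hypothetical preprocessed tweets.
tweets = [
    ["saude", "hospital", "saude"],
    ["hospital", "medico", "saude"],
    ["escola", "professor", "educa"],
    ["educa", "escola", "aluno"],
]
groups = lda_gibbs(tweets, n_topics=2)
print(groups)
```

The sampler is stochastic, so the topic numbers themselves carry no meaning; fixing `seed` only makes a run repeatable.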