In this study, we use social media data (specifically Twitter) to provide insights into each candidate's popularity, tweeting patterns, and most common topics. Additionally, we attempt to model and predict the success of a candidate's new tweet.
We use a public Twitter dataset containing about 6,000 tweets from the candidates' official Twitter accounts, @realDonaldTrump and @HillaryClinton (about 3,000 tweets each). Each tweet record contains its text, its date, the number of times it was retweeted, and the number of times it was marked as a favorite, along with some other metadata.
The dataset can be downloaded from the Kaggle website: https://www.kaggle.com/benhamner/clinton-trump-tweets
election2016.ipynb: Jupyter notebook with the R code and results.
data/clinton-trump-tweets.zip: dataset from Kaggle.
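For a quick look at the dataset outside the notebook, the per-candidate popularity figures can be sketched in a few lines. The notebook itself uses R; the Python sketch below is only an illustration and relies on assumptions this README does not confirm: that the archive contains a CSV file named `tweets.csv`, and that it has `handle`, `retweet_count`, and `favorite_count` columns. The function name `summarize_tweets` is hypothetical.

```python
import csv
import io
import zipfile
from collections import defaultdict


def summarize_tweets(zip_path, csv_name="tweets.csv"):
    """Return the tweet count and mean retweets/favorites per account.

    Assumptions (not confirmed by this README): the archive holds a CSV
    called `csv_name` with `handle`, `retweet_count`, and `favorite_count`
    columns.
    """
    totals = defaultdict(lambda: {"tweets": 0, "retweets": 0, "favorites": 0})
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(csv_name) as raw:
            # Decode the binary zip member as UTF-8 text for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                stats = totals[row["handle"]]
                stats["tweets"] += 1
                stats["retweets"] += int(row["retweet_count"])
                stats["favorites"] += int(row["favorite_count"])
    # Convert the running totals into per-account averages.
    return {
        handle: {
            "tweets": s["tweets"],
            "mean_retweets": s["retweets"] / s["tweets"],
            "mean_favorites": s["favorites"] / s["tweets"],
        }
        for handle, s in totals.items()
    }
```

Calling `summarize_tweets("data/clinton-trump-tweets.zip")` would return one entry per account. If the column or file names in your copy of the dataset differ, adjust them accordingly.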
To import and run the notebook on our Data Science Experience (DSX) platform, follow the setup instructions here: https://github.com/IBMDataScience/getting-started
When setting up our DSX tool, choose the Jupyter bundle that includes R support to process this notebook.