I don’t usually post about politics (I’m not particularly savvy about polling, which is where data science has had the most substantial impact on politics), but I saw a hypothesis about Donald Trump’s Twitter account that simply begged to be investigated with data:
@tvaziri's hypothesis about Donald Trump’s Twitter account
When Trump wishes the Olympic team good luck, he’s tweeting from his iPhone. When he’s insulting a rival, he’s usually tweeting from an Android. Is this an artifact showing which tweets are Trump’s own and which are by some handler in the campaign?
Others have explored Trump’s timeline and noticed that this tends to hold up. And Trump himself did indeed tweet from a Samsung Galaxy until March 2017. But how could we examine it quantitatively? I was writing about text mining and sentiment analysis during the development of the tidytext R package, and this is an excellent opportunity to apply them again.
My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people. Tweets from the two devices are posted at different times of day, and they use hashtags, links, and retweets in distinct ways. What’s more, we can see that the Android tweets are angrier and more negative, while the iPhone tweets tend to be benign announcements and pictures. Overall I’d agree with @tvaziri’s analysis: we can tell the difference between the campaign’s tweets (iPhone) and Trump’s own (Android). Let’s see how.
First, let's load the content of Donald Trump’s timeline. Our dataset is from The Trump Twitter Archive by Brendan Brown, which contains all 35,000+ tweets from the @realDonaldTrump Twitter account from 2009 (the year Trump sent his first tweet) through 2018. We'll filter it for the election period only, June 1st, 2015 through November 8th, 2016.
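The date filter described above could be sketched roughly as follows. This is a minimal illustration, not the archive's actual schema: the `tweets` data frame is a made-up stand-in, and the column names `source`, `created_at`, and `text` are assumptions. Timestamps are treated as UTC for simplicity.

```r
library(dplyr)

# Hypothetical stand-in for the archive: four invented rows using the
# column names this sketch assumes (source, created_at, text).
tweets <- tibble(
  source     = c("Twitter for Android", "Twitter for iPhone",
                 "Twitter for Android", "Twitter Web Client"),
  created_at = as.POSIXct(c("2015-05-31 23:59:00", "2015-06-01 00:00:00",
                            "2016-11-08 23:30:00", "2016-11-09 00:00:00"),
                          tz = "UTC"),
  text       = c("before window", "first day", "election day", "after window")
)

# Keep tweets from June 1st, 2015 through the end of November 8th, 2016.
# The upper bound is written as "strictly before November 9th" so that
# every tweet posted on election day itself is included.
campaign_tweets <- tweets %>%
  filter(created_at >= as.POSIXct("2015-06-01", tz = "UTC"),
         created_at <  as.POSIXct("2016-11-09", tz = "UTC"))

campaign_tweets
```

Using a half-open interval avoids accidentally dropping tweets sent late on the final day, which would happen if the upper bound were compared against midnight at the start of November 8th.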