Gabrielle Agrocostea edited this page Mar 14, 2016 · 3 revisions

Welcome to the twitter_models wiki!

The goal of this project is to build a linear model that predicts Twitter impressions from follower counts and retweets.
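Written out, a linear model of this kind has the form below. This assumes that "followers" corresponds to the fans column and that an intercept β0 is fitted; the wiki states neither explicitly.

```latex
\widehat{\text{impressions}} = \beta_0 + \beta_1 \cdot \text{fans} + \beta_2 \cdot \text{retweets}
```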

1st attempt:

  • Subset the data to 28 properties (listed below)
  • Data from January 2016; columns available: time, impressions, retweets, page_id, twitter_id, fans

Properties subset:

Twitter_id    Twitter_name
30309979      106andpark
18342955      abc11_wtvd
16374678      abc7
119606058     aquiyahorashow
16560657      bet
226299107     betnews
9695312       billboard
32448740      brueggers
1426645165    bustle
21308602      cartoonnetwork
759251        cnn
73200694      coach
634784951     dasaniwater
14946736      directv
27677483      essencemag
25053299      fortunemagazine
436171805     fusionpop
25453312      hallmarkchannel
14934818      instyle
192981351     landroverusa
2367911       mtv
19426551      nfl
25589776      peoplemag
223525053     ringlingbros
5988062       theeconomist
14293310      time
40924038      univision
15513910      valvoline
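The subsetting step could be sketched in plain Python as follows. This is a minimal sketch, not the project's code. It assumes each record is a dict keyed by the January 2016 column names, and that IDs are compared as strings, as they are quoted in the original list. PROPERTY_IDS and subset_properties are hypothetical names.

```python
# Hypothetical constant: the 28 Twitter ids from the property table, as strings.
PROPERTY_IDS = {
    "30309979", "18342955", "16374678", "119606058", "16560657",
    "226299107", "9695312", "32448740", "1426645165", "21308602",
    "759251", "73200694", "634784951", "14946736", "27677483",
    "25053299", "436171805", "25453312", "14934818", "192981351",
    "2367911", "19426551", "25589776", "223525053", "5988062",
    "14293310", "40924038", "15513910",
}


def subset_properties(rows, ids=PROPERTY_IDS):
    """Keep only rows whose twitter_id is in the chosen property set.

    Assumes each row is a dict with a "twitter_id" key; the id is converted
    to str so integer and string ids compare the same way.
    """
    return [row for row in rows if str(row["twitter_id"]) in ids]
```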

Data cleaning: removed outliers below the 0.1 quantile and above the 0.9 quantile.
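Quantile trimming could look like the sketch below. The wiki does not say which column or columns were trimmed, or whether the cutoffs themselves were kept. This version therefore takes the column name as a parameter and keeps values that fall exactly on a cutoff. Both choices are assumptions, and trim_quantiles is a hypothetical helper.

```python
import numpy as np


def trim_quantiles(rows, column, low=0.1, high=0.9):
    """Drop rows whose `column` value lies below the `low` quantile or
    above the `high` quantile (cutoffs are inclusive, an assumption)."""
    values = np.array([row[column] for row in rows], dtype=float)
    # np.quantile uses linear interpolation between data points by default.
    lo_cut, hi_cut = np.quantile(values, [low, high])
    return [row for row, v in zip(rows, values) if lo_cut <= v <= hi_cut]
```

To trim on several columns, one option is to apply it once per column, for example on impressions and then on retweets. Note that the order of those passes changes which rows survive.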

Modeling (all models):

  • Training: 90% of the data
  • Testing: 10% of the data
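A 90/10 split could be done as below. The wiki does not say whether the split was random or chronological (the data has a time column). This sketch assumes a seeded random shuffle, and train_test_split_rows is a hypothetical name. scikit-learn's `train_test_split(..., test_size=0.1)` performs the same kind of random split.

```python
import random


def train_test_split_rows(rows, test_fraction=0.1, seed=0):
    """Shuffle a copy of `rows` with a fixed seed and return
    (train, test), where test holds about `test_fraction` of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

If the intent is to predict future impressions, a chronological split is an alternative design choice: sort by time and hold out the last 10%.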

Model 1 information

  • Trained on all 28 properties
  • alpha used: 0.1
  • coefficients: array([ 8.50609133e-03, 2.16934297e+02])
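The wiki reports alpha = 0.1 and two coefficients. It does not name the library, the model type, or the feature order. The name "alpha" suggests a ridge or lasso penalty in scikit-learn's naming. The sketch below assumes ridge (L2) regression with features ordered [fans, retweets], both hypothetical. It solves, in closed form, the objective that scikit-learn's Ridge minimizes: ||y - Xw||² + alpha·||w||², with an unpenalized intercept. fit_ridge and predict are illustrative names.

```python
import numpy as np


def fit_ridge(X, y, alpha=0.1):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_samples, n_features), e.g. columns [fans, retweets] (assumed order).
    Returns (coef, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center X and y so the intercept is not shrunk by the penalty.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc.
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict(X, coef, intercept):
    """Predicted impressions for each row of X."""
    return np.asarray(X, dtype=float) @ coef + intercept
```

Under the assumed [fans, retweets] ordering, the reported coefficients would mean roughly 0.0085 additional impressions per follower and about 217 per retweet. The intercept is not reported in the wiki.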

Model 2 information

  • Trained on a smaller subset of properties
  • alpha tuned using cross-validation:
  • coefficients:
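The wiki does not record the alpha grid, the number of folds, or the selected alpha. The following is a minimal k-fold sketch under several assumptions: ridge regression again, 5 folds, mean squared error as the selection criterion, and a caller-supplied grid. ridge_fit and cv_alpha are hypothetical names. As an off-the-shelf option, scikit-learn's `RidgeCV(alphas=..., cv=5)` runs a similar search and exposes the chosen value as `alpha_`.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    # Center so the intercept is not penalized (same convention as sklearn's Ridge).
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_alpha(X, y, alphas, k=5, seed=0):
    """Return (best_alpha, mean_mse_per_alpha) using k-fold cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shuffle row indices once, then cut them into k roughly equal folds.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    mean_mse = []
    for alpha in alphas:
        errors = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            coef, intercept = ridge_fit(X[train], y[train], alpha)
            resid = y[val] - (X[val] @ coef + intercept)
            errors.append(np.mean(resid ** 2))
        mean_mse.append(np.mean(errors))
    best = int(np.argmin(mean_mse))
    return alphas[best], mean_mse
```

After picking alpha this way, the usual step is to refit on the whole training split with that alpha and then evaluate once on the held-out 10%.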