Home

Gabrielle Agrocostea edited this page Mar 14, 2016 · 3 revisions

Welcome to the twitter_models wiki!

The goal of this project is to build a linear model that predicts Twitter impressions from follower counts and retweets.

1st attempt:

Subset the data to only 28 properties
Period: Jan 2016
Columns available: time, impressions, retweets, page_id, twitter_id, fans

Properties subset:

Twitter_id	Twitter_name
'30309979'	'106andpark'
'18342955'	'abc11_wtvd'
'16374678'	'abc7'
'119606058'	'aquiyahorashow'
'16560657'	'bet'
'226299107'	'betnews'
'9695312'	'billboard'
'32448740'	'brueggers'
'1426645165'	'bustle'
'21308602'	'cartoonnetwork'
'759251'	'cnn'
'73200694'	'coach'
'634784951'	'dasaniwater'
'14946736'	'directv'
'27677483'	'essencemag'
'25053299'	'fortunemagazine'
'436171805'	'fusionpop'
'25453312'	'hallmarkchannel'
'14934818'	'instyle'
'192981351'	'landroverusa'
'2367911'	'mtv'
'19426551'	'nfl'
'25589776'	'peoplemag'
'223525053'	'ringlingbros'
'5988062'	'theeconomist'
'14293310'	'time'
'40924038'	'univision'
'15513910'	'valvoline'
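The subsetting step can be sketched with pandas as below. This is a minimal illustration, not the project's actual code: the DataFrame `tweets`, its sample rows, and the name `subset_ids` are hypothetical, and only three of the IDs from the table are listed for brevity.

```python
import pandas as pd

# Hypothetical sample rows; the real data has the Jan 2016 columns
# (time, impressions, retweets, page_id, twitter_id, fans).
tweets = pd.DataFrame({
    "twitter_id": ["759251", "2367911", "999", "759251"],
    "impressions": [1200, 800, 50, 1500],
    "retweets": [10, 4, 0, 12],
    "fans": [100000, 90000, 10, 100000],
})

# A few IDs from the table above (cnn, mtv, nfl); extend with the rest as needed.
subset_ids = {"759251", "2367911", "19426551"}

# Keep only rows whose twitter_id belongs to the chosen properties.
subset = tweets[tweets["twitter_id"].isin(subset_ids)].reset_index(drop=True)
print(subset)
```

IDs are kept as strings here, matching the quoted values in the table; if the raw column is numeric, the set would need integer IDs instead.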

Data Cleaning: Removed outliers below the 0.1 quantile and above the 0.9 quantile
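A minimal sketch of quantile-based outlier removal with NumPy. It assumes the cut is applied to a single numeric column (impressions here) and that values outside the 0.1 and 0.9 quantiles are dropped; the function name `trim_by_quantiles` and the sample values are hypothetical.

```python
import numpy as np

def trim_by_quantiles(values, lower=0.1, upper=0.9):
    """Return a boolean mask that is True for values inside the quantile band."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.quantile(values, [lower, upper])
    return (values >= lo) & (values <= hi)

# Hypothetical impressions with one very small and one very large value.
impressions = np.array([5, 100, 120, 130, 150, 160, 170, 180, 200, 10_000], dtype=float)

mask = trim_by_quantiles(impressions)
trimmed = impressions[mask]
print(trimmed)
```

The same mask can be used to filter the matching rows of the other columns so that features and target stay aligned.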

Modeling
Training for all models - 90% of data
Testing - 10% of data
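A 90/10 split could look like the sketch below. It assumes a random split via scikit-learn's `train_test_split` (the notes do not say whether the split was random or time-based), and the arrays `X` and `y` are synthetic placeholders for the features and impressions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 2))  # placeholder features, e.g. fans and retweets
y = rng.random(200)       # placeholder target, e.g. impressions

# Hold out 10% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)
print(X_train.shape, X_test.shape)
```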

Model 1 information

Trained on all 27 properties
alpha used: 0.1
coefficients: array([ 8.50609133e-03, 2.16934297e+02])
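For illustration, a regularized linear fit with a fixed alpha of 0.1 might be set up as follows. The notes do not name the estimator, so ridge regression (`sklearn.linear_model.Ridge`) is an assumption, as is the feature order (fans, then retweets). The data is synthetic and the resulting coefficients are not the ones reported above.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two predictors.
fans = rng.uniform(1e4, 1e6, size=500)
retweets = rng.poisson(20, size=500).astype(float)

# Synthetic target built from known weights plus noise.
impressions = 0.01 * fans + 200.0 * retweets + rng.normal(0, 50, size=500)

# Two-column feature matrix: one coefficient per column.
X = np.column_stack([fans, retweets])

model = Ridge(alpha=0.1)  # assumed estimator; alpha taken from the notes
model.fit(X, impressions)
print(model.coef_, model.intercept_)
```

With two features the fitted `coef_` has two entries, which matches the shape of the coefficient array listed for Model 1.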

Model 2 information

Trained on smaller subset of properties
alpha tuned using cross validation:
coefficients:
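One way alpha could be tuned by cross-validation is sketched below, assuming ridge regression and scikit-learn's `RidgeCV`. The candidate grid, the 5-fold setting, and the synthetic data are all hypothetical; the sketch does not reproduce Model 2's alpha or coefficients.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)

# Synthetic two-feature data with known weights plus noise.
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Hypothetical grid of alphas to compare with 5-fold cross-validation.
candidate_alphas = [0.01, 0.1, 1.0, 10.0]

model_cv = RidgeCV(alphas=candidate_alphas, cv=5)
model_cv.fit(X, y)

# alpha_ holds the selected value; coef_ holds the refit coefficients.
print(model_cv.alpha_, model_cv.coef_)
```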