GitHub - norakirkizh/ml_politics: Machine Learning Application to Study Political Attitudes

Machine Learning Application to Predict Political Attitudes from Web Browsing Histories

In this repository I share replication materials for the article "Predicting Political Attitudes from Web Browsing Histories: Machine Learning Approach".

The paper aims to introduce a machine learning approach to identifying political attitudes from people's website choices. Specifically, I use web tracking data from 1,000 German voters, collected over three months of tracking. I propose categorizing websites by their existing domains: after matching website domains with existing categories from Webshrinker, each category becomes a predictor variable in a regression. I use the following regression models: Linear Regression, Elastic Net, and Random Forest.
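
The matching step described above, where each Webshrinker category becomes one predictor, can be sketched with pandas. This is a minimal illustration, not the code in matrix_preprocessing.py: the function build_category_matrix and the column names person_id, domain and category are hypothetical.

```python
import pandas as pd


def build_category_matrix(visits: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Return a respondents x categories matrix of visit counts.

    visits: one row per visit, hypothetical columns 'person_id' and 'domain'.
    categories: one row per domain, hypothetical columns 'domain' and 'category'.
    """
    # Attach a category to every visit; visits to unmatched domains are dropped.
    merged = visits.merge(categories, on="domain", how="inner")
    # Count visits per respondent and category, then spread categories into columns.
    return (
        merged.groupby(["person_id", "category"])
        .size()
        .unstack(fill_value=0)
    )
```

Respondents whose visits match no categorized domain drop out of the matrix; reindexing against the full list of survey IDs (for example `X.reindex(ids, fill_value=0)`) would keep them as all-zero rows.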

Below is a list of the available replication materials and supplementary files for replication and further research.

Machine Learning Methods: Dimensionality reduction, Linear regression, Random Forest and Elastic Net

Table with domain categories from Webshrinker that we managed to match with domains from our initial web tracking data.
R code for exploring domain categories from Webshrinker: a table with descriptive statistics, such as the sum of visits by group of domain categories;
Distribution plots: code.
Table with the top 5 domains per category. Note that Webshrinker offered subcategories within main categories such as Business; the table shows the top domains for each subcategory.
Plots with descriptive OLS estimates, with controls: selected political attitudes and domain categories, the rest of the political attitudes;
Plots with OLS estimates with controls for the rest of the political attitudes;
OLS, Random Forest and ElasticNet summary plot: Pearson correlations and R2 for all political attitudes (R code to make this plot);
Plots with Variable Importance Rank of domain categories for each political attitude (R code that can also produce an interactive plot with plotly): Variable importance from Random Forest, and Linear regression.
Two models showed significant predictions: support for the democratic political system and interest in politics. Plotted variable importance ranks for both models: Plot 1 and Plot 2, respectively.
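
The descriptive tables listed above (sum of visits per category, top domains per category) could be produced along the following lines. This is a hedged sketch, not the code in category_stat.R: it assumes a single hypothetical category column instead of Webshrinker's category/subcategory hierarchy, and the function name category_descriptives and the column names are illustrative.

```python
import pandas as pd


def category_descriptives(visits: pd.DataFrame, categories: pd.DataFrame, k: int = 5):
    """Return (sum of visits per category, top-k domains per category).

    Hypothetical columns: visits['person_id', 'domain'],
    categories['domain', 'category'].
    """
    merged = visits.merge(categories, on="domain", how="inner")
    # Total number of visits per category, largest first.
    sum_of_visits = merged["category"].value_counts()
    # Visits per (category, domain) pair.
    per_domain = (
        merged.groupby(["category", "domain"]).size().rename("visits").reset_index()
    )
    # Sort by visits within each category and keep the k most-visited domains.
    top_k = (
        per_domain.sort_values(["category", "visits"], ascending=[True, False])
        .groupby("category")
        .head(k)
        .reset_index(drop=True)
    )
    return sum_of_visits, top_k
```

Grouping on a subcategory column instead of category would give one top-k list per subcategory, as in the top-5 table.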

Summary

We combined survey and web tracking data to build machine learning models in which website visits predict self-reported political attitudes. There are several findings about these predictive models and their applications in social science; the evidence is mixed and requires further research. We built a machine learning model for each political attitude of interest, 15 in total. Two models showed significant predictive power: interest in politics and support for the democratic system. Issue-related attitudes and populist attitudes could not be predicted from web tracking data, while web tracking data was more successful in predicting demographics. The variable importance rank also showed that media-related website domains contribute substantially to predicting political attitudes, whereas entertainment domains did not contribute to model performance.
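
The model comparison summarized above can be approximated in Python. This is a hedged sketch, not the repository's R pipeline: it swaps in scikit-learn's LinearRegression, ElasticNetCV and RandomForestRegressor, evaluates them on 5-fold cross-validated predictions, and reports Pearson's r (with scipy's two-sided p-value) and R² on those out-of-sample predictions. The ranking uses scikit-learn's impurity-based importance, which may differ from the measure plotted by rf_varImp.r. The function names compare_models and rf_importance_rank and all hyperparameters are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNetCV, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, n_splits=5, seed=0):
    """Out-of-sample Pearson r (with p-value) and R^2 for three regressors."""
    models = {
        "OLS": LinearRegression(),
        # Standardize first so the penalty treats all categories on one scale.
        "ElasticNet": make_pipeline(
            StandardScaler(), ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5)
        ),
        "RandomForest": RandomForestRegressor(n_estimators=300, random_state=seed),
    }
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Each prediction comes from a model fitted without that respondent.
        pred = cross_val_predict(model, X, y, cv=folds)
        r, p = pearsonr(y, pred)
        results[name] = {
            "pearson_r": float(r),
            "p_value": float(p),
            "r2": float(r2_score(y, pred)),
        }
    return results


def rf_importance_rank(X, y, feature_names, seed=0):
    """Rank features by impurity-based Random Forest importance, highest first."""
    rf = RandomForestRegressor(n_estimators=300, random_state=seed).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(feature_names[i], float(rf.feature_importances_[i])) for i in order]
```

Here X would be the respondent-by-category matrix and y one survey attitude; calling compare_models once per attitude yields one row of r, p-value and R² per model and attitude. Note that the p-value treats cross-validated predictions as independent, so it is only an approximate check of significance.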

A summary of the analysis from this repository is available in the Online Appendix of the paper: LINK.

Additionally, the repository contains plots validating the web tracking data: browsing behavior and privacy policy of the web tracking sample vs. a national German panel.

Files in this repository:
Data.pdf
Full_categories.pdf
Online_Appendix.pdf
R2_corr.pdf
R2_plot.r
README.md
Sum_of_visits.csv
category_stat.R
coef_plots.r
combined.pdf
combined_appendix.pdf
distribution_plot.r
domain_categories-v2.csv
ivw_germany.pdf
matrix_preprocessing.py
plot_privacy_noad.pdf
plot_varImp_dem.pdf
plot_varImp_polint.pdf
rf_varImp.pdf
rf_varImp.r
sum_stats.py
survey_var_pre_processing.R
survey_vars_descriptive_statistics.R
top5_domains_per_category.csv
varImp_alpha.pdf
