Tools for media and communication research

A list of digital tools and resources for journalism, media and communication research, and computational social science.

Table of contents

Find datasets

Search engines:

Archives and lists:

Survey data:

Media data:

Content analysis, text analysis, text mining, annotation

Compare differences between texts, find duplicate files

Television

Social networking sites and specific sites

Facebook

Twitter

  • twarc - command line tool for archiving Twitter JSON (Python).
  • tweetbotornot - detect Twitter bots via machine learning (R).
  • Twint - Twitter scraping and open source intelligence (OSINT) tool that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations (Python).
  • Tinfoleak (GitHub) - open-source tool for Twitter intelligence analysis (Python).
  • Tweetbeaver - convert @name to ID, check if two accounts follow each other, download a user's favorites, search within a user's favorites, download a user's timeline etc (online tool).
  • scrape-twitter - Command line interfaces to scrape profiles, timelines, connections, likes, search and conversations with the use of API (Node.js).
  • Chorus - free Twitter harvesting and visual analytics suite for social science research (Windows).
  • Twitter API.
  • Twitter - helpful tools - Twitter lists helpful tools for data access, data analysis, data visualization, and hosting.

Wikipedia

Google

  • Google Trends - number of searches for a specific search query (online tool).
  • Google Ngram Viewer - shows the number of times a phrase has occurred in books from year 1800 to 2000 (online tool).
  • GoogleScraper - scrape search engines (e.g., Google, Yandex, Bing, Duckduckgo, Baidu) by using proxies (socks4/5, http proxy) and many IPs, including asynchronous networking support (Python).
  • Lumen database - collects and analyzes legal complaints and requests for removal of online materials such as Google search results (online tool).
  • Google Books - search for books (online API).
  • YouTube comment scraper - scrape comments from YouTube videos, download comments as JSON or CSV (online tool).
  • youtube-dl - download YouTube videos or videos from other sites (Python).

Reddit

  • PRAW - Library for API access to Reddit (Python).
  • RedditExtractoR - Package for API access to Reddit (R).

Misc sites

Scrape and extract news articles

  • GDELT Project - archives all news media events around the globe.
  • The Social, Political and Economic Event Database Project (SPEED) - comprehensive news sources from 1945 onwards, crawls over 5,000 news feeds in 120 countries several times each day, scraping news reports, totalling over 40 million news reports.
  • mediacloud - open source, open data platform that allows researchers to answer quantitative questions about the content of online media (Perl/Python).
  • Trove - Find and get Australian and online resources: books, images, historic newspapers, maps, music, archives and more.
  • newsdiffs - automatic scraper that tracks changes in news articles over time (Python).
  • newsflash - tools to work with the Internet Archive and GDELT Television Explorer (R).
  • newspaper - news, full-text, and article metadata extraction, based on python-goose (Python).
  • news-please - integrated web crawler and information extractor for news that just works (Python).
  • Newsmap - semi-supervised geographical news classifier (R). Journal article.
  • Scrapy - fast high-level web crawling and web scraping framework (Python).
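
Tracking how an article changes between two scraped versions can be sketched with Python's standard `difflib` module. This is a minimal illustration of the general idea only, not how newsdiffs or any other tool listed above is implemented; the function names and the sample headlines are hypothetical.

```python
import difflib


def article_diff(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff (as a list of lines) between two article versions."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="earlier",
            tofile="later",
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )


def similarity(old_text: str, new_text: str) -> float:
    """Return a 0..1 similarity ratio; 1.0 means the texts are identical."""
    return difflib.SequenceMatcher(None, old_text, new_text).ratio()


if __name__ == "__main__":
    # Hypothetical article versions, for illustration only.
    earlier = "Mayor resigns\nNo comment yet."
    later = "Mayor resigns\nSpokesperson confirms."
    print("\n".join(article_diff(earlier, later)))
    print(round(similarity(earlier, later), 2))
```

Storing each scraped version with a timestamp and diffing consecutive versions gives a simple change log per article.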

Online archives and archiving

  • Internet Archive - non-profit digital library offering free universal access to books, movies & music (online tool).
  • Wayback Machine - 343 billion archived web pages from Internet Archive (online tool).
  • archive.is - take a snapshot of a webpage that will always be online even if the original page disappears (online tool).
  • HTTrack Website Copier - download website to a local directory, building recursively all directories, getting HTML, images, and other files with the original site's relative link-structure (Windows/Mac/Linux).
  • ArchiveBox - self-hosted internet archiving solution to collect, save, and view sites you want to preserve offline (self-hosted/Docker).
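
To check programmatically whether a page has an archived copy, the Wayback Machine offers an availability endpoint that returns JSON. The sketch below assumes the endpoint and response layout (`archived_snapshots` → `closest`) as commonly documented by the Internet Archive; the helper names are hypothetical, and the response format should be verified against the current documentation before relying on it.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the API over the network and return the closest snapshot URL."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("example.com", "20200101")` performs a live request, so for bulk lookups add delays between calls and handle network errors.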

Journal articles, citations, bibliometrics

  • journal-spider - tools to spider journal websites for links to articles (Python).
  • Claim Extraction for Scientific Publications - detect claims (e.g. "background", "conclusion") from scientific publication using discourse and sentence embedding (Python).
  • scholar.py - parser for Google Scholar (Python).
  • OpenCitations - search and browse OpenCitations Corpus (OCC) of open downloadable bibliographic and citation data recorded in RDF (online API).
  • metaknowledge - computational research in bibliometrics, scientometrics, and network analysis (Python).
  • pdfx - extract references (pdf, url, doi, arxiv) and metadata from a PDF. Download all referenced PDFs (Python).
  • Publish or Perish - retrieves and analyzes academic citations from Google Scholar and Microsoft Academic Search (Win/Mac/Linux).
  • VOSviewer - software for constructing and visualizing bibliometric networks of journals, researchers, or publications based on citation, bibliographic coupling, co-citation, or co-authorship + text mining (Win/Mac).
  • CitNetExplorer - software for visualizing and analyzing citation networks of scientific publications, import from Web of Science (Win/Mac).
  • metagear - research synthesis tools for systematic reviews and meta-analysis with data extraction (R).
  • revtools - R package to conduct literature review or meta-analysis, visualise patterns in bibliographic data, select/exclude articles or words, etc (R).
  • litsearchr - quick, objective, reproducible search strategy development using text-mining and keyword co-occurrence networks to identify important terms (R). Journal article.
  • Rayyan - uses AI/NLP to speed you through systematic reviews (Android/iOS).
  • OpenAlex - open and comprehensive catalog of scholarly papers, authors, institutions, and more (free API).
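
As a rough sketch of working with an open bibliographic API such as OpenAlex, the snippet below builds a search URL for works and summarizes a parsed JSON response. The endpoint, the `search` and `per-page` parameters, and the field names (`results`, `display_name`, `publication_year`, `cited_by_count`) reflect my understanding of the OpenAlex API and should be checked against its documentation; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed OpenAlex endpoint for scholarly works.
OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_query(search: str, per_page: int = 5) -> str:
    """Build a full-text search URL for works."""
    return f"{OPENALEX_WORKS}?{urlencode({'search': search, 'per-page': per_page})}"


def summarize_works(payload: dict) -> list[tuple]:
    """Return (title, year, citation count) tuples from a parsed response."""
    return [
        (work.get("display_name"), work.get("publication_year"), work.get("cited_by_count"))
        for work in payload.get("results", [])
    ]
```

The URL can be fetched with any HTTP client, and the decoded JSON passed to `summarize_works`.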

Literature search

  • Semantic scholar - AI-powered research tool for scientific literature.
  • Connected papers - explore connected papers in a visual graph.
  • CoCites - citation-based method for searching scientific literature, lets you find out who else published on a topic.
  • Scite - discover supporting and contrasting evidence for papers by extracting the words around references.
  • ASReview LAB - AI-powered app that helps screen texts for systematic reviewing (Win/Mac).
  • Problematic Paper Screener - finds papers with tortured phrases, text generated by machines (online).

Find retracted articles

Behavioral and cognitive experiments

  • jsPsych (GitHub) - library for creating and running behavioral experiments in a web browser (JavaScript).
  • OpenSesame - graphical, open-source experiment builder for social sciences. Build complex experiments with minimal effort, create a wide range of experiments. Plug-in framework and Python scripting allow you to incorporate external devices, such as eye trackers, response boxes, and parallel port devices (Windows/Mac/Linux).
  • PlanOut - framework for online field experiments. Makes it easy to run and iterate on sophisticated experiments in a statistically sound manner while satisfying the constraints of deployed Internet services (Python/JS/Java/PHP/Go/Lua/Ruby).
  • PsychoPy - allows presentation of stimuli and collection of data for a wide range of neuroscience, psychology and psychophysics experiments (Python).
  • WebExp - system for conducting psychological experiments over the web (Java).
  • conjoint-example - example conjoint experimental design in Qualtrics.
  • PsyToolkit - free toolkit for demonstrating, programming, and running cognitive-psychological experiments and surveys, including personality tests (online tool).
  • Empirica - framework for running multiplayer interactive experiments and games in the browser (JavaScript).
  • Gorilla - creates and hosts online experiments with easy-to-use graphical interface, no coding necessary (pay per respondent).
  • oTree - open-source platform for behavioral research.
  • Elgg - not a research tool, but open source social networking engine with core components to build a social networking site (PHP).

Graphics, network visualizations and maps

  • Dia - app to draw structured diagrams (Windows/Mac/Linux).
  • Gephi - visualization and exploration app for all kinds of graphs and networks (Windows/Mac/Linux).
  • QGIS - free and open source geographic information system app (Windows/Mac/Linux).
  • Inkscape - free and open source desktop program to draw vector graphics like Adobe Illustrator (Windows/Mac/Linux).
  • From Data to Viz - leads you to the most appropriate graph for your data, links to the code to build it and lists common caveats you should avoid (online tool).
  • Chart Types - tutorials, guides, and examples for all of the major graphs and some others.
  • Diagrams.net (previously draw.io) - draw flowcharts, diagrams, charts, sequences etc (online tool).
  • Dagitty - create, edit or analyze causal diagrams or DAGs (online tool).

Convert and clean data

  • OpenRefine - powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data, formerly called Google Refine (Windows/Mac/Linux).
  • Mr. Data Converter - convert Excel data into web-friendly formats such as HTML, JSON and XML (online tool).
  • PSPP File Conversion Service - convert SPSS (.sav) files to CSV or text format (online tool).

Survey scales and measures

Survey software

  • qualtRics - Download and import qualtrics survey data directly (R).

Statistics and questionable research practices (QRPs)

  • GRIM Test checks if the reported means match the number of items and type of scale (see also GRIMMER test).
  • SPRITE checks the type of distributions that could have produced the reported descriptive statistics.
  • statcheck checks if p-values match reported statistics.
  • Test of Insufficient Variance (TIVA) checks whether reported p-values were obtained using questionable research practices.
  • zcurve - estimates mean power after selection for significance, see blogpost and journal article (R).
  • P-hacker - train your p-hacking skills to achieve p < 0.05 (online app/Shiny).
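
The idea behind a GRIM-style check can be sketched in a few lines: when responses are integers, the sum of all responses is an integer, so a reported mean must equal some integer divided by the sample size (times the number of items) after rounding. The function below is a minimal illustration under that assumption, not the implementation used by the linked tool; the function name and tolerance handling are my own.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2, items: int = 1) -> bool:
    """Return True if reported_mean could result from integer-valued responses.

    n is the sample size, items the number of scale items averaged per person,
    and decimals the number of decimals the mean was reported with.
    """
    total_n = n * items                   # means are multiples of 1 / total_n
    approx_sum = reported_mean * total_n  # the implied (non-integer) sum
    half_unit = 0.5 * 10 ** -decimals     # maximum rounding error in the report
    # Only the two integer sums nearest to the implied sum can match.
    for candidate in (math.floor(approx_sum), math.ceil(approx_sum)):
        if abs(candidate / total_n - reported_mean) <= half_unit + 1e-9:
            return True
    return False


if __name__ == "__main__":
    print(grim_consistent(5.18, 28))  # 145 / 28 = 5.1786, rounds to 5.18
    print(grim_consistent(5.19, 28))  # no integer sum over 28 gives 5.19
```

The check is only informative for small samples: once `n * items` exceeds 10 to the power of `decimals`, every reported mean is attainable.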

Statistical software

Programs such as SPSS, Stata, SAS, and Comprehensive Meta-Analysis cost money.

Free data analysis and exploration software:

Convert effect sizes:

Power analysis:

Plan statistical design:

Organize photos, citations and references

  • EndNote - bibliography reference manager.
  • JabRef - bibliography reference manager using BibTeX (free).
  • Mendeley - bibliography reference manager (free).
  • ReadCube Papers - bibliography reference manager.
  • Zotero - bibliography reference manager (free).
  • Tropy - organize research photos and images (free).
  • Seshat annotation manager - automated management of annotation campaigns of speech data (Docker).

Education

  • Data Journalism Courses - list of data journalism courses and programmes from universities and higher education institutions around the world.
  • SICCS Learning Materials - open source teaching and learning resources for computational social science.

Organizations

  • Digital Methods Initiative (GitHub) - contribution to doing research into the "natively digital".
  • DocNow - tool/community that supports ethical collection, use, and preservation of social media content.

Open science, preregistration, code/data sharing

Text, writing

  • QuillBot - paraphrasing tool helps rewrite your text and enhance sentences, paragraphs using AI (online).

Humor

  • Break your own news - Breaking News Meme Generator - Add your pic, write the headline and generate a screenshot of a breaking news story.
  • FOAAS (Fuck Off As A Service) - a modern, RESTful, scalable solution to the common problem of telling people to fuck off.

See also

More lists:

Tutorials:

Literature:

  • Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (in press). Automated Text Classification of News Articles: A Practical Guide. Political Analysis, 1–24. https://doi.org/10.1017/pan.2020.8
  • Boumans, J. W., & Trilling, D. (2016). Taking Stock of the Toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. https://doi.org/10.1080/21670811.2015.1096598
  • Burscher, B., Odijk, D., Vliegenthart, R., de Rijke, M., & de Vreese, C. H. (2014). Teaching the Computer to Code Frames in News: Comparing Two Supervised Machine Learning Approaches to Frame Analysis. Communication Methods and Measures, 8(3), 190–206. https://doi.org/10.1080/19312458.2014.937527
  • Freelon, D. (2015). On the cutting edge of Big Data: Digital politics research in the social computing literature. In S. Coleman & D. Freelon (Eds.), Handbook of Digital Politics (p. 448). Northampton, MA: Edward Elgar.
  • Jacobi, C., van Atteveldt, W., & Welbers, K. (2016). Quantitative analysis of large amounts of journalistic texts using topic modelling. Digital Journalism, 4(1), 89–106. https://doi.org/10.1080/21670811.2015.1093271
  • Lazer, D., & Radford, J. (2017). Data ex Machina: Introduction to Big Data. Annual Review of Sociology, 43(1). https://doi.org/10.1146/annurev-soc-060116-053457
  • Wilkerson, J., & Casas, A. (2017). Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges. Annual Review of Political Science, 20(1), 529–544. https://doi.org/10.1146/annurev-polisci-052615-025542

Podcast episodes:

Facebook groups: