Repo for the SISU Digital Humanities Workshop, May/June 2021

tomvannuenen/SISU-DH-2021

SISU Workshop 2021

Digital Humanities: Textual and Language Analysis on Social Media

About this repo

This repo contains notebooks for the Digital Humanities course at SISU, held in May and June 2021. The course investigates the possibilities and pitfalls of computational text research for humanities students.

Course Description

For centuries, the humanities have operated through the close reading of cultural objects: reading to uncover layers of meaning that lead to deep comprehension. Such ‘close’ approaches are increasingly replaced by ‘distant’ methods that rely on programmatic modeling and corpus linguistics. These methods allow researchers to focus on units that are much smaller or much larger than the singular case study, text, or author: words, topics, genres, themes, and so on.

Close and distant reading are especially relevant in the context of social media, which are marked by the spread of disinformation, captured in terms such as post-truth, filter bubbles, and clickbait. The visibility of online content often seems to be informed more by virality and controversy than by truthfulness and dialogue. How can we understand the large quantities of social data on online platforms in order to reveal ideologies, biases, and controversies?

In this course, we will engage with social media data in order to uncover such patterns of meaning-making. Using a variety of strategies of textual and data analysis (e.g. tf-idf, topic modeling, and word embeddings), students will learn to apply corpus linguistics methods and to reflect on them with a critical and explorative mindset. We will focus on the discursive ways in which facts and opinions are negotiated within communities, and on the patterns and biases that appear in natural language.

Learning Outcomes

This course will realize the following learning outcomes:

  • Attain knowledge and understanding of the epistemological potentials and pitfalls of several popular quantitative approaches to text analysis.
  • Apply textual and language analysis methods to contemporary datasets taken from social media.
  • Demonstrate an awareness of the norms and presuppositions in quantitative methodological frameworks.
  • Apply quantitative methods from the Digital Humanities with a critical and explorative mindset.

Course Outline

Session 1: Introduction

Introduction to Jupyter Notebooks and the class repositories; working through some programming fundamentals in Python.

Session 2: Preprocessing & tf-idf

Exploring basic operations on Pandas DataFrames when dealing with social data. Preprocessing data and comparing datasets using tf-idf.
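To give a rough idea of what comparing datasets with tf-idf involves, here is a minimal sketch that uses only the Python standard library. It is not taken from the course notebooks. The dataset names (`forum_a`, `forum_b`), the toy sentences, and the helper functions `tokenize` and `tfidf` are all hypothetical, and the weighting shown (relative term frequency times the log of inverse document frequency) is just one common variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real preprocessing would do more.
    return text.lower().split()


def tfidf(corpora):
    """Return {name: {term: score}} for a dict mapping dataset names to token lists."""
    n_docs = len(corpora)
    # Document frequency: in how many datasets does each term occur?
    df = Counter()
    for tokens in corpora.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in corpora.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency weighted by log inverse document frequency.
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical toy data standing in for two social media datasets.
corpora = {
    "forum_a": tokenize("the vaccine news is fake news"),
    "forum_b": tokenize("the match was great and the crowd was loud"),
}
scores = tfidf(corpora)

# The highest-scoring term is the one most distinctive for that dataset.
top_a = max(scores["forum_a"], key=scores["forum_a"].get)
print(top_a)
```

Words that occur in every dataset, such as "the" here, score zero, which is why tf-idf tends to surface the vocabulary that sets one dataset apart from another.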

Session 3: Distant Reading

Introduction to distant reading using NLTK and Pandas.
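As an illustration of the kind of operation distant reading relies on, the sketch below counts word frequencies and builds a simple keyword-in-context (concordance) view. It uses only the standard library rather than NLTK or Pandas, so it should be read as a stand-in for what those libraries offer, not as the course's own code. The example posts and the helpers `tokenize` and `concordance` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Keep lowercase alphabetic tokens (and apostrophes); drop punctuation.
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=2):
    """Return each occurrence of keyword with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok.upper()] + right))
    return lines


# Hypothetical posts standing in for a social media dataset.
posts = [
    "The truth is out there, they say.",
    "Nobody tells the truth online anymore.",
]

# Flatten all posts into one token stream (context may cross post boundaries).
tokens = [tok for post in posts for tok in tokenize(post)]

freq = Counter(tokens)
kwic = concordance(tokens, "truth")

print(freq.most_common(3))
for line in kwic:
    print(line)
```

The frequency table gives the view from a distance, while the concordance lines lead back to individual passages that can then be read closely.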

Session 4: Topic Modeling

Exploring topic modeling as one way to move beyond the author and explore discursive patterns in our data. Using topic modeling findings to engage in a close reading.

Session 5: Word Embeddings

Introducing word embeddings through Word2Vec in Python. Critical discussion of the biases implicit in word embedding models.
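A trained Word2Vec model represents each word as a vector, and words are typically compared through cosine similarity. The sketch below shows only that comparison step, using the standard library. The three-dimensional vectors are hand-made, hypothetical values chosen for illustration and are not the output of any trained model; `cosine` and `most_similar` are illustrative helpers, not a library API.

```python
import math


def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; a real model would have far more dimensions.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}


def most_similar(word, vectors):
    # Return the other word whose vector points in the most similar direction.
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


nearest = most_similar("king", vectors)
print(nearest)
```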

Session 6: Language Biases

Exploring how to analyze language biases using word embedding methods.
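One simple way to probe bias in an embedding space is to check whether a word sits closer to one set of attribute words than to another. The sketch below is a deliberately simplified illustration of that idea and is not necessarily the method used in the course notebooks. The two-dimensional vectors are hypothetical, hand-made values, and `association` is an illustrative helper defined here.

```python
import math


def cosine(u, v):
    # Cosine similarity between two vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, set_a, set_b, vectors):
    """Mean similarity of word to set_a minus its mean similarity to set_b."""
    mean_a = sum(cosine(vectors[word], vectors[a]) for a in set_a) / len(set_a)
    mean_b = sum(cosine(vectors[word], vectors[b]) for b in set_b) / len(set_b)
    return mean_a - mean_b


# Hypothetical toy vectors, constructed so that the skew is visible.
vectors = {
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
    "engineer": [0.8, 0.2],
    "nurse": [0.2, 0.8],
}

# Positive scores lean towards the first attribute set, negative towards the second.
engineer_bias = association("engineer", ["he"], ["she"], vectors)
nurse_bias = association("nurse", ["he"], ["she"], vectors)
print(engineer_bias, nurse_bias)
```

With vectors trained on real social media text, a score like this reflects patterns in the training data, so interpreting it still calls for a close reading of the contexts in which the words occur.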

Note that there are two optional notebooks for those who are interested: one on linear regression, and one on Naive Bayes classification and sentiment analysis.
