SISU Workshop 2021

Digital Humanities: Textual and Language Analysis on Social Media

Instructor: Tom van Nuenen
Email: tom.van_nuenen@kcl.ac.uk

About this repo

This repo contains the notebooks for the Digital Humanities course taught at SISU in May and June 2021. The course investigates the possibilities and pitfalls of computational text research for humanities students.

Course Description

For centuries, the humanities have operated through the close reading of cultural objects: reading to uncover layers of meaning that lead to deep comprehension. Such ‘close’ approaches are increasingly replaced by ‘distant’ methods that rely on programmatic modeling and corpus linguistics. This allows researchers to focus on units that are much smaller or much larger than the singular case study, text, or author: words, topics, genres, themes, and so on.

Close and distant reading are especially relevant in the context of social media, which are marked by the spread of disinformation, captured in terms such as post-truth, filter bubbles, and clickbait. The visibility of online content often seems to be informed more by virality and controversy than by truthfulness and dialogue. How can we understand the large quantities of social data on online platforms in order to reveal ideologies, biases, and controversies?

In this course, we will engage with social media data in order to uncover such patterns of meaning-making. Using a variety of strategies of textual and data analysis (e.g. tf-idf, topic modeling, and word embeddings), students will learn to apply corpus linguistics methods and to reflect on them with a critical and explorative mindset. We will focus on the discursive ways in which facts and opinions are negotiated within communities, and on the patterns and biases that appear in natural language.

Learning Outcomes

This course will realize the following learning outcomes:

Attain knowledge and understanding of the epistemological potentials and pitfalls of several popular quantitative approaches to text analysis.
Apply textual and language analysis methods to contemporary datasets taken from social media.
Demonstrate an awareness of the norms and presuppositions in quantitative methodological frameworks.
Apply quantitative methods from the Digital Humanities with a critical and explorative mindset.

Course Outline

Session 1: Introduction

Introduction to Jupyter Notebooks and the class repositories, followed by some programming fundamentals in Python.

Session 2: Preprocessing & tf-idf

Exploring basic operations on Pandas DataFrames when dealing with social data. Preprocessing data and comparing datasets using tf-idf.
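
To give a flavor of what comparing datasets with tf-idf involves, here is a minimal sketch in plain Python. It is not taken from the course notebooks: the two toy datasets, the function name `tf_idf`, and the unsmoothed `tf * log(N / df)` weighting are all assumptions made for illustration, and library implementations typically use slightly different (smoothed) formulas.

```python
import math
from collections import Counter

# Two tiny hypothetical "datasets", each a list of already-tokenized posts.
corpus_a = [["vaccine", "news", "fake"], ["fake", "news", "again"]]
corpus_b = [["vaccine", "trial", "results"], ["news", "about", "results"]]

def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Treat each dataset as one long document, so scores contrast the two datasets.
merged = [
    [token for post in corpus_a for token in post],
    [token for post in corpus_b for token in post],
]
scores_a, scores_b = tf_idf(merged)

# Terms that are most distinctive for dataset A.
top_a = sorted(scores_a, key=scores_a.get, reverse=True)[:2]
print(top_a)
```

Terms that occur in both datasets (here "news" and "vaccine") receive a score of zero, which is why tf-idf is useful for surfacing what sets one dataset apart from another.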

Session 3: Distant Reading

Introduction to distant reading using NLTK and Pandas.

Session 4: Topic Modeling

Exploring topic modeling as one way to move beyond the author and explore discursive patterns in our data. Using topic modeling findings to engage in a close reading.

Session 5: Word Embeddings

Introducing word embeddings through Word2Vec in Python. Critical discussion of the biases implicit in word embedding models.
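
The core idea behind word embeddings, that words are represented as vectors and compared by the angle between them, can be sketched without any library. The three-dimensional vectors below are invented for illustration, and `cosine` and `most_similar` are hypothetical helper names; a trained Word2Vec model would learn its vectors from a corpus and typically uses a hundred or more dimensions.

```python
import math

# Hypothetical vectors, made up for illustration (not learned from data).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "apple": [-0.7, 0.2, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for the same direction, negative for opposed ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word):
    """Return the other word in the vocabulary whose vector is closest to `word`."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(most_similar("king"))
```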

Session 6: Language Biases

Exploring how to analyze language biases using word embedding methods.
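
One simple way to probe an embedding for bias is to check whether a target word sits closer to one of two contrasting reference words. The sketch below shows that idea only; it is not necessarily the method used in the course notebook. The two-dimensional vectors are invented for illustration, and `bias_score` and its parameters are hypothetical names.

```python
import math

# Hypothetical vectors, invented for illustration (not learned from data).
vectors = {
    "he": [1.0, 0.1],
    "she": [0.1, 1.0],
    "engineer": [0.8, 0.3],
    "nurse": [0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bias_score(word, pole_a="he", pole_b="she"):
    """Positive if `word` is closer to pole_a, negative if closer to pole_b."""
    return (cosine(vectors[word], vectors[pole_a])
            - cosine(vectors[word], vectors[pole_b]))

for word in ("engineer", "nurse"):
    print(word, round(bias_score(word), 3))
```

With real embeddings, a score like this reflects patterns in the training corpus, so interpreting it still requires returning to the texts themselves.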

Note that there are two optional notebooks for those who are interested: one on linear regression, and one on Naive Bayes classification and sentiment analysis.

Folders and files

data
1 Introduction.ipynb
2 Preprocessing and tf-idf.ipynb
3 Distant reading.ipynb
4 Topic modeling.ipynb
5 Word Embeddings.ipynb
6 Language Biases.ipynb
ADD Classification.ipynb
ADD Regression.ipynb
README.md

tomvannuenen/SISU-DH-2021