GitHub - meanmodemoda/msdv-state-of-the-union: A weekly assignment on word embeddings, visualizing textual data at the Data Vis & Info Aesthetics Class, Parsons School of Design, Fall 2021

Visualize Textual and Qualitative Data using State of the Union Addresses

Summary

This is a weekly assignment on visualizing textual data for the Data Vis & Info Aesthetics class. I touched it up and fixed some remaining bugs after submitting it.

Process

I chose to analyze the past 10 years of State of the Union addresses to Congress. As a foreigner, I decided to de-politicize my analysis and focus on American values from an outsider's perspective. I wanted to compare each speech to the Declaration of Independence, in particular how its keywords and American values ("American", "Equality", "Life", "Liberty" and "Happiness") were reflected in these speeches.

Text Processing

I pre-processed the corpus with normalization and tokenization, then trained a Gensim Word2Vec model and output the most similar words for each of the above-mentioned keywords in each speech.
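A minimal sketch of what this step might look like, assuming Gensim 4.x. The stop-word list, the regex-based tokenizer, the helper names `preprocess` and `top_similar`, and all Word2Vec hyperparameters are illustrative assumptions, not the settings used in this project.

```python
import re

# Hypothetical stop-word list for illustration; the list actually used is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "we", "our"}


def preprocess(text):
    """Normalize and tokenize raw text into a list of sentences (lists of tokens)."""
    # Normalization: lowercase, then split into rough sentences on end punctuation.
    sentences = re.split(r"[.!?]+", text.lower())
    tokenized = []
    for sentence in sentences:
        # Tokenization: keep alphabetic runs only (this also splits contractions).
        tokens = re.findall(r"[a-z]+", sentence)
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            tokenized.append(tokens)
    return tokenized


def top_similar(sentences, keyword, topn=10):
    """Train a small Word2Vec model and return (word, score) pairs for keyword."""
    # Imported lazily so the tokenizer above works without gensim installed.
    from gensim.models import Word2Vec

    # Assumed hyperparameters; tune them for the real corpus.
    model = Word2Vec(
        sentences=sentences,
        vector_size=100,
        window=5,
        min_count=1,
        workers=1,
        seed=42,
    )
    if keyword not in model.wv.key_to_index:
        return []  # the keyword never appears in this speech
    return model.wv.most_similar(keyword, topn=topn)
```

Calling `top_similar(preprocess(speech_text), "liberty")` on one speech would return that speech's nearest neighbours for "liberty", with cosine similarity as the score. On a corpus as small as a single address, the scores are noisy and vary between runs.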

For details of the text preprocessing, see here. After that, I manually combed through the returned keywords, and where they made little sense, I went back to the corpus and revised my pre-processing procedures to improve the models.

After gathering the improved keywords, I manually tagged them with seven major themes, such as "economy", "humanity" or "science & tech", to get a more aggregated view of them.
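The tagging itself was done by hand, but once the tags exist, the aggregation can be sketched as below. The lookup table `THEME_BY_WORD` and its entries are hypothetical examples, not the actual tags, and `aggregate_by_theme` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical keyword-to-theme lookup; the real tags were assigned manually.
THEME_BY_WORD = {
    "jobs": "economy",
    "wages": "economy",
    "dignity": "humanity",
    "research": "science & tech",
}


def aggregate_by_theme(rows, theme_by_word, default="untagged"):
    """Count words and sum similarity scores per theme.

    rows is an iterable of (word, similarity_score) pairs.
    """
    summary = defaultdict(lambda: {"count": 0, "total_score": 0.0})
    for word, score in rows:
        theme = theme_by_word.get(word, default)
        summary[theme]["count"] += 1
        summary[theme]["total_score"] += score
    return dict(summary)
```

Words missing from the lookup fall into an "untagged" bucket, which makes it easy to spot keywords that still need a manual tag.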

Visualization

I chose a d3.js bubble chart for my visualization. The size of each bubble indicates the similarity score. However, in my opinion the similarity scores themselves don't carry much meaning, so I did not sort or arrange the bubbles by score; instead, their order is somewhat randomized.
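The chart itself is built with d3.js. The Python sketch below only illustrates the sizing and ordering logic described above, under two assumptions: bubble area (not radius) tracks the similarity score, and the order is shuffled with a fixed seed. The function name `bubble_layout` and the `max_radius` value are hypothetical.

```python
import math
import random


def bubble_layout(rows, max_radius=40.0, seed=0):
    """Return bubbles whose area tracks similarity, in shuffled order.

    rows is a list of (word, similarity_score) pairs.
    """
    if not rows:
        return []
    max_score = max(score for _, score in rows)
    bubbles = []
    for word, score in rows:
        if max_score <= 0:
            radius = 0.0  # no positive scores to scale against
        else:
            # Square-root scaling keeps bubble area proportional to the score.
            radius = max_radius * math.sqrt(max(score, 0.0) / max_score)
        bubbles.append({"word": word, "radius": radius})
    # Shuffle instead of sorting by score; the seed keeps the layout reproducible.
    random.Random(seed).shuffle(bubbles)
    return bubbles
```

Scaling the radius by the square root of the score avoids visually exaggerating large values, since readers tend to compare bubbles by area.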

Outcome

I discovered that the model produces better results for Trump's speeches. They seem more digestible and relatable, even from a model-output perspective. Trump was the only one who mentioned "happy" or "happiness" in the past 10 years of SOTU addresses. A speech is not what you want to convey but what the audience perceives. This mini analysis is far from adequate to draw any conclusion, but I think it invites more questions about, and interest in, Trump's speeches and the effectiveness of his communication style. An interesting read here on Trump's use of language, which focuses on "repetition", "intensifiers" and "directness", seems to be validated by the model output.

Limitations

This is a weekly assignment and I was very constrained by time. Had time allowed, I would have applied more sophisticated methods in my text pre-processing, such as lemmatizing the training corpus. The d3.js transition animation, or the lack of it, is also a bit buggy.

