Scraping and analyzing graduate application statements.

Rypo/grad-statement-analysis

grad-statement-analysis

This repository comprises the primary files used to gather and analyze 3,417 Statement of Purpose/Personal Statement/Letter of Intent style documents.

The documents were scraped from a public forum where prospective applicants (each referred to as 'OP' throughout the analysis) post their documents for other users to review. Nearly all of the statements received at least one response, and some received many more. In total, 11,985 individual text documents were analyzed, consisting of 'OP' posts, 'OP' self-responses, and critiques.
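
The three document types mentioned above can be told apart from the author of each post in a thread. The following is a minimal sketch of that idea, not the repository's actual code: the `Post` type, the `label_thread` function, the label names, and the sample usernames are all hypothetical, and it assumes the first post in a thread is the statement itself and that usernames identify authors uniquely.

```python
from collections import Counter
from typing import NamedTuple


class Post(NamedTuple):
    """One forum post (hypothetical structure, for illustration only)."""
    author: str
    text: str


def label_thread(posts: list[Post]) -> list[str]:
    """Label each post as 'op_post', 'op_response', or 'critique'.

    Assumption: the first post in the thread is the OP's statement.
    """
    if not posts:
        return []
    op = posts[0].author
    labels = ["op_post"]
    for post in posts[1:]:
        # Later posts by the thread starter are self-responses;
        # posts by anyone else are treated as critiques.
        labels.append("op_response" if post.author == op else "critique")
    return labels


# Made-up example thread.
thread = [
    Post("applicant42", "Here is my statement of purpose ..."),
    Post("reviewer_a", "Your opening paragraph is too generic."),
    Post("applicant42", "Thanks, I rewrote the introduction."),
    Post("reviewer_b", "Mention your research experience earlier."),
]
labels = label_thread(thread)
counts = Counter(labels)  # tally of each document type in this thread
```

Summing such per-thread tallies over every scraped thread would give a breakdown of the corpus by document type.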

File Descriptions

Notebooks

  • prelim_analysis.ipynb - The initial, general-purpose notebook used in the analysis. It features basic EDA, sentiment analysis, FastText and Doc2vec embeddings, and LDA, NMF, and LSA models. [nbviewer]
  • preprocessing.ipynb - Demonstrates the multiple approaches taken to preprocess the text into various forms to meet the needs of particular models. [nbviewer]
  • kpe_summarization.ipynb - Applies several forms of keyphrase extraction and text summarization to user feedback and uses basic heuristics to find commonalities across the documents. [nbviewer]
  • exploration.ipynb - Supplemental exploratory data analysis that uses data visualization to answer questions tangential to the main motivations of the analysis. [nbviewer]
  • lang_models.ipynb - Builds ULMFiT language models on subsets of the corpus. [nbviewer]
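
To give a rough sense of what "basic heuristics to find commonalities" could look like for kpe_summarization.ipynb, here is a small standard-library sketch. It is an assumption-laden illustration and not the method used in the notebook: the stopword list, the `candidate_phrases` and `common_phrases` helpers, and the sample critiques are all hypothetical. It splits each critique at stopwords and punctuation, keeps runs of one to three content words as candidate phrases, and reports the phrases that occur in at least `min_docs` critiques.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "is", "are", "your", "you", "to", "of", "and",
    "in", "too", "it", "this", "that", "more", "be", "should",
}


def candidate_phrases(text: str) -> set[str]:
    """Return runs of 1-3 consecutive non-stopword tokens found in text."""
    phrases = set()
    # Break on punctuation first so phrases never span clause boundaries.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        run = []
        for token in fragment.split() + [""]:  # "" sentinel flushes last run
            if token and token not in STOPWORDS:
                run.append(token)
            else:
                if 1 <= len(run) <= 3:
                    phrases.add(" ".join(run))
                run = []
    return phrases


def common_phrases(docs: list[str], min_docs: int = 2) -> list[tuple[str, int]]:
    """Count in how many documents each phrase occurs; keep the frequent ones."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(candidate_phrases(doc))  # set: one count per document
    return [(p, n) for p, n in doc_freq.most_common() if n >= min_docs]


# Made-up critiques for demonstration.
critiques = [
    "Your opening paragraph is too generic.",
    "The opening paragraph should mention your research experience.",
    "Research experience is the strongest part of this statement.",
]
shared = common_phrases(critiques)
```

Counting document frequency rather than raw frequency keeps a single long critique from dominating the result.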

Note: The Freeze nbextension was used liberally throughout each notebook. Without it, the notebooks will likely not run correctly when executed from top to bottom.

Python Files

Scrapy Files
