This repository contains code to download and parse movie screenplays from imsdb.com and dailyscripts.com. It was created for the Media Informatics and Content Analytics (sail.usc.edu/~mica/) group of USC SAIL.
If you use this code, please cite the paper: Linguistic analysis of differences in portrayal of movie characters, in: Proceedings of Association for Computational Linguistics, Vancouver, Canada, 2017.
For comments, bug reports, etc., contact akramakr@usc.edu.
parse_scripts_noindent.py is the updated parser.
To see its help message, run: python parse_scripts_noindent.py --help