Sentiment analysis of SEC filings based on the "Lazy Prices" paper (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1658471). According to the paper, alpha can be generated by shorting companies that make an active change in their reporting practices and buying the "non-changers".
This repo implements the paper in an automated way.
As an aside, this repo also contains an ETL pipeline to download and clean SEC filings of your choice for natural language processing. You can reuse that code to process SEC filings and run your own analysis.
code: code to implement the Lazy Prices paper
- get_sec_filings_df.ipynb: downloads raw SEC filings
- clean_and_filter_data.ipynb: cleans the SEC filings, with an implementation taken from this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2870309
- calc_doc_similarity.ipynb: preprocesses each filing and calculates the document similarity of the latest filing vs. the previous filing of the same kind (e.g. the Q3 2018 10-Q against the Q3 2017 10-Q). Filings are also filtered with stopwords taken from the LoughranMcDonald Master Dictionary
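For intuition, the similarity step can be sketched with a TF-IDF cosine similarity from scikit-learn. This is an illustration of the idea, not the notebook's exact implementation (the paper itself compares several similarity measures):

```python
# Sketch: cosine similarity between two year-over-year filings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def yoy_similarity(current_text: str, previous_text: str) -> float:
    """Cosine similarity between TF-IDF vectors of two filings (1.0 = identical)."""
    vectors = TfidfVectorizer().fit_transform([current_text, previous_text])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


if __name__ == "__main__":
    a = "we expect revenue growth driven by new products"
    b = "we expect revenue growth driven by existing products"
    print(f"similarity: {yoy_similarity(a, b):.3f}")
```

Per the paper's logic, a low score flags an "active change" in reporting (a short candidate), while a score near 1.0 flags a "non-changer".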
data: contains the CIK-ticker list, which is used to look up a company's CIK number by name
master-dict: contains the LoughranMcDonald Master Dictionary and its documentation
sec-filings-downloaded: contains the downloaded SEC filings, one folder per company; the processed, cleaned filings are stored in a subfolder named "cleaned_filings"
sec-filings-index: index of all SEC filings, used to download the actual filings
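Loading the CIK-ticker list could look roughly like the sketch below. The file path and two-column CSV layout here are assumptions for illustration, not the repo's actual format:

```python
import csv


def load_cik_lookup(path: str) -> dict:
    """Build a ticker -> CIK mapping from a two-column CSV (assumed layout)."""
    lookup = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) >= 2:
                lookup[row[0].upper()] = row[1]
    return lookup
```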
- Change "ProjectDirectory.py" to point to your own directory
- Run "get_sec_filings_df.ipynb" to download the raw SEC filings
- Run "clean_and_filter_data.ipynb" to clean the raw filings and place them in the appropriate folders
- Run "calc_doc_similarity.ipynb" to process the cleaned data (exclude stopwords, and optionally stem) and calculate the year-over-year document similarity for each company
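The stopword/stemming preprocessing mentioned in the last step might look roughly like this nltk-based sketch; the inline stopword set is a stand-in for the LoughranMcDonald list the repo actually uses:

```python
# Sketch of the preprocessing step: lowercase, drop stopwords, optionally stem.
from nltk.stem import PorterStemmer


def preprocess(tokens, stopwords, stem=True):
    """Lowercase tokens, drop stopwords, and optionally Porter-stem the rest."""
    stemmer = PorterStemmer()
    kept = [t.lower() for t in tokens if t.lower() not in stopwords]
    return [stemmer.stem(t) for t in kept] if stem else kept


if __name__ == "__main__":
    stop = {"the", "of", "and"}  # placeholder; the repo loads these from the master dictionary
    print(preprocess("The growth of revenues".split(), stop))
```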
- edgar: https://pypi.org/project/edgar/
- pathlib2
- tqdm
- nltk
- sklearn (published on PyPI as scikit-learn)
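The dependencies above can be installed with pip (note that sklearn must be installed under its PyPI name, scikit-learn):

```shell
pip install edgar pathlib2 tqdm nltk scikit-learn
```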