Analyzing Stock SEC Filings

Hypothesis: 10-Ks are long and boring to read, but they contain valuable information about the kinds of risks a company is facing. Research (and practical experience) suggests that stock price is correlated with the year-over-year change in 10-K language and with an increase in negative sentiment in the 10-K. Natural Language Processing (NLP) is the study of using computational tools to understand bodies of language. The aim is to use NLP to analyze 10-Ks and find possible arbitrage opportunities in the market.

Along the way, we provide some functionality for private investors like my dad (who inspired this project to begin with).

Research: This folder contains some preliminary notes and resources from the initial research.
data_scrape_notebooks: This folder contains Jupyter Notebooks that gather and clean data from the SEC's website. Most of the code (save a few edits) comes from this fantastic source. Also here is a Python script that reuses much of the notebook code to automate the data collection and to produce a PDF of the similarity scores for each ticker of interest.
similarity_analysis.py: I pulled all the relevant functions from the data scraping notebooks into one giant script that automates the data collection and cleaning process for tickers of interest. The similarity_analysis_windows.py script is my attempt to make this script usable on my dad's vanilla PC. If you like reading about programmers in distress, read trials_tribulations.md for my account of that process.
data: Using the script above, I collected and cleaned data for three companies: Google, Goldman Sachs, and Tesla. Also here are the outputs of the script (the PDF files showing similarity values).
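
To make the idea of a year-over-year similarity score concrete, here is a minimal sketch in plain Python. It is not the code in similarity_analysis.py: the function names, the simple word tokenizer, and the choice of cosine and Jaccard similarity over raw word counts are all assumptions made for illustration, and the filing texts are expected to be already downloaded and cleaned.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(text_a, text_b):
    """Share of distinct words that appear in both texts."""
    words_a = set(tokenize(text_a))
    words_b = set(tokenize(text_b))
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


def year_over_year(filings):
    """Compare each filing with the previous year's filing.

    `filings` is a hypothetical dict mapping a year (int) to the cleaned
    10-K text for that year. Returns a list of
    (previous_year, year, cosine, jaccard) tuples in chronological order.
    """
    years = sorted(filings)
    return [
        (
            prev,
            curr,
            cosine_similarity(filings[prev], filings[curr]),
            jaccard_similarity(filings[prev], filings[curr]),
        )
        for prev, curr in zip(years, years[1:])
    ]
```

Calling `year_over_year({2019: text_2019, 2020: text_2020})` would return one tuple for the 2019 to 2020 pair. Under the hypothesis above, a low score between consecutive years flags a filing whose language changed a lot and may be worth reading closely.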

ruthlee/10K_analysis
