A web app used to fetch ETF holdings data and visualize the similarity between ETFs.
A first version of the app is now live on Google Cloud Platform! Check it out there!
A second version of the app is now live on AWS! Check it out there, along with its associated REST and GraphQL APIs!
- Clone this repo using:
git clone https://github.com/srcoulombe/etf_comparer.git
- Navigate into the repository's directory and create a virtual environment by using:
python3 -m venv etfcomparer_venv
- Activate the virtual environment by using:
source etfcomparer_venv/bin/activate
if you're using MacOS/Linux, or
.\etfcomparer_venv\Scripts\activate
if you're a Windows user.
- Install the dependencies into your virtual environment by using:
python3 -m pip install -r requirements.txt
- Run the app by using:
streamlit run etf_comparer.py
The frontend is built around streamlit, streamlit-tags, matplotlib, and seaborn.
The user can choose which database management system to use at runtime. The options are: sqlite3, postgres, and tinydb.
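To illustrate the idea of picking the backend at runtime, here is a minimal sketch. It is not this repo's actual code: the names HoldingsStore, SQLiteStore, InMemoryStore, and make_store are hypothetical, the table layout is an assumption, and InMemoryStore is only a stand-in for a document-style backend such as tinydb (it does not use the tinydb API).

```python
import sqlite3
from typing import Dict, Protocol


class HoldingsStore(Protocol):
    """Hypothetical interface that every backend implements."""

    def save(self, etf: str, holdings: Dict[str, float]) -> None: ...

    def load(self, etf: str) -> Dict[str, float]: ...


class SQLiteStore:
    """SQL-style backend using the standard-library sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # Assumed layout: one row per (ETF, holding) pair.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS holdings ("
            "etf TEXT, ticker TEXT, weight REAL, PRIMARY KEY (etf, ticker))"
        )

    def save(self, etf: str, holdings: Dict[str, float]) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO holdings VALUES (?, ?, ?)",
                [(etf, ticker, weight) for ticker, weight in holdings.items()],
            )

    def load(self, etf: str) -> Dict[str, float]:
        rows = self.conn.execute(
            "SELECT ticker, weight FROM holdings WHERE etf = ?", (etf,)
        )
        return dict(rows.fetchall())


class InMemoryStore:
    """Document-style stand-in: one dict entry per ETF."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, float]] = {}

    def save(self, etf: str, holdings: Dict[str, float]) -> None:
        self.docs[etf] = dict(holdings)

    def load(self, etf: str) -> Dict[str, float]:
        return dict(self.docs.get(etf, {}))


# The name chosen at runtime is mapped to a backend class.
BACKENDS = {"sqlite3": SQLiteStore, "in-memory": InMemoryStore}


def make_store(name: str) -> HoldingsStore:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


store = make_store("sqlite3")
store.save("ARKK", {"TSLA": 0.1, "ROKU": 0.05})  # illustrative weights only
print(store.load("ARKK"))
```

Because both classes expose the same save/load methods, the rest of the app does not need to know which backend was selected.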
I've been using this project to compare and contrast the SQL and NoSQL approaches.
Future work will include the development and deployment of more production-ready databases (specifically postgresql and mongodb).
Data scraping is done using the requests library. Some sources and constants (URLs, integer IDs, etc.) for the Invesco, iShares, and ARK scrapers were adapted from etf4u, but those scrapers were refactored. I've also developed a general-purpose scraper targeting zacks.com as a fallback option.
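The "issuer-specific scrapers first, general-purpose scraper as a fallback" pattern could look roughly like the sketch below. The function names (fetch_holdings, issuer_scraper, general_scraper) and the sample holdings are hypothetical, and the stand-in scrapers return hard-coded data instead of making HTTP calls with requests, so this is an illustration of the control flow only.

```python
from typing import Callable, Dict, Iterable, Optional

Holdings = Dict[str, float]
Scraper = Callable[[str], Optional[Holdings]]


def fetch_holdings(ticker: str, scrapers: Iterable[Scraper]) -> Holdings:
    """Try each scraper in order and return the first non-empty result."""
    errors = []
    for scraper in scrapers:
        try:
            holdings = scraper(ticker)
        except Exception as exc:  # a real scraper might raise requests.RequestException
            errors.append(f"{getattr(scraper, '__name__', scraper)}: {exc}")
            continue
        if holdings:
            return holdings
    raise LookupError(f"no scraper returned holdings for {ticker}: {errors}")


# Hypothetical stand-ins for an issuer-specific scraper and the fallback.
def issuer_scraper(ticker: str) -> Optional[Holdings]:
    if ticker != "ARKK":
        raise ValueError("ticker not covered by this issuer")
    return {"TSLA": 0.10, "ROKU": 0.06}  # made-up weights


def general_scraper(ticker: str) -> Optional[Holdings]:
    return {"AAPL": 0.07, "MSFT": 0.06}  # made-up weights


# The issuer scraper does not cover SPY, so the fallback's result is used.
print(fetch_holdings("SPY", [issuer_scraper, general_scraper]))
```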
A first version of the app was deployed on Google Cloud Platform. This version only supports the small-scale and portable database management systems (sqlite3 and tinydb). It also lacks an associated REST/GraphQL API and prefetching capabilities.
A second version of the app was recently deployed on AWS. An AWS RDS instance with automated backups, a read replica, and a load balancer is used to host the postgres database. The app itself is run on an AWS EC2 instance, along with the prefetch script, which automatically updates the database's data every 24 hours. The AWS EC2 instance also hosts the REST and GraphQL (http://34.207.129.103:8887/graphql) APIs.
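As a rough sketch of the 24-hour refresh rule, a prefetch job could decide whether an ETF's stored data is due for an update like this. The function name needs_refresh and the idea of storing a last-updated timestamp per ETF are assumptions for illustration; how the real prefetch script is scheduled is not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REFRESH_INTERVAL = timedelta(hours=24)


def needs_refresh(
    last_updated: Optional[datetime],
    now: Optional[datetime] = None,
    max_age: timedelta = REFRESH_INTERVAL,
) -> bool:
    """Return True if the data was never fetched or is at least max_age old."""
    if last_updated is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated >= max_age


# Example with fixed, timezone-aware timestamps.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(needs_refresh(datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc), now))  # 25 hours old
print(needs_refresh(datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc), now))   # 6 hours old
```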
- ability to scrape additional sources
- [~] ability to track ETFs over time
- add functionality for db rollbacks
- add functionality for postgresql
- add functionality for mongodb
- re-deploy with the new database management systems
- add logging functionality
- add a "show raw data" option
- add a "download raw data" option
- add diagrams explaining database layouts
- add tab explaining distance measures
What would be REALLY useful is the ability to return the k ETFs that are the most different from a collection of ETFs.
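One possible way to frame this, as a sketch only: treat each ETF as a set of tickers, pool the holdings of the collection, and rank candidate ETFs by Jaccard distance to that pool. The choice of Jaccard distance, the function names, and the fund and ticker names below are all assumptions for illustration, not necessarily the distance measures the app uses.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; 0.0 means identical sets, 1.0 means no overlap."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_most_different(
    collection: Dict[str, Set[str]],
    candidates: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    """Return the k candidates whose holdings overlap least with the collection."""
    # Pool every ticker held by any ETF in the collection.
    held = set().union(*collection.values())
    # Ignore candidates that are already part of the collection.
    names = [name for name in candidates if name not in collection]
    # Sort by decreasing distance; break ties alphabetically for stable output.
    names.sort(key=lambda name: (-jaccard_distance(candidates[name], held), name))
    return names[:k]


# Made-up funds and tickers, for illustration only.
collection = {"FUND_A": {"AAA", "BBB", "CCC"}, "FUND_B": {"BBB", "DDD"}}
candidates = {
    "FUND_X": {"AAA", "BBB", "CCC", "DDD"},  # distance 0.0 to the pooled holdings
    "FUND_Y": {"AAA", "EEE"},                # distance 0.8
    "FUND_Z": {"EEE", "FFF"},                # distance 1.0
}
print(k_most_different(collection, candidates, k=2))  # ['FUND_Z', 'FUND_Y']
```

A weighted variant (using each holding's portfolio weight instead of plain set membership) would follow the same ranking structure with a different distance function.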
Open-source software like yahooquery provides similar though restricted functionality: it only returns an ETF's top 10 holdings.
Some existing tools can be used for similar purposes as this project, but focus on 1-vs-1 comparisons (etfdb, etfrc, etfanalytics). Other tools scale beyond head-to-head comparisons (Vanguard's FundCompare, TD Ameritrade), but don't do this type of overlap analysis. Some tools like fundvisualizer might be relevant, but they are behind paywalls so I wasn't able to look into them very much. I also came across other projects, some of which were... interesting to say the least.
Some projects like investpy and the GitHub repo awesome-quant are more pertinent. After having examined investpy's ETF scraping capabilities more thoroughly to see what I could learn and use, it does not seem to be in line with this project's direction.
- etfchannel.com
- etfdb.com/tool/etf-comparison
- etfdb.com/screener
- https://www.portfoliovisualizer.com/faq