
srcoulombe/etf_comparer


ETF Analyzer

A web app used to fetch ETF holdings data and visualize the similarity between ETFs.

A first version of the app is now live on Google Cloud Platform! Check it out there!

A second version of the app is now live on AWS! Check it out there, along with its associated REST and GraphQL APIs!

Quickstart

  1. Clone this repo using: git clone https://github.com/srcoulombe/etf_comparer.git
  2. Navigate into the repository's directory and create a virtual environment by using: python3 -m venv etfcomparer_venv
  3. Activate the virtual environment by using: source etfcomparer_venv/bin/activate on macOS/Linux, or .\etfcomparer_venv\Scripts\activate on Windows.
  4. Install the dependencies into your virtual environment by using: python3 -m pip install -r requirements.txt
  5. Run the app by using: streamlit run etf_comparer.py

Architecture

Frontend

The frontend is built around streamlit, streamlit-tags, matplotlib, and seaborn.

Backend

The user can choose which database management system to use at runtime. The options are sqlite3, postgres, and tinydb. I've been using this project to compare and contrast the SQL and NoSQL approaches. Future work will include the development and deployment of more production-ready databases (specifically postgresql and mongodb).

Data scraping is done using the requests library. Some sources and constants (URLs, integer IDs, etc.) for the Invesco, iShares, and ARK scrapers were adapted from etf4u, but those scrapers were refactored. I've also developed a general-purpose scraper targeting zacks.com as a fallback option.
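To illustrate the runtime choice of database management system described above, here is a minimal sketch of one way to put a SQL store and a document-style store behind a common interface. It is not the repository's actual code: `HoldingsStore`, `SqliteStore`, `DictStore`, `make_store`, and the table layout are all hypothetical names chosen for this example, and `DictStore` is a plain in-memory stand-in for a NoSQL backend such as tinydb.

```python
import sqlite3
from typing import Dict, List, Protocol, Tuple

Holding = Tuple[str, float]  # (ticker symbol, portfolio weight in percent)


class HoldingsStore(Protocol):
    """Hypothetical interface that every backend in this sketch implements."""

    def save(self, etf: str, holdings: List[Holding]) -> None: ...
    def load(self, etf: str) -> List[Holding]: ...


class SqliteStore:
    """SQL backend built on the standard library's sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS holdings ("
            "etf TEXT, ticker TEXT, weight REAL, PRIMARY KEY (etf, ticker))"
        )

    def save(self, etf: str, holdings: List[Holding]) -> None:
        # Replace any previous snapshot for this ETF in a single transaction.
        with self.conn:
            self.conn.execute("DELETE FROM holdings WHERE etf = ?", (etf,))
            self.conn.executemany(
                "INSERT INTO holdings VALUES (?, ?, ?)",
                [(etf, ticker, weight) for ticker, weight in holdings],
            )

    def load(self, etf: str) -> List[Holding]:
        rows = self.conn.execute(
            "SELECT ticker, weight FROM holdings WHERE etf = ? ORDER BY ticker",
            (etf,),
        )
        return [(ticker, weight) for ticker, weight in rows]


class DictStore:
    """In-memory, document-style stand-in for a NoSQL backend (assumption)."""

    def __init__(self) -> None:
        self.docs: Dict[str, List[Holding]] = {}

    def save(self, etf: str, holdings: List[Holding]) -> None:
        self.docs[etf] = sorted(holdings)

    def load(self, etf: str) -> List[Holding]:
        return list(self.docs.get(etf, []))


# Registry mapping a user-facing backend name to its implementation.
BACKENDS = {"sqlite3": SqliteStore, "dict": DictStore}


def make_store(name: str) -> HoldingsStore:
    """Pick a backend by name at runtime."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None
```

Because the rest of the app would only depend on `save` and `load`, swapping backends comes down to passing a different name to `make_store`, which also makes side-by-side SQL versus NoSQL comparisons straightforward.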

Deployment

A first version of the app was deployed on Google Cloud Platform. This version only supports the small-scale, portable database management systems (sqlite3 and tinydb). It also lacks the associated REST/GraphQL APIs and prefetching capabilities.

A second version of the app was recently deployed on AWS. The postgres database is hosted on an AWS RDS instance with automated backups, a read replica, and a load balancer. The app itself runs on an AWS EC2 instance, alongside the prefetch script that automatically updates the database's data every 24 hours. The same EC2 instance also hosts the REST and GraphQL (http://34.207.129.103:8887/graphql) APIs.
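The prefetch script itself is not shown here, so the following is only a sketch of the idea behind a 24-hour refresh: given the time each ETF was last fetched, select the ones whose data is at least a day old. The function name `stale_etfs`, the `last_fetched` mapping, and the use of UTC timestamps are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

REFRESH_INTERVAL = timedelta(hours=24)  # matches the 24-hour update cycle


def stale_etfs(
    last_fetched: Dict[str, datetime],
    now: Optional[datetime] = None,
    max_age: timedelta = REFRESH_INTERVAL,
) -> List[str]:
    """Return the ETFs whose last fetch is at least `max_age` old, sorted by name."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        etf for etf, fetched in last_fetched.items() if now - fetched >= max_age
    )
```

A scheduled job (cron, for example) could call a function like this and re-scrape only the returned ETFs instead of refreshing everything on every run.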

TODO

Development

  • ability to scrape additional sources
  • [~] ability to track ETFs over time
  • add functionality for db rollbacks
  • add functionality for postgresql
  • add functionality for mongodb
  • re-deploy with the new database management systems
  • add logging functionality
  • add a "show raw data" option
  • add a "download raw data" option
  • add diagrams explaining database layouts
  • add tab explaining distance measures

Extension

What would be REALLY useful is the ability to return the k ETFs that are the most different from a collection of ETFs.
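One possible shape for that feature, sketched under stated assumptions: treat each ETF as a set of holding tickers, pool the holdings of the user's collection, and rank every other ETF by its Jaccard distance to that pool. The choice of Jaccard distance, the function names, and the toy data in the usage note are illustrative assumptions, not the project's implementation.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_most_different(
    collection: List[str], holdings: Dict[str, Set[str]], k: int
) -> List[str]:
    """Return the k ETFs outside `collection` that overlap least with it."""
    # Pool every ticker held by any ETF in the collection.
    combined: Set[str] = set().union(*(holdings[etf] for etf in collection))
    candidates = [etf for etf in holdings if etf not in collection]
    # Largest distance first; ties are broken alphabetically for stable output.
    ranked = sorted(
        candidates,
        key=lambda etf: (-jaccard_distance(holdings[etf], combined), etf),
    )
    return ranked[:k]
```

With made-up holdings such as `{"AAA": {"AAPL", "MSFT", "NVDA"}, "BBB": {"AAPL", "MSFT", "GOOG"}, "CCC": {"XOM", "CVX"}, "DDD": {"AAPL", "XOM"}}`, calling `k_most_different(["AAA"], holdings, 2)` ranks `CCC` (no overlap) ahead of `DDD` (one shared ticker). A weight-aware measure would likely be more faithful for real portfolios, since set-based Jaccard ignores how much of each holding an ETF carries.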

Journal

Open-source packages like yahooquery provide similar, though restricted, functionality: they only return the ETFs' top 10 holdings.

Some existing tools serve purposes similar to this project's, but focus on 1-vs-1 comparisons (etfdb, etfrc, etfanalytics). Other tools scale beyond head-to-head comparisons (Vanguard's FundCompare, TD Ameritrade), but don't do this type of overlap analysis. Some tools like fundvisualizer might be relevant, but they are behind paywalls, so I wasn't able to look into them very much. I also came across other projects, some of which were... interesting to say the least.

Some projects like investpy and the GitHub repo awesome-quant are more pertinent. I'll be looking into investpy's ETF scraping capabilities to see what I could learn and use. Update: after examining investpy more thoroughly, I found that it does not seem to be in line with this project's direction.

Sources to look into:
