This is a blazing-fast Python script to download almost all Yahoo symbols.
Installation:

```shell
pip install git+https://github.com/legout/yahoo-symbols.git
```

Example:

```shell
python -m yahoo_symbols.download --max-combination-length=2 --types=equity,etf --output=./database --output-type=parquet
```
```
Usage: download.py [OPTIONS]

Options:
  --max-combination-length INTEGER  The maximum length of combinations to search for.
                                    Higher numbers may result in more results at the cost of longer download times.
                                    [default: 2]
  --types TEXT                      Choose one or several types.
                                    Available types are `equity, mutualfund, etf, index, future, currency, cryptocurrency`
                                    [default: equity]
  --random-proxy / --use-random-proxy
                                    Use a random proxy for each request. Currently only proxies from webshare are supported.
                                    [default: no-random-proxy]
  --verbose / --no-verbose          Whether to show a progress bar or not. [default: verbose]
  --validation / --no-validation    Run a final validation of the downloaded symbols. [default: validate]
  --output TEXT                     The output path where the downloaded symbols are saved. [default: ./db]
  --output-type TEXT                Defines the output type. Options are `parquet`, `csv` or `sqlite3`. [default: parquet]
  --help                            Show this message and exit.
```
Benchmarks for a single asset type (tested with type `equity`):

| max combination length | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| number of requests | 38 | 1482 | 56354 | 2141490 |
| estimated download duration* | ~3s | ~1min | ~10min | ~3h |
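The request counts in the table are consistent with one query for every string of length 1 up to the maximum combination length over a 38-character alphabet, i.e. 38 + 38² + ... + 38ⁿ. The sketch below reproduces that arithmetic. The concrete alphabet (letters, digits, `.` and `-`) is a hypothetical choice; only its size of 38 is inferred from the table, and `number_of_requests` and `query_strings` are illustrative names, not part of the package.

```python
from itertools import product

# Hypothetical 38-character alphabet: 26 letters + 10 digits + 2 extra characters.
# The exact characters used by yahoo_symbols are an assumption; only the size matters here.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789.-"


def number_of_requests(max_combination_length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """One query per string of length 1..max_combination_length."""
    return sum(alphabet_size ** k for k in range(1, max_combination_length + 1))


def query_strings(max_combination_length: int):
    """Enumerate the query strings themselves, shortest first."""
    for length in range(1, max_combination_length + 1):
        for combo in product(ALPHABET, repeat=length):
            yield "".join(combo)


for n in range(1, 5):
    print(n, number_of_requests(n))
# 1 38
# 2 1482
# 3 56354
# 4 2141490
```

Each additional character multiplies the number of requests by roughly 38, which is why going from length 3 to length 4 turns minutes into hours.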
- You'll get the best results (the most unique symbols) if you run this script separately for each type (equity, etf, ...).
- The option `--max-combination-length` should be `2` or `3`.
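Running the script once per type can be automated. The following is a minimal sketch that calls the CLI with the options documented in the help text, one asset type at a time. `build_command` and `download_each_type` are hypothetical helper names, and writing each type into its own sub-folder of `./database` is an assumption made to keep the outputs apart, not a documented requirement of the script.

```python
import subprocess
import sys

# Asset types listed in the --types help text.
ASSET_TYPES = ["equity", "mutualfund", "etf", "index", "future", "currency", "cryptocurrency"]


def build_command(asset_type: str, max_combination_length: int = 2,
                  output_root: str = "./database", output_type: str = "parquet") -> list[str]:
    """Build one CLI invocation for a single asset type.

    Using a separate output sub-folder per type is a hypothetical choice.
    """
    return [
        sys.executable, "-m", "yahoo_symbols.download",
        f"--max-combination-length={max_combination_length}",
        f"--types={asset_type}",
        f"--output={output_root}/{asset_type}",
        f"--output-type={output_type}",
    ]


def download_each_type(types=ASSET_TYPES, **options) -> None:
    """Run the downloader once per type, stopping at the first failing run."""
    for asset_type in types:
        subprocess.run(build_command(asset_type, **options), check=True)
```

For example, `download_each_type(["equity", "etf"], max_combination_length=3)` runs two separate downloads with a maximum combination length of 3.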
Note: This script should work fine without random proxies.
When using the option `--use-random-proxy`, free proxies* are used. In my experience, these proxies are not reliable, but you may get lucky.
I am using proxies from webshare.io. I am very happy with their service and the pricing. If you want to use their service too, sign up (use this link if you want to support my work) and choose a plan that fits your needs. Next, go to Dashboard -> Proxy -> List -> Download and copy the download link. Set this download link as the environment variable `WEBSHARE_PROXIES_URL` before running the download script.
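The script's internal proxy handling is not shown here. As a hypothetical sketch of how such a download link can be consumed, the snippet reads `WEBSHARE_PROXIES_URL`, fetches the list, and picks a random proxy per request. It assumes each line of the downloaded list has the form `host:port:username:password`; check your own file, since that format is an assumption. `parse_proxy_list`, `load_webshare_proxies` and `random_proxy` are illustrative names, not the package's API.

```python
import os
import random
import urllib.request


def parse_proxy_list(text: str) -> list[str]:
    """Turn 'host:port:username:password' lines (assumed format) into proxy URLs."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        host, port, user, password = line.split(":", 3)
        proxies.append(f"http://{user}:{password}@{host}:{port}")
    return proxies


def load_webshare_proxies() -> list[str]:
    """Download the proxy list from the URL stored in WEBSHARE_PROXIES_URL."""
    url = os.environ.get("WEBSHARE_PROXIES_URL")
    if not url:
        raise RuntimeError("WEBSHARE_PROXIES_URL is not set")
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_proxy_list(response.read().decode("utf-8"))


def random_proxy(proxies: list[str]) -> dict[str, str]:
    """Pick one proxy at random, in the scheme -> URL mapping used by e.g. the requests library."""
    proxy = random.choice(proxies)
    return {"http": proxy, "https": proxy}
```

Picking a fresh proxy for every request spreads the many small queries across the pool instead of sending them all from one address.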
Export `WEBSHARE_PROXIES_URL` in your Linux shell:

```shell
$ export WEBSHARE_PROXIES_URL="https://proxy.webshare.io/api/v2/proxy/list/download/abcdefg1234567/-/any/username/direct/-/"
```
You can also set this environment variable permanently in a `.env` file (see the `.env-exmaple`) in your home folder or current folder, or in your shell config file (e.g. `~/.bashrc`).
Write `WEBSHARE_PROXIES_URL` into `.env`:

```
WEBSHARE_PROXIES_URL="https://proxy.webshare.io/api/v2/proxy/list/download/abcdefg1234567/-/any/username/direct/-/"
```
Or write `WEBSHARE_PROXIES_URL` into your shell config file (e.g. `~/.bashrc`):

```shell
$ echo 'export WEBSHARE_PROXIES_URL="https://proxy.webshare.io/api/v2/proxy/list/download/abcdefg1234567/-/any/username/direct/-/"' >> ~/.bashrc
```
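How the script actually loads `.env` files is not documented here. The following is a minimal, stdlib-only sketch of the lookup described above: it checks the current folder and then the home folder for a `.env` file (that order is an assumption) and copies `KEY=value` lines into the environment without overwriting variables that are already set. `load_env_file` and `load_env` are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> None:
    """Minimal .env parser: KEY=value lines, quotes stripped, comments skipped.

    Variables that are already set in the environment are left untouched.
    """
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        os.environ.setdefault(key, value.strip().strip("'\""))


def load_env(search_dirs=None) -> None:
    """Look for a .env file in each directory (current folder, then home folder by default)."""
    if search_dirs is None:
        search_dirs = [Path.cwd(), Path.home()]
    for directory in search_dirs:
        candidate = Path(directory) / ".env"
        if candidate.is_file():
            load_env_file(candidate)
```

Because `setdefault` never overwrites, a variable exported in the shell takes precedence over the `.env` file, and a `.env` in the current folder wins over one in the home folder. In real projects, `load_dotenv()` from the python-dotenv package is a common ready-made alternative.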
*Free proxies are scraped from these sites:
- "http://www.free-proxy-list.net"
- "https://free-proxy-list.net/anonymous-proxy.html"
- "https://www.us-proxy.org/"
- "https://free-proxy-list.net/uk-proxy.html"
- "https://www.sslproxies.org/"
If you find this useful, you can buy me a coffee. Thanks!