Problem: huggingface download_model only supports parallel download when the model is chunked.
GGML models can be quite large (30B+ especially), but chunking is not supported: it's always a single .bin file.
Solution: use the pypdl library, which implements multi-threaded downloading via dynamic chunking.
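To illustrate what "dynamic chunking" means in general, here is a minimal sketch of splitting a single file into byte ranges that separate threads could fetch with HTTP Range requests. This is not pypdl's actual implementation; split_ranges and range_header are hypothetical names used only for illustration.

```python
from __future__ import annotations


def split_ranges(total_size: int, n_parts: int) -> list[tuple[int, int]]:
    """Split a file of total_size bytes into contiguous, inclusive byte ranges.

    Hypothetical helper: sketches the general idea only, not pypdl's code.
    """
    if total_size <= 0 or n_parts <= 0:
        return []
    # Never create more parts than there are bytes.
    n_parts = min(n_parts, total_size)
    base, extra = divmod(total_size, n_parts)
    ranges = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts take one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the HTTP Range header for one inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    for start, end in split_ranges(10, 3):
        print(range_header(start, end))
```

Each worker would request its own range and write the bytes at the matching offset, so a single large .bin file can be fetched in parallel even though the repository does not store it in chunks.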
- pypdl 😍
- huggingface_hub 🚀
- python-fire 🔥
- requests
pip install -r requirements.txt
./download.py <model> [--quant <quant>] [--branch <branch>]
<model> is the model you're downloading, for example TheBloke/vicuna-33B-GGML
<quant> is the quantization you're downloading, for example q5_0 (default is *, which will download all files)
<branch> is optional; if omitted, the download uses the first available branch
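The default of * suggests that <quant> acts as a filter over the repository's filenames. The sketch below shows one way such a filter could work, assuming glob-style matching on the filename. That matching rule, the select_files name, and the example filenames are all assumptions for illustration; the script's actual logic may differ.

```python
from fnmatch import fnmatchcase


def select_files(filenames, quant="*"):
    """Return the filenames matching a quant pattern.

    Hypothetical helper: assumes the quant is matched as a glob-style
    substring of the filename, which the script itself may not do.
    """
    pattern = "*" if quant == "*" else f"*{quant}*"
    return [name for name in filenames if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Made-up filenames, for illustration only.
    files = [
        "model.ggmlv3.q4_0.bin",
        "model.ggmlv3.q5_0.bin",
        "model.ggmlv3.q5_1.bin",
    ]
    print(select_files(files, "q5_0"))
    print(select_files(files))
```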
Import download_model with from download import download_model, then call download_model(model_name: str, quant: str = "*")
- Import the helper functions: from download import get_filenames, build_url, get_redirect_header, parallel_download
- Get the branch and filename of the quant you're looking for: get_filenames(model_name, quant) returns a (branch, filename) iterator
- Build the HF download URL: build_url(model_name, branch, filename) returns url
- Get the LFS URL: get_redirect_header(url) returns lfs_url
- Download the file: parallel_download(lfs_url, filename) will create filename in the current directory
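The steps above can be chained roughly as follows. To keep this sketch runnable without network access, the four helpers are passed in as arguments; in real use you would pass the functions imported from download. download_quant and sketch_build_url are hypothetical names. sketch_build_url only shows the usual Hugging Face resolve URL layout and is not necessarily what build_url does internally.

```python
def sketch_build_url(model_name, branch, filename):
    """Hypothetical stand-in for build_url, using the common HF resolve layout."""
    return f"https://huggingface.co/{model_name}/resolve/{branch}/{filename}"


def download_quant(model_name, quant, get_filenames, build_url,
                   get_redirect_header, parallel_download):
    """Run the helper steps in order for every matching file.

    Hypothetical wrapper: the helpers are injected so the flow can be
    exercised with stubs instead of real network calls.
    """
    downloaded = []
    # get_filenames yields (branch, filename) pairs for the requested quant.
    for branch, filename in get_filenames(model_name, quant):
        url = build_url(model_name, branch, filename)
        # The HF URL redirects to the LFS location of the file.
        lfs_url = get_redirect_header(url)
        # Writes `filename` into the current directory.
        parallel_download(lfs_url, filename)
        downloaded.append(filename)
    return downloaded
```

With the real helpers this would be called as download_quant("TheBloke/vicuna-33B-GGML", "q5_0", get_filenames, build_url, get_redirect_header, parallel_download).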