ggml-downloader

Problem: the Hugging Face download_model only supports parallel downloads when the model is chunked.

GGML models can be quite large (30B+ especially), but chunking is not supported: a GGML model is always a single .bin file.

Solution: use the pypdl library, which implements multi-threaded downloading via dynamic chunking.
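As a rough illustration of why chunking a single file helps, the sketch below splits a file of known size into byte ranges that separate threads could each request with an HTTP Range header. It is not pypdl's actual implementation; split_ranges and range_header are hypothetical names used only for this example.

```python
def split_ranges(total_size: int, num_chunks: int) -> list[tuple[int, int]]:
    """Split a file of total_size bytes into inclusive (start, end) byte ranges."""
    if total_size <= 0 or num_chunks <= 0:
        raise ValueError("total_size and num_chunks must be positive")
    # Never create more chunks than there are bytes.
    num_chunks = min(num_chunks, total_size)
    base, extra = divmod(total_size, num_chunks)
    ranges = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the HTTP header that requests bytes start..end (inclusive)."""
    return {"Range": f"bytes={start}-{end}"}
```

Each worker would download its own range and write it at the matching offset in the output file, so a single large .bin file can still be fetched in parallel.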

Requirements

pip install -r requirements.txt

./download.py <model> [--quant <quant>] [--branch <branch>]

<model> is the model you're downloading, for example TheBloke/vicuna-33B-GGML.

<quant> is the quantization you're downloading, for example q5_0 (the default is *, which downloads all files).

<branch> is optional; if omitted, the download uses the first available branch.
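The default of * for <quant> suggests wildcard-style matching against the file names in the repository. The sketch below shows one way such a filter could work; it is an assumption, not necessarily how download.py selects files, and filter_by_quant and the file names are hypothetical.

```python
from fnmatch import fnmatch


def filter_by_quant(filenames: list[str], quant: str = "*") -> list[str]:
    """Keep the .bin files whose name contains the quant pattern."""
    # Wrap the pattern in wildcards so "q5_0" matches "model.q5_0.bin".
    pattern = f"*{quant}*"
    return [
        name
        for name in filenames
        if name.endswith(".bin") and fnmatch(name, pattern)
    ]
```

With this approach, a quant of q5_0 selects only the q5_0 file, while * selects every .bin file in the listing.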

To use it as a library, run from download import download_model and call download_model(model_name: str, quant: str = "*").

Import the helper functions: from download import get_filenames, build_url, get_redirect_header, parallel_download
Get the branch and filename of the quant you're looking for: get_filenames(model_name, quant) returns an iterator of (branch, filename) pairs
Build the HF download URL: build_url(model_name, branch, filename) returns url
Get the LFS URL: get_redirect_header(url) returns lfs_url
Download the file: parallel_download(lfs_url, filename) will create filename in the current directory
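The helper steps above can be chained as in the following sketch. It assumes download.py from this repository is on the import path and that the helpers behave as described above; download_quant is a hypothetical wrapper name, not part of the repository.

```python
def download_quant(model_name: str, quant: str = "*") -> list[str]:
    """Download every file matching quant and return the local filenames."""
    # Imported inside the function so the sketch only needs download.py
    # (from this repository) on the import path when it is actually called.
    from download import (
        build_url,
        get_filenames,
        get_redirect_header,
        parallel_download,
    )

    downloaded = []
    # get_filenames yields (branch, filename) pairs for the requested quant.
    for branch, filename in get_filenames(model_name, quant):
        url = build_url(model_name, branch, filename)  # HF download URL
        lfs_url = get_redirect_header(url)             # resolved LFS URL
        parallel_download(lfs_url, filename)           # creates filename in the current directory
        downloaded.append(filename)
    return downloaded
```

For example, download_quant("TheBloke/vicuna-33B-GGML", "q5_0") would fetch the q5_0 file into the current directory.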
