SVIP

Think your LLM inference is slow? Try SVIP💳, a plug-and-play, training-free dynamic draft length control policy for any speculative decoding system with an auto-regressive draft model.

Vanilla speculative decoding on SpecBench (temperature set to 0):

Draft length comparison (Qwen2.5 on top, LLaMA-3 at the bottom):

Long-form generation on MT-Bench (temperature set to 1):

GliDe with a CaPE plus SVIP:

EAGLE-2 plus SVIP:

Evaluation

Vanilla SD

A streamlined version of speculative decoding is given in vanilla-sd, with three baselines implemented besides SVIP:

target-model-only inference: pass an empty string as the draft_model argument.
constant draft length: pass 'constant' as length_policy, and set the draft length with the draft_length argument.
heuristics (from Hugging Face): this policy increases draft length by 2 if all draft tokens are accepted in the current round, and otherwise decreases it by 1. Pass 'heuristics' as length_policy. The draft_length argument will set the initial draft length.
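The heuristics rule described above can be sketched in a few lines. This is a minimal illustration, not the code in vanilla-sd: the function name `next_draft_length` and the `min_length` floor are assumptions added here so the length never drops below one token.

```python
def next_draft_length(current: int, num_accepted: int, min_length: int = 1) -> int:
    """Return the draft length for the next round under the heuristics policy.

    +2 if every draft token of the current round was accepted, otherwise -1.
    The min_length floor is an assumption of this sketch.
    """
    if num_accepted == current:
        return current + 2
    return max(min_length, current - 1)


# Simulate four rounds, starting from an initial draft length of 5.
length = 5
history = []
for accepted in [5, 7, 3, 2]:  # hypothetical acceptance counts per round
    length = next_draft_length(length, accepted)
    history.append(length)

print(history)  # [7, 9, 8, 7]
```

The asymmetric update (grow by 2, shrink by 1) lets the draft length recover quickly after a run of fully accepted rounds while backing off gradually after rejections.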

For all length policies, two decoding strategies are available: greedy decoding and temperature sampling with temperature = 1. Change the sample argument to switch between them.
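As a rough illustration of what the two strategies mean for a single next-token choice, the sketch below picks a token from a list of logits. The function name `pick_token` and its signature are hypothetical and are not taken from vanilla-sd; it also covers token selection only and leaves out the draft verification step of speculative decoding.

```python
import math
import random


def pick_token(logits, sample, rng=random):
    """Pick a token index from raw logits.

    sample=False: greedy decoding (argmax).
    sample=True: sampling from softmax(logits), i.e. temperature = 1.
    """
    if not sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]  # hypothetical logits over a 3-token vocabulary
greedy_token = pick_token(logits, sample=False)
sampled_token = pick_token(logits, sample=True, rng=random.Random(0))
```

Greedy decoding always returns the same token for the same logits, while sampling can return any token with nonzero probability.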

Note that our experiments are run on A100 GPUs with 40GB memory. If you use other types of GPUs, please modify the memory_first_gpu and memory_per_gpu arguments accordingly. memory_first_gpu specifies the memory reserved for the target model on the first GPU, while memory_per_gpu specifies the memory reserved on each of the remaining GPUs. The draft model is placed on the first GPU, so please take that into consideration when setting memory_first_gpu. These two arguments are only used if more than one GPU is used.
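One plausible way these two arguments map onto a per-device memory budget is sketched below. It assumes the values end up in a dictionary keyed by GPU index, in the shape accepted by the `max_memory` argument of Hugging Face `from_pretrained` when used with `device_map="auto"`. The helper name `build_max_memory` and the example values are hypothetical, and the actual handling in vanilla-sd may differ.

```python
def build_max_memory(num_gpus: int, memory_first_gpu: str, memory_per_gpu: str) -> dict:
    """Build a per-GPU memory budget for the target model.

    GPU 0 gets a smaller budget because the draft model also lives there.
    With a single GPU the two arguments are ignored (empty dict).
    """
    if num_gpus < 2:
        return {}
    max_memory = {0: memory_first_gpu}
    for gpu in range(1, num_gpus):
        max_memory[gpu] = memory_per_gpu
    return max_memory


# Hypothetical budgets for three 40GB GPUs, leaving headroom on GPU 0
# for the draft model.
budget = build_max_memory(3, "20GiB", "38GiB")
print(budget)  # {0: '20GiB', 1: '38GiB', 2: '38GiB'}
```

On GPUs with less memory, lower both values, and keep memory_first_gpu small enough that the draft model still fits on the first GPU.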

GliDe with a CaPE

Coming soon.

EAGLE-2

Coming soon.

Reference

@misc{zhang2024draftmodelknowsstop,
      title={Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding},
      author={Ziyin Zhang and Jiahao Xu and Tian Liang and Xingyu Chen and Zhiwei He and Rui Wang and Zhaopeng Tu},
      year={2024},
      eprint={2411.18462},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.18462},
}
