Describe the solution you'd like
Project repository: vllm

A vllm inference backend would support the Baichuan and Qwen model series and provide the following inference acceleration:

- GPTQ/AWQ quantization
- PagedAttention
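A minimal sketch of how such a backend might be invoked is below; the model id, prompt, and sampling values are placeholders for illustration and are not taken from this issue.

```python
# Minimal sketch of a vllm-based backend, assuming a hypothetical Hugging Face
# repo id "SakuraLLM/Sakura-7B" (the exact id is not given in this issue).
from vllm import LLM, SamplingParams

llm = LLM(
    model="SakuraLLM/Sakura-7B",  # placeholder model id
    tensor_parallel_size=2,       # shard weights across the two T4 GPUs
    dtype="float16",              # fp16 precision, as in the rough test below
    trust_remote_code=True,       # Baichuan/Qwen ship custom model code
    # quantization="awq",         # enable when loading an AWQ/GPTQ checkpoint
)
# PagedAttention is applied automatically by vllm's KV-cache manager.

sampling = SamplingParams(temperature=0.1, top_p=0.3, max_tokens=512)
outputs = llm.generate(["Hello"], sampling)  # placeholder prompt
print(outputs[0].outputs[0].text)
```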
Rough test (GPU: 2× T4): at fp16 precision with `tensor_parallel_size=2`, Sakura-7B generates roughly twice as fast as with the Transformers (`device_map="auto"`) backend.
Done.