
[Feature Request] Support vllm backend #39

Closed
Isotr0py opened this issue Jan 13, 2024 · 1 comment
Labels: enhancement (New feature or request), model (This issue is about Sakura model)

Comments

@Isotr0py
Contributor

Describe the solution you'd like

Project: vllm

The vllm inference backend supports the Baichuan and Qwen model families, along with the following inference accelerations:

  • GPTQ/AWQ quantization
  • PagedAttention
  • Tensor parallelism

Rough test (GPU: T4*2): at fp16 precision with tensor_parallel_size=2, Sakura-7B generates roughly twice as fast as the Transformers (device_map="auto") backend; a usage sketch follows below.
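For reference, a minimal sketch of the vLLM call used in that rough test (the model id and prompt are placeholders, not from this issue; assumes the vLLM offline `LLM` API as of early 2024):

```python
from vllm import LLM, SamplingParams

# Load the model in fp16 with 2-way tensor parallelism,
# matching the T4*2 setup described above.
llm = LLM(
    model="SakuraLLM/Sakura-7B",  # hypothetical repo id; substitute the actual Sakura-7B checkpoint
    dtype="float16",
    tensor_parallel_size=2,       # shard the weights across the two T4s
    trust_remote_code=True,       # Baichuan/Qwen checkpoints ship custom modeling code
)

params = SamplingParams(temperature=0.1, max_tokens=512)
outputs = llm.generate(["将下面的日文文本翻译成中文:..."], params)
print(outputs[0].outputs[0].text)
```

The Transformers baseline being compared against would instead load the same checkpoint with `AutoModelForCausalLM.from_pretrained(..., device_map="auto")`, which splits layers across the GPUs rather than sharding each layer as tensor parallelism does.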

@sakura-umi sakura-umi added enhancement New feature or request model This issue is about Sakura model labels Jan 13, 2024
@Isotr0py Isotr0py mentioned this issue Jan 14, 2024
@Isotr0py
Contributor Author

Done.
