mit-han-lab / qserve Public

Issues: mit-han-lab/qserve

32 Open 9 Closed

Issues list

Some confusion about packages, such as VILA

#52 opened Jan 14, 2025 by GCQi

Having trouble understanding the code.

#51 opened Jan 2, 2025 by DavidZyy

QServe supports Qwen2.5 with TensorRT-LLM

#50 opened Jan 2, 2025 by limertang

ModuleNotFoundError: No module named 'qserve_backend'

#49 opened Dec 30, 2024 by rakshit2020

A question about the parameter "--group-size" in qserve_benchmark.py

#48 opened Dec 16, 2024 by oasis-Linmi

qserve with tensorrt-llm is slower than awq int4 for llama2-7b

#46 opened Nov 28, 2024 by anaivebird

Is an OpenAI-compatible server supported?

#43 opened Oct 31, 2024 by anaivebird

How to test the accuracy?

#42 opened Oct 30, 2024 by lisuying214

Some questions about VLM quant

#40 opened Oct 23, 2024 by hanhanpp

Question about pagedattention

#36 opened Sep 6, 2024 by SherrySwift

How to add new models?

#33 opened Aug 23, 2024 by NicolasDrapier

RMSNorm implemented as LayerNorm

#32 opened Aug 21, 2024 by jason-huang03

LLama-3-8B model dumped by LMQuant in 4w8a set raises errors when running e2e benchmark in QServe.

#29 opened Aug 12, 2024 by Patrick-Lew

[New Feature] Will MLA Be Supported?

#28 opened Aug 8, 2024 by RanchiZhao

How can we reproduce Tables 2 and 3? (PPL and zero-shot Acc)

#25 opened Jul 12, 2024 by kriskrisliu

Question about dequantization overhead

#23 opened Jul 6, 2024 by DD-DuDa

Circular import error

#22 opened Jul 5, 2024 by LuckyLYM

The output of the given model (mit-han-lab/Llama-3-8B-QServe-g128) is wrong

#21 opened Jul 2, 2024 by haichuan1221

Expected speed for llama3-70b-instruct

#18 opened Jun 4, 2024 by ethxnp

Is the Table 3 accuracy tested with dequantized weights, or on real accelerated quantized kernels?

#17 opened Jun 3, 2024 by vovoluck

has anyone tried to HIPify this for AMD/ROCm

#16 opened Jun 2, 2024 by ehartford

support tp

#14 opened May 24, 2024 by cyLi-Tiger

activation quantization

#13 opened May 24, 2024 by hanhanpp

Any performance comparison with vllm?

#12 opened May 21, 2024 by MuYu-zhi

Llama-2-7B-QServe model doesn't give the expected output

#11 opened May 21, 2024 by MuYu-zhi
