Issues: vllm-project/vllm
[Bug]: The output of Aria model is not correct · bug · #12241 · opened Jan 21, 2025 by xffxff
[Usage]: how to input messages as multi-message (a batch) instead of just one · usage · #12234 · opened Jan 21, 2025 by Hyfred
[Bug]: RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered · bug · #12233 · opened Jan 21, 2025 by Quang-elec44
[New Model]: Add support for DeepSeek R1 · new model · #12226 · opened Jan 20, 2025 by jorgeantonio21
[Usage]: Guided choice not working as expected · usage · #12225 · opened Jan 20, 2025 by srsingh24
[Usage]: Context window crashes web window when full · usage · #12221 · opened Jan 20, 2025 by seabastard
[Feature]: SwiftKV cache compression · feature request · #12220 · opened Jan 20, 2025 by arunpatala
[Bug]: ValueError: Model architectures ['LlamaForCausalLM'] failed to be inspected. Please check the logs for more details. · bug · #12219 · opened Jan 20, 2025 by walker-ai
[Usage]: BNB quantization not supported for Paligemma2 model · usage · #12216 · opened Jan 20, 2025 by ken2190
[Usage]: how to generate results and get the embeddings of the result · usage · #12213 · opened Jan 20, 2025 by daiwk
[Usage]: what is the most efficient way to do with a 72b model and 8 * A100 ? · usage · #12205 · opened Jan 20, 2025 by Chandler-Bing
[Usage]: Does vLLM support deploying the speculative model on a second device? · usage · #12200 · opened Jan 20, 2025 by CharlesRiggins
[Bug]: Inconsistent data received and sent using PyNcclPipe · bug · #12197 · opened Jan 20, 2025 by fanfanaaaa
[Bug]: CUDA initialization error with vLLM 0.5.4 and PyTorch 2.4.0+cu121 · bug · #12189 · opened Jan 19, 2025 by TaoShuchang
[Bug]: Fail to use beamsearch with llm.chat · bug · #12183 · opened Jan 18, 2025 by gystar
[Bug]: Multi-Node Online Inference on TPUs Failing · bug · #12179 · opened Jan 17, 2025 by BabyChouSr
[Bug]: AMD GPU docker image build No matching distribution found for torch==2.6.0.dev20241113+rocm6.2 · bug, rocm · #12178 · opened Jan 17, 2025 by samos123
[Bug]: Slow huggingface weights download. Sequential download · bug · #12177 · opened Jan 17, 2025 by NikolaBorisov
[RFC]: Distribute LoRA adapters across deployment · RFC · #12174 · opened Jan 17, 2025 by joerunde
[Feature]: Serve /metrics while a model is loading · feature request · #12173 · opened Jan 17, 2025 by xfalcox
[Bug]: Issue running the Granite-7b GGUF quantized model on multiple GPUs with vLLM due to a tensor size mismatch. · bug · #12170 · opened Jan 17, 2025 by tarukumar
[New Model]: openbmb/MiniCPM-o-2_6 · new model · #12162 · opened Jan 17, 2025 by myoss
[Usage]: Terminates without any error 30 seconds after a successful run. · usage · #12160 · opened Jan 17, 2025 by hznnnnnn
[Feature]: Any plan to support key features of nanoflow? · feature request · #12157 · opened Jan 17, 2025 by dwq370
[Bug]: After updating VLLM from 0.4.0.post1 to 0.6.4, the model loading time increased by one minute. · bug · #12155 · opened Jan 17, 2025 by 123qwe-ux