[Bug] vllm updated its get_model function #1183
Comments
Hello, we've noticed this issue. You can use vllm==0.5.4 for now, and we will fix the bug ASAP.
Fixed with #1155.
Checklist
Describe the bug
Previously, our ModelRunner called vllm's get_model and passed the multimodal_config parameter. Note that vllm updated its get_model function several days ago and removed the multimodal_config parameter: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/model_loader/__init__.py
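For illustration, a minimal compatibility sketch, assuming the only relevant signature change is the removal of multimodal_config; the helper name below is hypothetical and is not sglang's actual ModelRunner code:

```python
# Hedged sketch: only forward multimodal_config when the installed vllm's
# get_model still accepts it. call_get_model is a hypothetical helper, not
# part of sglang; all other keyword arguments are passed through unchanged.
import inspect

from vllm.model_executor.model_loader import get_model


def call_get_model(**kwargs):
    """Call vllm's get_model, dropping multimodal_config if vllm removed it."""
    params = inspect.signature(get_model).parameters
    if "multimodal_config" not in params:
        kwargs.pop("multimodal_config", None)
    return get_model(**kwargs)
```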
Also, even if we delete this parameter from our ModelRunner, the unit test in test/srt/models/test_embedding_models.py still does not pass. I found that the new get_model function loads the LlamaEmbeddingModel class defined in vllm/model_executor/models/llama_embedding.py, but we want get_model to load our LlamaEmbeddingModel class in https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/llama_embedding.py
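For reference, a small debugging sketch (the helper below is hypothetical; the two module paths come from the file locations mentioned above) makes the mix-up visible:

```python
# Debugging sketch (not sglang code): report which LlamaEmbeddingModel class
# the object returned by get_model actually is.
def report_model_class(model) -> str:
    """Return the fully qualified class name of a loaded model instance."""
    cls = type(model)
    return f"{cls.__module__}.{cls.__qualname__}"

# What we want:    sglang.srt.models.llama_embedding.LlamaEmbeddingModel
# What we observe: vllm.model_executor.models.llama_embedding.LlamaEmbeddingModel
```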
Could someone check what's happening in the new get_model function?
Reproduction
Simply using the latest code of vllm and sglang reproduces it.
The attachment is my traceback.
The error "TypeError: LlamaEmbeddingModel.forward() missing 1 required positional argument: 'attn_metadata'" tells me that get_model is loading the LlamaEmbeddingModel from vllm rather than from sglang.
Environment
Python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA RTX A6000
GPU 0,1,2,3,4,5,6,7 Compute Capability: 8.6
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.3, V12.3.103
CUDA Driver Version: 545.23.08
PyTorch: 2.4.0+cu121
sglang: 0.2.13
flashinfer: 0.1.5+cu121torch2.4
triton: 3.0.0
transformers: 4.43.3
requests: 2.32.3
tqdm: 4.66.4
numpy: 1.26.4
aiohttp: 3.9.5
fastapi: 0.112.1
hf_transfer: 0.1.8
huggingface_hub: 0.24.3
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.8.2
uvicorn: 0.23.2
uvloop: 0.19.0
zmq: 26.0.3
vllm: 0.5.4
multipart: 0.0.9
openai: 1.40.3
anthropic: 0.33.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X SYS SYS SYS SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU1 SYS X SYS SYS SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU2 SYS SYS X SYS SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU3 SYS SYS SYS X SYS SYS SYS SYS 0-15,32-47 0 N/A
GPU4 SYS SYS SYS SYS X SYS SYS SYS 16-31,48-63 1 N/A
GPU5 SYS SYS SYS SYS SYS X SYS SYS 16-31,48-63 1 N/A
GPU6 SYS SYS SYS SYS SYS SYS X SYS 16-31,48-63 1 N/A
GPU7 SYS SYS SYS SYS SYS SYS SYS X 16-31,48-63 1 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
ulimit soft: 1048576