
[Bug]: assert parts[0] == "base_model" AssertionError #4682

Closed
Edisonwei54 opened this issue May 8, 2024 · 5 comments · Fixed by #5194
Labels
bug Something isn't working

Comments

@Edisonwei54

Your current environment

The output of `python collect_env.py`
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (conda-forge gcc 13.2.0-7) 13.2.0
Clang version: Could not collect
CMake version: version 3.29.2
Libc version: glibc-2.35

Python version: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-27-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX A6000
Nvidia driver version: 535.171.04
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      43 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             96
On-line CPU(s) list:                0-95
Vendor ID:                          AuthenticAMD
Model name:                         AMD EPYC 7F72 24-Core Processor
CPU family:                         23
Model:                              49
Thread(s) per core:                 2
Core(s) per socket:                 24
Socket(s):                          2
Stepping:                           0
Frequency boost:                    enabled
CPU max MHz:                        3200.0000
CPU min MHz:                        2500.0000
BogoMIPS:                           6400.17
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization:                     AMD-V
L1d cache:                          1.5 MiB (48 instances)
L1i cache:                          1.5 MiB (48 instances)
L2 cache:                           24 MiB (48 instances)
L3 cache:                           384 MiB (24 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-23,48-71
NUMA node1 CPU(s):                  24-47,72-95
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.0
[pip3] triton==2.3.0
[pip3] vllm_nccl_cu12==2.18.1.0.4.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] torch                     2.3.0                    pypi_0    pypi
[conda] triton                    2.3.0                    pypi_0    pypi
[conda] vllm-nccl-cu12            2.18.1.0.4.0             pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.2
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0-23,48-71      0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

python -m vllm.entrypoints.openai.api_server \
    --model /mnt/sda/edison/llama3/Meta-Llama-3-8B-Instruct \
    --enable-lora \
    --lora-modules test-lora=/mnt/sda/edison/llama3/checkpoint-441 \
    --gpu-memory-utilization 0.9 \
    --host 0.0.0.0 \
    --port 8008 \
    --tensor-parallel-size 1 \
    --enforce-eager

Edisonwei54 added the bug label on May 8, 2024
@Edisonwei54
Author

INFO 05-08 11:38:43 async_llm_engine.py:529] Received request cmpl-cba3bae644234c86b209b60b0e93273b-0: prompt: '你好', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [57668, 53901], lora_request: LoRARequest(lora_name='test-lora', lora_int_id=1, lora_local_path='/mnt/sda/edison/llama3/checkpoint-441').
INFO 05-08 11:38:43 async_llm_engine.py:154] Aborted request cmpl-cba3bae644234c86b209b60b0e93273b-0.
INFO: 192.168.31.138:55550 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/worker_manager.py", line 150, in _load_lora
lora = self._lora_model_cls.from_local_checkpoint(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/models.py", line 246, in from_local_checkpoint
return cls.from_lora_tensors(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/models.py", line 150, in from_lora_tensors
module_name, is_lora_a = parse_fine_tuned_lora_name(tensor_name)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/utils.py", line 89, in parse_fine_tuned_lora_name
assert parts[0] == "base_model"
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in call
await self.app(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 99, in create_chat_completion
generator = await openai_serving_chat.create_chat_completion(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 138, in create_chat_completion
return await self.chat_completion_full_generator(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 301, in chat_completion_full_generator
async for res in result_generator:
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 666, in generate
raise e
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 660, in generate
async for request_output in stream:
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 77, in anext
raise result
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 501, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/opt/conda/envs/vllm/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 475, in engine_step
request_outputs = await self.engine.step_async()
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 221, in step_async
output = await self.model_executor.execute_model_async(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 148, in execute_model_async
output = await make_async(self.driver_worker.execute_model
File "/opt/conda/envs/vllm/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 249, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 790, in execute_model
self.set_active_loras(lora_requests, lora_mapping)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 901, in set_active_loras
self.lora_manager.set_active_loras(lora_requests, lora_mapping)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/worker_manager.py", line 113, in set_active_loras
self._apply_loras(lora_requests)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/worker_manager.py", line 235, in _apply_loras
self.add_lora(lora)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/worker_manager.py", line 243, in add_lora
lora = self._load_lora(lora_request)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/lora/worker_manager.py", line 162, in _load_lora
raise RuntimeError(
RuntimeError: Loading lora /mnt/sda/edison/llama3/checkpoint-441 failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in call
await self.app(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 114, in create_completion
generator = await openai_serving_completion.create_completion(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_completion.py", line 154, in create_completion
async for i, res in result_generator:
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/utils.py", line 240, in consumer
raise e
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/utils.py", line 233, in consumer
raise item
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/utils.py", line 217, in producer
async for item in iterator:
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 666, in generate
raise e
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 650, in generate
stream = await self.add_request(
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 537, in add_request
self.start_background_loop()
File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 411, in start_background_loop
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
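
For context, the assertion fires inside parse_fine_tuned_lora_name while vLLM maps each adapter tensor name back to a target module. The following is a simplified sketch of that parsing logic (not the verbatim vLLM source), showing why any tensor whose name does not start with base_model.model. aborts the adapter load:

```
# Simplified sketch of the name parsing in vllm/lora/utils.py (vLLM 0.4.x);
# not a verbatim copy of the source. Tensor names are expected to look like
# base_model.model.<module path>.lora_A.weight / base_model.model.<module path>.lora_B.weight.
def parse_fine_tuned_lora_name(name: str) -> tuple[str, bool]:
    parts = name.split(".")
    assert parts[0] == "base_model"  # <- the assertion from the traceback above
    assert parts[1] == "model"
    if parts[-1] == "weight":
        assert parts[-2] in ("lora_A", "lora_B")
        # Return the module path and whether this tensor is the A matrix.
        return ".".join(parts[2:-2]), parts[-2] == "lora_A"
    raise ValueError(f"unsupported LoRA tensor name: {name}")
```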

@Edisonwei54
Author

When I use:
```
curl http://0.0.0.0:8008/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "test-lora",
    "prompt": "Hello",
    "max_tokens": 128,
    "temperature": 0.7
  }' | jq
```

@Edisonwei54
Author

@WoosukKwon
@zhuohan123

@DarkLight1337
Member

You need to use the --served-model-name argument to set the name of your model. Otherwise, you can only refer to it via the value passed to --model (in your example, /mnt/sda/edison/llama3/Meta-Llama-3-8B-Instruct). An illustrative request is sketched below.
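
A sketch of this using the Python requests library against the server from this issue: the LoRA adapter is addressed by the name registered with --lora-modules, while the base model is addressed by the --model value unless --served-model-name gives it an alias.

```
import requests

# "test-lora" is the adapter name registered via --lora-modules; to hit the
# base model instead, pass the --model path (or a --served-model-name alias).
resp = requests.post(
    "http://0.0.0.0:8008/v1/completions",
    json={
        "model": "test-lora",  # or "/mnt/sda/edison/llama3/Meta-Llama-3-8B-Instruct"
        "prompt": "Hello",
        "max_tokens": 128,
        "temperature": 0.7,
    },
)
print(resp.status_code, resp.json())
```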

@kubernetes-bad

This has nothing to do with the model name, FYI.
vLLM expects LoRA adapter tensors to be named like base_model.model.lm_head.lora_A.weight and base_model.model.lm_head.lora_B.weight, while some adapters just have base_model.model.lm_head.weight and fail said assert (vllm/lora/utils.py#L89). A quick way to check your adapter is sketched below.
