
[Bug] Running OpenGVLab/InternVL2-2B-AWQ fails with KeyError: 'language_model.model.layers.0.feed_forward.w1.weight' #2168

Closed
jianliao opened this issue Jul 29, 2024 · 7 comments

Comments

@jianliao

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ

KeyError: 'language_model.model.layers.0.feed_forward.w1.weight'

If I switch the backend, the model can run, but it prints a huge amount of log output; see the attached [bug.log](https://github.com/user-attachments/files/16405853/bug.log) for details.

Reproduction

> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ

or

lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --backend pytorch

Environment

OS: Ubuntu 22.04
Python: 3.12
Model: OpenGVLab/InternVL2-2B-AWQ

Error traceback

Traceback (most recent call last):
  File "/home/jianliao/anaconda3/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/cli/serve.py", line 298, in api_server
    run_api_server(args.model_path,
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/serve/openai/api_server.py", line 1285, in serve
    VariableInterface.async_engine = pipeline_class(
                                     ^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/serve/vl_async_engine.py", line 24, in __init__
    super().__init__(model_path, **kwargs)
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/serve/async_engine.py", line 190, in __init__
    self._build_turbomind(model_path=model_path,
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/serve/async_engine.py", line 235, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/turbomind.py", line 340, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/turbomind.py", line 144, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/turbomind.py", line 235, in _from_hf
    output_model = OUTPUT_MODELS.get(output_model_name)(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/deploy/target_model/fp.py", line 26, in __init__
    super().__init__(input_model, cfg, to_file, out_dir)
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 172, in 
[bug.log](https://github.com/user-attachments/files/16405853/bug.log)
__init__
    self.cfg = self.get_config(cfg)
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/deploy/target_model/fp.py", line 38, in get_config
    w1, _, _ = bin.ffn(i)
               ^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/deploy/source_model/internlm2.py", line 69, in ffn
    return self._ffn(i, 'weight')
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages/lmdeploy/turbomind/deploy/source_model/internlm2.py", line 62, in _ffn
    tensor = self.params[
             ^^^^^^^^^^^^
KeyError: 'language_model.model.layers.0.feed_forward.w1.weight'
@lvhan028
Collaborator

The related PRs #1984 and #1913 haven't been merged yet.

@AllentDan
Collaborator

What is your lmdeploy version? @jianliao The latest lmdeploy can run this model with the default turbomind backend.

@jianliao
Author

@AllentDan I upgraded to the latest version (0.5.2.post1), but I am still encountering the same error with the following command:
lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ.
Here are the details of my lmdeploy version:

(lmdeploy) jianliao@jianliao-ubuntu:~$ pip show lmdeploy
Name: lmdeploy
Version: 0.5.2.post1
Summary: A toolset for compressing, deploying and serving LLM
Home-page: 
Author: OpenMMLab
Author-email: openmmlab@gmail.com
License: 
Location: /home/jianliao/anaconda3/envs/lmdeploy/lib/python3.12/site-packages
Requires: accelerate, einops, fastapi, fire, mmengine-lite, numpy, nvidia-cublas-cu12, nvidia-cuda-runtime-cu12, nvidia-curand-cu12, nvidia-nccl-cu12, peft, pillow, protobuf, pydantic, pynvml, safetensors, sentencepiece, shortuuid, tiktoken, torch, torchvision, transformers, triton, uvicorn
Required-by: 

@AllentDan
Collaborator

Can you try adding --model-format awq?
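For reference, that would presumably be the original serve command with the flag appended:

> lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --model-format awq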

@jianliao
Author

jianliao commented Aug 3, 2024

@AllentDan @lvhan028 The issue has been resolved after applying the --model-format awq option. Thanks Bro.

@jianliao jianliao closed this as completed Aug 3, 2024
@lzk9508

lzk9508 commented Dec 16, 2024

Can you try adding --model-format awq?

self.pipe = pipeline(model_path,
                     backend_config=TurbomindEngineConfig(
                         session_len=self.session_len,
                         cache_max_entry_count=self.cache_max_entry_count))
self.pipe.vl_encoder.model.config.max_dynamic_patch = self.max_dynamic_patch

I ran the AWQ-quantized model and hit the same problem. My script to start inference is shown above; how should I modify it?

@AllentDan
Collaborator

  1. Update to the latest lmdeploy; this issue has been resolved there.
  2. Set the model_format argument to 'awq' in TurbomindEngineConfig.
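A minimal sketch of the adjusted script, assuming the pipeline and TurbomindEngineConfig imports from lmdeploy and the same user-defined session settings as in the snippet above:

from lmdeploy import pipeline, TurbomindEngineConfig

# Pass model_format='awq' so turbomind loads the AWQ-quantized weights.
# session_len, cache_max_entry_count and max_dynamic_patch stand in for the
# same user-defined values used in the original script above.
pipe = pipeline(
    model_path,
    backend_config=TurbomindEngineConfig(
        model_format='awq',
        session_len=session_len,
        cache_max_entry_count=cache_max_entry_count))
pipe.vl_encoder.model.config.max_dynamic_patch = max_dynamic_patch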
