
[LoftQConfig + LoraConfig] throws size matmul mismatch error #1240

Closed
3 of 4 tasks
SoundProvider opened this issue Dec 8, 2023 · 8 comments
SoundProvider commented Dec 8, 2023

System Info

  • docker image: pytorch/pytorch:2.1.1-cuda12.1-cudnn8-devel
  • pip list
Package                 Version
----------------------- ------------
accelerate              0.25.0
aiohttp                 3.9.1
aiosignal               1.3.1
asttokens               2.0.5
astunparse              1.6.3
async-timeout           4.0.3
attrs                   23.1.0
backcall                0.2.0
beautifulsoup4          4.12.2
blessed                 1.20.0
boltons                 23.0.0
Brotli                  1.0.9
certifi                 2023.7.22
cffi                    1.15.1
chardet                 4.0.0
charset-normalizer      2.0.4
click                   8.1.7
comm                    0.2.0
conda                   23.9.0
conda-build             3.27.0
conda-content-trust     0.2.0
conda_index             0.3.0
conda-libmamba-solver   23.7.0
conda-package-handling  2.2.0
conda_package_streaming 0.9.0
cryptography            41.0.3
datasets                2.15.0
debugpy                 1.8.0
decorator               5.1.1
dill                    0.3.7
dnspython               2.4.2
evaluate                0.4.1
exceptiongroup          1.0.4
executing               0.8.3
expecttest              0.1.6
filelock                3.9.0
frozenlist              1.4.0
fsspec                  2023.10.0
gmpy2                   2.1.2
gpustat                 1.1.1
huggingface-hub         0.19.4
hypothesis              6.88.4
idna                    3.4
ipykernel               6.27.1
ipython                 8.15.0
jedi                    0.18.1
Jinja2                  3.1.2
joblib                  1.3.2
jsonpatch               1.32
jsonpointer             2.1
jupyter_client          8.6.0
jupyter_core            5.5.0
libarchive-c            2.9
libmambapy              1.5.1
MarkupSafe              2.1.1
matplotlib-inline       0.1.6
mkl-fft                 1.3.8
mkl-random              1.2.4
mkl-service             2.4.0
more-itertools          8.12.0
mpmath                  1.3.0
multidict               6.0.4
multiprocess            0.70.15
nest-asyncio            1.5.8
networkx                3.1
numpy                   1.26.0
nvidia-ml-py            12.535.133
packaging               23.1
pandas                  2.1.3
parso                   0.8.3
peft                    0.7.0
pexpect                 4.8.0
pickleshare             0.7.5
Pillow                  10.0.1
pip                     23.3
pkginfo                 1.9.6
platformdirs            4.1.0
pluggy                  1.0.0
prompt-toolkit          3.0.36
protobuf                4.25.1
psutil                  5.9.0
ptyprocess              0.7.0
pure-eval               0.2.2
pyarrow                 14.0.1
pyarrow-hotfix          0.6
pycosat                 0.6.6
pycparser               2.21
Pygments                2.15.1
pynvml                  11.5.0
pyOpenSSL               23.2.0
PySocks                 1.7.1
python-dateutil         2.8.2
python-etcd             0.4.5
pytz                    2023.3.post1
PyYAML                  6.0.1
pyzmq                   25.1.2
regex                   2023.10.3
requests                2.31.0
responses               0.18.0
ruamel.yaml             0.17.21
ruamel.yaml.clib        0.2.6
safetensors             0.4.1
scikit-learn            1.3.2
scipy                   1.11.4
sentencepiece           0.1.99
setuptools              68.0.0
six                     1.16.0
sortedcontainers        2.4.0
soupsieve               2.5
stack-data              0.2.0
sympy                   1.11.1
threadpoolctl           3.2.0
tokenizers              0.15.0
tomli                   2.0.1
toolz                   0.12.0
torch                   2.1.1
torchaudio              2.1.1
torchelastic            0.2.2
torchvision             0.16.1
tornado                 6.4
tqdm                    4.65.0
traitlets               5.7.1
transformers            4.35.2
triton                  2.1.0
truststore              0.8.0
types-dataclasses       0.6.6
typing_extensions       4.7.1
tzdata                  2023.3
urllib3                 1.26.18
wcwidth                 0.2.5
wheel                   0.41.2
xxhash                  3.4.1
yarl                    1.9.4
zstandard               0.19.0

Who can help?

@pac

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I'm testing PEFT Lora Initialization options.

from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(...)  # don't quantize here
loftq_config = LoftQConfig(loftq_bits=4, ...)           # set 4bit quantization
lora_config = LoraConfig(..., init_lora_weights="loftq", loftq_config=loftq_config)
peft_model = get_peft_model(base_model, lora_config)

The script I'm testing is the official Hugging Face run_clm.py script, with only the LoRA config section below added. Nothing else was added to or removed from the original file.

embedding_size = model.get_input_embeddings().weight.shape[0]
if len(tokenizer) > embedding_size:
    model.resize_token_embeddings(len(tokenizer))

##################################################
if model_args.is_lora:
    print("[*] loading lora config")
    loftq_config = LoftQConfig(loftq_bits=8)     
    lora_config = LoraConfig(
        task_type = TaskType.CAUSAL_LM, # TaskType: CAUSAL_LM, SEQ_CLS,,,
        inference_mode = False,
        # r = 4, 
        # lora_alpha = 8, 
        r = 8,
        lora_alpha = 16,
        lora_dropout = 0.1,
        target_modules = [ # for EXAONE v2.0
            "c_attn",
            "c_proj",
            "c_fc"
            # "out_proj",
            # "c_fc_0",
            # "c_fc_1",
            # "c_proj",
        ],
        init_lora_weights="loftq",
        loftq_config=loftq_config
    )
    
    # lora_config = AdaLoraConfig(
    #     peft_type="ADALORA",
    #     task_type="CAUSAL_LM",
    #     r=8,
    #     lora_alpha=16,
    #     target_modules=[
    #         "c_attn", 
    #         "c_proj", 
    #         "c_fc"
    #     ],
    #     lora_dropout=0.1,
    #     )
    
    print("[*] lora_config")
    print(lora_config)
    
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
##################################################

# Preprocessing the datasets.
# First we tokenize all the texts.
if training_args.do_train:
    column_names = list(raw_datasets["train"].features)
else:
    column_names = list(raw_datasets["validation"].features)
text_column_name = "text" if "text" in column_names else column_names[0]

Here is the command I use to run it:

python -u run_clm_lora.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext   \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --output_dir /tmp/test-clm \
    --num_train_epochs 5 \
    --overwrite_output_dir \
    --trust_remote_code False \
    --is_lora True

Expected behavior

I expected LoftQ initialization to succeed.

BenjaminBossan (Member)

Thanks for reporting. Do you get an error like this?

quantized_weight, max_abs, shape = quantizer.quantize_block(res)

UnboundLocalError: local variable 'quantizer' referenced before assignment

This is because of the bug mentioned here:

#1150 (comment)

If I change this line:

if not is_bnb_4bit_available():

to:

if not is_bnb_4bit_available() or num_bits == 8:

I get some progress with your example, but unfortunately encounter another issue:

return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x768 and 2304x8)
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
~/anaconda3/envs/peft/lib/python3.10/site-packages/torch/nn/modules/linear.py(114)forward()
-> return F.linear(input, self.weight, self.bias)

A similar thing happens when I try to use 4-bit instead of 8-bit (remember to send the model to cuda).

Interestingly, it works for me when using a different architecture (bloomz-560m), both with 4-bit and 8-bit (when applying the fix above). Therefore, I suspect it's somehow related to the model architecture (we had some issues with gpt2 in the past).

Ping @yxli2123

yxli2123 (Contributor) commented Dec 8, 2023

Hi, gpt2 uses Conv1D (from transformers) instead of nn.Linear(). I'm not sure if this is the reason. We haven't tested it on GPT-2 yet.
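
For illustration, a minimal check (not from the thread, assuming a stock gpt2 checkpoint): transformers' Conv1D stores its weight as (in_features, out_features), the transpose of nn.Linear's (out_features, in_features) layout, which is consistent with the (1024x768 and 2304x8) mismatch above if the LoftQ/LoRA path assumes nn.Linear's layout.

# Minimal sketch: transformers' Conv1D keeps its weight transposed relative to nn.Linear.
import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model = AutoModelForCausalLM.from_pretrained("gpt2")
c_attn = model.transformer.h[0].attn.c_attn

print(isinstance(c_attn, Conv1D))          # True
print(c_attn.weight.shape)                 # torch.Size([768, 2304]) -- (in, out)
print(nn.Linear(768, 2304).weight.shape)   # torch.Size([2304, 768]) -- (out, in)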

SoundProvider (Author)

Hi, gpt2 uses Conv1D (from transformers) instead of nn.Linear(). I'm not sure if this is the reason. We haven't tested it on GPT-2 yet.

I guess this is the reason. I tested on bloomz-560m as @BenjaminBossan mentioned and it worked just fine!
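
For reference, a minimal sketch of the working bloomz-560m setup described above; the target modules and hyperparameters here are illustrative assumptions rather than values from the thread, and the 8-bit path assumes the "or num_bits == 8" fix mentioned earlier is applied.

# Sketch of the working non-GPT-2 case; module names and hyperparameters are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")  # unquantized
loftq_config = LoftQConfig(loftq_bits=8)  # 8-bit requires the fix mentioned above
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    init_lora_weights="loftq",
    loftq_config=loftq_config,
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()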

BenjaminBossan (Member)

I think the question has been answered; if something new comes up, feel free to re-open.

adampauls commented Jan 9, 2024

I get a similar error on meta-llama/Llama-2-7b-chat-hf. My LoraConfig is

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=None, inference_mode=False, r=8, target_modules=['o_proj', 'up_proj', 'gate_proj', 'k_proj', 'down_proj', 'q_proj', 'v_proj'], lora_alpha=8, lora_dropout=0.0, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights='loftq', layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={'loftq_bits': 4, 'loftq_iter': 1})

and my quantization config is

BitsAndBytesConfig {
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
}

The error is:

File "/home/nonroot/precog/.venv/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (24576x4096 and 1x1)

Has it been tested with Llama 2? I would have assumed so. I'm on peft 0.7.1 and transformers 4.36.2.

adampauls

It's also happening with 01-ai/Yi-6B-Chat, so perhaps this is something I'm doing wrong. Any ideas what it could be?

adampauls

I see the issue. I was quantizing the model at load time with AutoModelForCausalLM.from_pretrained(quantization_config=quantization_config), but LoftQ has to take the unquantized model and quantize it itself. It might be worth adding a warning to make this clear to the user.
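
A minimal sketch of the distinction, reusing the checkpoint and target modules from the LoraConfig quoted above (everything else here is assumed):

from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-chat-hf"

# What fails with LoftQ: loading an already-quantized model, e.g.
#   AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
# LoftQ needs the full-precision weights so it can quantize them itself.

base_model = AutoModelForCausalLM.from_pretrained(model_id)  # no quantization_config
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    init_lora_weights="loftq",
    loftq_config=LoftQConfig(loftq_bits=4, loftq_iter=1),
)
peft_model = get_peft_model(base_model, lora_config)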

BenjaminBossan (Member)

It might be worth adding a warning to make this clear to the user.

We have documented this here, here, and here. Is there anywhere else you looked where this info could be added?
