version 4.47.0 provides different generation results when using quantized awq model #35286

xin3he · 2024-12-16T02:41:27Z

System Info

transformers version: 4.47.0
Platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.31
Python version: 3.9.19
Huggingface_hub version: 0.26.5
Safetensors version: 0.4.2
Accelerate version: 0.27.2
Accelerate config: not found
PyTorch version (GPU?): 2.3.0+cu121 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?:
Using GPU in script?:
GPU type: NVIDIA A100-SXM4-80GB

Who can help?

@gante @SunMarc @MekkCyber

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer

# AWQ-quantized OPT-125m checkpoint (loading it requires the autoawq package)
autoawq_model = "casperhansen/opt-125m-awq"
prompt = "One day, the little girl"
user_model = AutoModelForCausalLM.from_pretrained(autoawq_model).to('cuda:0')
tokenizer = AutoTokenizer.from_pretrained(autoawq_model)
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to('cuda:0')
# Deterministic beam search: with do_sample=False, temperature has no effect
generate_kwargs = dict(do_sample=False, temperature=0.9, num_beams=4)
gen_ids = user_model.generate(input_ids, **generate_kwargs)
gen_text = tokenizer.batch_decode(gen_ids, skip_special_tokens=True)
# Output produced by transformers versions earlier than 4.47.0
target_text = ["One day, the little girl in the back of my mind will ask me if I'm a"]
assert gen_text == target_text, f"Expect: {target_text}\n but get: {gen_text}."

Expected behavior

With transformers versions earlier than 4.47.0, the script runs and the assertion passes. With 4.47.0, beam search produces a different continuation and the assertion fails:

Traceback (most recent call last):
  File "/data6/xinhe/fx_test/test.py", line 13, in <module>
    assert gen_text == target_text, f"Expect: {target_text}\n but get: {gen_text}."
AssertionError: Expect: ["One day, the little girl in the back of my mind will ask me if I'm a"]
 but get: ['One day, the little girl in the back of my mind will say, ??I??m so glad you??'].
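To pinpoint where the two outputs part ways, a small helper can compare the expected and actual strings word by word. This is a hypothetical diagnostic, not part of transformers, and it splits on whitespace rather than on model tokens, so it only approximates the step at which beam search diverged.

```python
def first_divergent_word(expected: str, actual: str):
    """Return (index, expected_word, actual_word) for the first whitespace-separated
    word where the strings differ, or None if they are identical word for word.
    Hypothetical helper for illustration only."""
    exp_words = expected.split()
    act_words = actual.split()
    for i, (e, a) in enumerate(zip(exp_words, act_words)):
        if e != a:
            return i, e, a
    if len(exp_words) != len(act_words):
        # One string is a prefix of the other; the missing side is reported as None.
        i = min(len(exp_words), len(act_words))
        return (
            i,
            exp_words[i] if i < len(exp_words) else None,
            act_words[i] if i < len(act_words) else None,
        )
    return None


# Strings copied from the assertion message above.
expected = "One day, the little girl in the back of my mind will ask me if I'm a"
actual = "One day, the little girl in the back of my mind will say, ??I??m so glad you??"
print(first_divergent_word(expected, actual))  # (12, 'ask', 'say,')
```

Both versions agree on the first twelve words ("... my mind will") and differ from the thirteenth onward.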


xin3he added the bug label Dec 16, 2024

xin3he mentioned this issue Dec 16, 2024

workaround transformers issue in version 4.47.0 intel/neural-compressor#2092

Merged
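The contents of intel/neural-compressor#2092 are not reproduced here. As a hypothetical illustration only, a downstream test could gate its exact-match assertion on the installed transformers version, enforcing it only on versions the report says reproduce the reference text (earlier than 4.47.0). The helper names below are invented for this sketch, and the parser deliberately handles only the leading numeric components of a version string.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor, patch) integers of a version string,
    e.g. "4.48.0.dev0" -> (4, 48, 0). Missing components default to 0."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def should_check_exact_text(version: str) -> bool:
    """Hypothetical gate: enforce the exact-match assertion only on versions
    earlier than 4.47.0, where the reference output was observed."""
    return parse_version(version) < (4, 47, 0)


print(should_check_exact_text("4.46.3"))  # True
print(should_check_exact_text("4.47.0"))  # False
```

In the reproduction script, the final assertion would then run only when `should_check_exact_text(transformers.__version__)` is true. Versions not covered by the report are treated conservatively and skipped.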
