GPTQ support? #26
Comments
Thanks @tigerinus for bringing up this feature request. I think we can enable loading and building engines from TheBloke GPTQ models pretty easily; would that work for you?
That would be perfect. How soon can it happen?
Maybe also support more quantization methods, like AWQ.
@tigerinus @mfuntowicz TheBloke's GPTQ support will be awesome!
Thanks for your comments! Tentatively targeting AWQ/GPTQ from TheBloke on the 🤗 Hub in the next iteration. Stay tuned!
@mfuntowicz it will be awesome! ❤️
Hey-hey @mfuntowicz, is GPTQ still in the plans?
Hello @dimaischenko, I was checking the examples, and if you see this part:

```python
if args.has_quantization_step:
    from optimum.nvidia.quantization import get_default_calibration_dataset

    max_length = min(args.max_prompt_length + args.max_new_tokens, tokenizer.model_max_length)
    calib = get_default_calibration_dataset(args.num_calibration_samples)

    if hasattr(calib, "tokenize"):
        calib.tokenize(tokenizer, max_length=max_length, pad_to_multiple_of=8)

    # Add the quantization step
    builder.with_quantization_profile(args.quantization_config, calib)
```

then it likely supports quantization. So what you can try is this: once the Docker installation is done, run the examples/text-generation/llama.py file, where the model folder should contain the AutoGPTQ files, and hopefully that works. While running, you might also need to pass an additional flag.
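As a quick sanity check before pointing the script at a local folder, you can verify it looks like an AutoGPTQ export. This is only a minimal sketch: it assumes the usual AutoGPTQ/TheBloke layout (a quantize_config.json next to safetensors or .bin weights), and the folder path is a placeholder.

```python
# Sketch: check that a local model folder looks like an AutoGPTQ/TheBloke GPTQ export.
# The path below is a placeholder; adjust it to wherever the checkpoint was downloaded.
from pathlib import Path

model_dir = Path("./Llama-2-13B-GPTQ")  # placeholder path

has_quant_config = (model_dir / "quantize_config.json").exists()
has_weights = any(model_dir.glob("*.safetensors")) or any(model_dir.glob("*.bin"))

print(f"quantize_config.json present: {has_quant_config}")
print(f"weight files present:         {has_weights}")
```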
@Anindyadeep Thank you! I need a little time to figure it out. I'm using the lowest-level generation path of AutoGPTQ:

```python
model = AutoGPTQForCausalLM.from_quantized(model_name, ...)
...
ids = model.prepare_inputs_for_generation(
    batch_input_ids,
    past_key_values=past_key_values,
    attention_mask=attention_mask,
    use_cache=True,
    **model_kwargs,
)
out = model(**ids)
```

I need to find out whether your option is right for me.
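For context, here is a minimal sketch of how those same calls typically sit inside a manual greedy-decoding loop. The checkpoint name, prompt, and token budget are placeholder assumptions for illustration, not taken from this thread.

```python
# Sketch of a manual greedy-decoding loop built around AutoGPTQ's low-level calls.
# Checkpoint name, device, prompt, and max token count are placeholder assumptions.
import torch
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_name = "TheBloke/Llama-2-13B-GPTQ"  # hypothetical GPTQ checkpoint from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoGPTQForCausalLM.from_quantized(model_name, device="cuda:0", use_safetensors=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
past_key_values = None

for _ in range(32):  # generate up to 32 new tokens greedily
    # Prepare inputs for this step (reuses the cached key/values after the first pass).
    ids = model.prepare_inputs_for_generation(
        input_ids,
        past_key_values=past_key_values,
        attention_mask=attention_mask,
        use_cache=True,
    )
    with torch.no_grad():
        out = model(**ids)

    # Greedy pick of the next token, then extend the running sequence and mask.
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    input_ids = torch.cat([input_ids, next_token], dim=-1)
    attention_mask = torch.cat(
        [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
    )
    past_key_values = out.past_key_values

    if next_token.item() == tokenizer.eos_token_id:
        break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```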
Awesome, and let me know whether or not it works.
I see this is not closed yet. Is GPTQ still unsupported?
Very new to this library... just need a quick answer on whether GPTQ is going to be supported.
I need to run inference with LLaMA2-13b-GPTQ on an RTX 4060 Ti.