
[Usage]: how to use openai compatible api to run GGUF model? #8401

Closed
1 task done
weiminw opened this issue Sep 12, 2024 · 5 comments · Fixed by #8618
Labels
usage (How to use vllm)

Comments

weiminw commented Sep 12, 2024

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
weiminw added the usage label Sep 12, 2024
@bhavnicksm

Hey @weiminw,
If you pass the path to the GGUF-quantized model directly to the OpenAI-compatible API, it should detect the quantization automatically and run.

Let me know if you are seeing something else.
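
For reference, a minimal sketch of that flow; the launch command, paths, port, and prompt below are illustrative placeholders, not something verified in this thread:

# Start the OpenAI-compatible server with the GGUF weights (placeholder paths):
#   python -m vllm.entrypoints.openai.api_server --model /path/to/model.gguf --tokenizer /path/to/tokenizer
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="/path/to/model.gguf",  # same value that was passed to --model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)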

@IgorBeHolder


Hi, I tried that, but I get:

rag-vllm-1  |     raise EnvironmentError(
rag-vllm-1  | OSError: mradermacher/Nemo-12B-Marlin-v7-GGUF does not appear to have a file named config.json. Checkout 'https://huggingface.co/mradermacher/Nemo-12B-Marlin-v7-GGUF/tree/main' for available files.

The GGUF format contains all the metadata, doesn't it?
Why does vLLM ask for config.json?

paolovic commented Sep 16, 2024

The same holds for me, as also described in #4416.

When trying to load a GGUF model, e.g. https://huggingface.co/bartowski/reader-lm-1.5b-GGUF, vLLM requires a config.json, although the (newer?) GGUF-quantized models come only with an .imatrix file.

OSError: /reader-lm-1.5b-GGUF/ does not appear to have a file named config.json. Checkout 'https://huggingface.co//u01/app/mlo/models/reader-lm-1.5b-GGUF//tree/None' for available files.

bhavnicksm commented Sep 17, 2024

Hey @paolovic,

Yes, this error occurs because vLLM currently does not look for .gguf files inside a folder; instead it expects the model argument to be the path to the .gguf weights file itself. You also need to pass the tokenizer path separately.

I have tested this on v0.6.1.post2 and it works properly with the .gguf weights path.

from vllm import LLM

llm = LLM(model="/path/to/model.gguf", tokenizer="/path/to/tokenizer")

cc: @IgorBeHolder @weiminw
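
For completeness, a short offline-generation sketch along the same lines; the paths, prompt, and sampling values are placeholders, not something tested in this thread:

from vllm import LLM, SamplingParams

# model points at the .gguf weights file, tokenizer at the base model's tokenizer directory
llm = LLM(model="/path/to/model.gguf", tokenizer="/path/to/tokenizer")

outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8, max_tokens=64))
print(outputs[0].outputs[0].text)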

paolovic commented Sep 17, 2024


ahhhh....ok, easy
thank you very much, somehow I overlooked this information! @bhavnicksm
