
[Usage]: how to use openai compatible api to run GGUF model? #8401

Closed
1 task done
weiminw opened this issue Sep 12, 2024 · 5 comments · Fixed by #8618
Labels
usage (How to use vllm)

Comments

weiminw commented Sep 12, 2024

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
weiminw added the usage label Sep 12, 2024
@bhavnicksm

Hey @weiminw,
If you pass the path to the GGUF-quantized model directly to the OpenAI-compatible API, it should detect the quantization automatically and run.

Let me know if you are seeing something else.
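
For reference, a minimal sketch of that flow; the launch command, paths, port, and prompt below are illustrative placeholders, not something verified in this thread:

# Start the OpenAI-compatible server with the GGUF weights (placeholder paths):
#   python -m vllm.entrypoints.openai.api_server --model /path/to/model.gguf --tokenizer /path/to/tokenizer
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="/path/to/model.gguf",  # same value that was passed to --model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)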

@IgorBeHolder


Hi, I tried that, but I get:

rag-vllm-1  |     raise EnvironmentError(
rag-vllm-1  | OSError: mradermacher/Nemo-12B-Marlin-v7-GGUF does not appear to have a file named config.json. Checkout 'https://huggingface.co/mradermacher/Nemo-12B-Marlin-v7-GGUF/tree/main' for available files.

The GGUF format contains all the metadata, doesn't it?
Why does vLLM ask for config.json?

paolovic commented Sep 16, 2024

The same holds for me, as also described in #4416.

When trying to load a GGUF model, e.g. https://huggingface.co/bartowski/reader-lm-1.5b-GGUF, vLLM requires a config.json, although the (newer?) GGUF-quantized models come only with an .imatrix file.

OSError: /reader-lm-1.5b-GGUF/ does not appear to have a file named config.json. Checkout 'https://huggingface.co//u01/app/mlo/models/reader-lm-1.5b-GGUF//tree/None' for available files.

bhavnicksm commented Sep 17, 2024

Hey @paolovic,

Yes, this error occurs because vLLM currently does not look for .gguf files inside a folder; instead it expects the model argument to be the path to the .gguf weights file itself. You also need to pass the tokenizer path separately.

I have tested this on v0.6.1.post2 and it works properly with the .gguf weights path.

from vllm import LLM

llm = LLM(model="/path/to/model.gguf", tokenizer="/path/to/tokenizer")

cc: @IgorBeHolder @weiminw
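
For completeness, a short offline-generation sketch along the same lines; the paths, prompt, and sampling values are placeholders, not something tested in this thread:

from vllm import LLM, SamplingParams

# model points at the .gguf weights file, tokenizer at the base model's tokenizer directory
llm = LLM(model="/path/to/model.gguf", tokenizer="/path/to/tokenizer")

outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8, max_tokens=64))
print(outputs[0].outputs[0].text)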

paolovic commented Sep 17, 2024


ahhhh....ok, easy
thank you very much, somehow I overlooked this information! @bhavnicksm
