
Local models #105

Open · mudler opened this issue Apr 10, 2023 · 5 comments

mudler commented Apr 10, 2023

Hey 👋 !

Awesome project!

I'm trying to run chatgpt-web with llama.cpp. I've created a project using golang llama.cpp bindings, https://github.com/go-skynet/llama-cli, which mimics the OpenAI API to be 1:1 compatible, but serves multiple models that can run locally instead.

It all seems to work so far, and I'd like to document how to get them working together, so chatgpt-web can be used with local models. However, I'm struggling because chatgpt-web seems to filter the models returned by the API against the models available from OpenAI: llama-cli returns a list of models, but the filtering chatgpt-web does prevents selecting models from that list (e.g. alpaca can't be run unless I do some hardwiring on the API).

If you want to test it, you need to run llama-cli from the latest image built from master, like so:

./llama-cli api --address 0.0.0.0:8080 --models-path models-path-here --threads 14

And set the VITE_API_BASE accordingly in the .env file.
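
For example, assuming llama-cli is listening on localhost port 8080 as in the command above, the entry would be something like (adjust the host/port to your setup):

VITE_API_BASE=http://localhost:8080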

It would be super-cool if we could work together to add the capability to load local models, maybe directly adding options to run them side by side with docker-compose (that's what I'm currently doing!). WDYT?

Niek (Owner) commented Apr 11, 2023

Thanks! llama-cli with the API addition sounds like a great match with ChatGPT-web!
The models don't work because we hard-code an explicit list of supported models:

export const supportedModels = [ // See: https://platform.openai.com/docs/models/model-endpoint-compatibility
  'gpt-4',
  'gpt-4-0314',
  'gpt-4-32k',
  'gpt-4-32k-0314',
  'gpt-3.5-turbo',
  'gpt-3.5-turbo-0301'
]

This can be quite easily fixed though. I guess we should support everything with ggml and assume a $0 cost for these models. The model selection needs some work in any case. I tested with ggml-vicuna-7b-4bit and it worked well, although the output was gibberish.
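
Roughly the direction I have in mind for the model check (just a sketch with illustrative names, not the actual ChatGPT-web code):

// Illustrative sketch only: keep the known OpenAI list, but also accept
// local ggml models and price them at $0.
const openAiModels = [
  'gpt-4', 'gpt-4-0314', 'gpt-4-32k', 'gpt-4-32k-0314',
  'gpt-3.5-turbo', 'gpt-3.5-turbo-0301'
]

// Assumption: local models are reported with a 'ggml-' prefix (e.g. ggml-vicuna-7b-4bit).
export const isSupportedModel = (id: string): boolean =>
  openAiModels.includes(id) || id.startsWith('ggml-')

// Placeholder pricing lookup; the real OpenAI price table lives elsewhere.
const openAiCostPerToken = (id: string): number => 0

export const getCostPerToken = (id: string): number =>
  openAiModels.includes(id) ? openAiCostPerToken(id) : 0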

Are you planning on adding streaming support to the API as well (using EventSource/SSE)?
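
For reference, on the client side the consumption would look roughly like this, assuming an OpenAI-compatible /v1/chat/completions endpoint that emits SSE data chunks and a final data: [DONE] event (just a sketch, not ChatGPT-web's actual code):

// Sketch: read an OpenAI-style SSE stream with fetch and surface tokens as they arrive.
async function streamChat (apiBase: string, body: Record<string, unknown>, onToken: (t: string) => void): Promise<void> {
  const res = await fetch(`${apiBase}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...body, stream: true })
  })
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    let sep: number
    while ((sep = buffer.indexOf('\n\n')) !== -1) { // SSE events are separated by a blank line
      const event = buffer.slice(0, sep).trim()
      buffer = buffer.slice(sep + 2)
      if (!event.startsWith('data:')) continue
      const data = event.slice('data:'.length).trim()
      if (data === '[DONE]') return // end-of-stream marker used by the OpenAI streaming API
      const chunk = JSON.parse(data)
      const token = chunk.choices?.[0]?.delta?.content // partial content of a chat completion
      if (token) onToken(token)
    }
  }
}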

mudler (Author) commented Apr 11, 2023

> Thanks! llama-cli with the API addition sounds like a great match with ChatGPT-web! The models don't work because we hard-code an explicit list of supported models:
>
> export const supportedModels = [ // See: https://platform.openai.com/docs/models/model-endpoint-compatibility
>   'gpt-4',
>   'gpt-4-0314',
>   'gpt-4-32k',
>   'gpt-4-32k-0314',
>   'gpt-3.5-turbo',
>   'gpt-3.5-turbo-0301'
> ]
>
> This can be quite easily fixed though. I guess we should support everything with ggml and assume a $0 cost for these models. The model selection needs some work in any case.

Yup, I managed to find that bit, so I was wondering what direction to take (I don't like forking!), but that sounds good to me! I'd then be more than happy to provide a docker-compose file in llama-cli as well, to point users directly at chatgpt-web!

> I tested with ggml-vicuna-7b-4bit and it worked well, although the output was gibberish.

It needs a prompt to be injected in each call; I've just updated the API docs to cover that:
https://github.com/go-skynet/llama-cli#web-interface. TL;DR: just add a corresponding "model-file-name.bin.tmpl" file with the default prompt, for instance:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{.Input}}

### Response:

(but for vicuna/chat I think it would be slightly different)
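
For a vicuna/chat-style model it would presumably be something along these lines (untested, just to give the idea; the exact wording depends on the fine-tune):

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human:
{{.Input}}

### Assistant: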

> Are you planning on adding streaming support to the API as well (using EventSource/SSE)?

This comes with a high computational cost, so I'm not really going in that direction for now: CGO calls are really expensive, and if we want to stream token-by-token by calling the underlying C functions directly from Go, that will likely bump response time by quite a lot.

mkellerman commented

Guys, I just wanna say thanks! This is a beautiful collaboration between two amazing projects!

mkellerman commented

Regarding the models, I think we need to let the user add endpoints, instead of a single 'openai' URL.

Say you want to use openai/gpt-4: you select the model from the dropdown, and you hit [+] to add a custom endpoint and a custom return object.

And just give enough info in the docs on how to POST/GET from the custom endpoints.
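
Conceptually something like this (just a sketch of the shape; the field names are made up, not ChatGPT-web's API):

// Made-up field names, only meant to illustrate per-model custom endpoints.
interface CustomEndpoint {
  label: string                               // what shows up in the model dropdown
  baseUrl: string                             // e.g. 'https://api.openai.com' or 'http://localhost:8080'
  model: string                               // model id sent in the request body
  extractText?: (response: unknown) => string // optional mapping for a custom return object
}

const endpoints: CustomEndpoint[] = [
  { label: 'openai/gpt-4', baseUrl: 'https://api.openai.com', model: 'gpt-4' },
  { label: 'local/alpaca', baseUrl: 'http://localhost:8080', model: 'alpaca' }
]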

mudler (Author) commented Apr 12, 2023

Re: token streaming, JFYI it's being tracked in go-skynet/go-llama.cpp#4. However, I still think it would incur a high computational cost and decrease overall performance, but I'll be glad to take a stab at it next.
