OpenAI default plugin should support registering additional models #107
Potential solution: the OpenAI default plugin could support a configuration like this:

```yaml
- model_id: gpt-4-0613
  name: gpt-4-0613
  aliases: ["4-0613"]
```

It could then extend to support external models with compatible APIs like this:

```yaml
- model_id: your-model
  name: your-model.bin
  OPENAI_API_BASE: "http://localhost:8080/"
```
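A rough sketch of how the plugin could consume a file like that - field names follow the proposal above, and a fuller patch appears later in this thread:

```python
# Sketch only: parse the proposed YAML list of extra models.
# The file name here matches the one settled on further down the thread.
import yaml


def load_extra_models(path):
    with open(path) as f:
        return yaml.safe_load(f) or []


if __name__ == "__main__":
    for entry in load_extra_models("extra-openai-models.yaml"):
        print(entry["model_id"], entry.get("aliases", []))
```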
---

Relevant code from the OpenAI CLI utility: https://github.com/openai/openai-python/blob/b82a3f7e4c462a8a10fa445193301a3cefef9a4a/openai/_openai_scripts.py#L62-L75

```python
openai.debug = True
if args.api_key is not None:
    openai.api_key = args.api_key
if args.api_base is not None:
    openai.api_base = args.api_base
if args.organization is not None:
    openai.organization = args.organization
if args.proxy is not None:
    openai.proxy = {}
    for proxy in args.proxy:
        if proxy.startswith('https'):
            openai.proxy['https'] = proxy
        elif proxy.startswith('http'):
            openai.proxy['http'] = proxy
```
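That CLI just mutates module-level attributes on `openai` before making any calls. A minimal sketch of the same global-configuration style, pointed at a local OpenAI-compatible server (the model name and key below are placeholders):

```python
# Sketch: configure the pre-1.0 openai library via its module-level globals,
# the same way the CLI snippet above does.
import openai

openai.api_key = "sk-placeholder"          # a local server may not check this
openai.api_base = "http://localhost:8080"  # OpenAI-compatible endpoint

response = openai.ChatCompletion.create(
    model="orca-mini-3b.ggmlv3",
    messages=[{"role": "user", "content": "Say hello in French"}],
)
print(response.choices[0].message.content)
```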
---

I don't like how those look like global variables on the `openai` module. From this code it looks like there's a way to avoid that:

```python
class APIRequestor:
    def __init__(
        self,
        key=None,
        api_base=None,
        api_type=None,
        api_version=None,
        organization=None,
    ):
        self.api_base = api_base or openai.api_base
        self.api_key = key or util.default_api_key()
        self.api_type = (
            ApiType.from_str(api_type)
            if api_type
            else ApiType.from_str(openai.api_type)
        )
        self.api_version = api_version or openai.api_version
        self.organization = organization or openai.organization
```

Which led me to: https://github.com/openai/openai-python/blob/b82a3f7e4c462a8a10fa445193301a3cefef9a4a/openai/openai_object.py#L11-L39

```python
class OpenAIObject(dict):
    api_base_override = None

    def __init__(
        self,
        id=None,
        api_key=None,
        api_version=None,
        api_type=None,
        organization=None,
        response_ms: Optional[int] = None,
        api_base=None,
        engine=None,
        **params,
    ):
        super(OpenAIObject, self).__init__()

        if response_ms is not None and not isinstance(response_ms, int):
            raise TypeError(f"response_ms is a {type(response_ms).__name__}.")
        self._response_ms = response_ms

        self._retrieve_params = params

        object.__setattr__(self, "api_key", api_key)
        object.__setattr__(self, "api_version", api_version)
        object.__setattr__(self, "api_type", api_type)
        object.__setattr__(self, "organization", organization)
        object.__setattr__(self, "api_base_override", api_base)
        object.__setattr__(self, "engine", engine)
```
---

Yes, it looks like `create()` accepts `api_base` directly:

```python
    @classmethod
    def create(
        cls,
        api_key=None,
        api_base=None,
        api_type=None,
        request_id=None,
        api_version=None,
        organization=None,
        **params,
    ):
```
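So, as a sketch (not code from the issue), the override could be passed per call instead of mutating the module-level globals:

```python
# Sketch: pass api_base/api_key per request, relying on create() accepting
# them as in the signature above (pre-1.0 openai library).
import openai

completion = openai.ChatCompletion.create(
    model="orca-mini-3b.ggmlv3",
    messages=[{"role": "user", "content": "Say hello in French"}],
    api_base="http://localhost:8080",  # per-request override
    api_key="sk-placeholder",          # a local server may not check this
    stream=False,
)
print(completion.choices[0].message.content)
```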
---

The structure of the stream of chunks that comes back from LocalAI isn't quite the same as the OpenAI API. It looks like this:

```json
{
  "object": "chat.completion.chunk",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "delta": {
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```

```json
{
  "object": "chat.completion.chunk",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "delta": {
        "content": " Hello"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```

Then at the end:

```json
{
  "object": "chat.completion.chunk",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "finish_reason": "stop",
      "delta": {}
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
```

This doesn't fit the expected shape, especially for this code:

llm/llm/default_plugins/openai_models.py, lines 202 to 224 in 58d1f92

It's missing the `id` key.
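For comparison, a streaming chunk from the OpenAI API itself looks roughly like this (written from memory, so treat the exact fields as approximate):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1689430000,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " Hello"
      },
      "finish_reason": null
    }
  ]
}
```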
---

This almost works:

```diff
diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index c6d74c5..002f899 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -8,6 +8,7 @@ from pydantic import field_validator, Field
 import requests
 from typing import List, Optional, Union
 import json
+import yaml
 
 
 @hookimpl
@@ -16,6 +17,21 @@ def register_models(register):
     register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
     register(Chat("gpt-4"), aliases=("4", "gpt4"))
     register(Chat("gpt-4-32k"), aliases=("4-32k",))
+    # Load extra models
+    extra_path = llm.user_dir() / "extra-openai-models.yaml"
+    if not extra_path.exists():
+        return
+    with open(extra_path) as f:
+        extra_models = yaml.safe_load(f)
+    for model in extra_models:
+        model_id = model["model_id"]
+        aliases = model.get("aliases", [])
+        model_name = model["model_name"]
+        api_base = model.get("api_base")
+        register(
+            Chat(model_id, model_name=model_name, api_base=api_base),
+            aliases=aliases,
+        )
 
 
 @hookimpl
@@ -141,9 +157,11 @@ class Chat(Model):
         return validated_logit_bias
 
-    def __init__(self, model_id, key=None):
+    def __init__(self, model_id, key=None, model_name=None, api_base=None):
         self.model_id = model_id
         self.key = key
+        self.model_name = model_name
+        self.api_base = api_base
 
     def __str__(self):
         return "OpenAI Chat: {}".format(self.model_id)
 
@@ -169,13 +187,17 @@ class Chat(Model):
             messages.append({"role": "system", "content": prompt.system})
         messages.append({"role": "user", "content": prompt.prompt})
         response._prompt_json = {"messages": messages}
+        kwargs = dict(not_nulls(prompt.options))
+        if self.api_base:
+            kwargs["api_base"] = self.api_base
+        if self.key:
+            kwargs["api_key"] = self.key
         if stream:
             completion = openai.ChatCompletion.create(
-                model=prompt.model.model_id,
+                model=self.model_name or self.model_id,
                 messages=messages,
                 stream=True,
-                api_key=self.key,
-                **not_nulls(prompt.options),
+                **kwargs,
             )
             chunks = []
             for chunk in completion:
@@ -186,10 +208,10 @@ class Chat(Model):
             response.response_json = combine_chunks(chunks)
         else:
             completion = openai.ChatCompletion.create(
-                model=prompt.model.model_id,
+                model=self.model_name or self.model_id,
                 messages=messages,
-                api_key=self.key,
                 stream=False,
+                **kwargs,
             )
             response.response_json = completion.to_dict_recursive()
             yield completion.choices[0].message.content
@@ -209,11 +231,11 @@ def combine_chunks(chunks: List[dict]) -> dict:
                 role = choice["delta"]["role"]
             if "content" in choice["delta"]:
                 content += choice["delta"]["content"]
-            if choice["finish_reason"] is not None:
+            if choice.get("finish_reason") is not None:
                 finish_reason = choice["finish_reason"]
 
     return {
-        "id": chunks[0]["id"],
+        "id": chunks[0].get("id") or "no-id",
         "object": chunks[0]["object"],
         "model": chunks[0]["model"],
         "created": chunks[0]["created"],
```

I put this in `extra-openai-models.yaml` in the LLM user directory:

```yaml
- model_id: orca-openai-compat
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"
```

Then ran this:

```bash
llm -m 'orca-openai-compat' 'Say hello in french'
```

And got back:
---

Idea: a command which hits the models API for a custom endpoint and writes out a cached file recording those models, so they can show up automatically as registered models. Example from LocalAI:

```bash
curl http://localhost:8080/v1/models | jq
```

```json
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "orca-mini-3b.ggmlv3",
      "object": "model"
    }
  ]
}
```
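A rough sketch of what that command could do (not part of LLM; the cache file name and how it would be wired into the CLI are assumptions):

```python
# Sketch: fetch the model list from an OpenAI-compatible endpoint and cache it
# as JSON so the models can later be registered without hitting the network.
import json
import pathlib

import requests


def cache_models(api_base: str, cache_path: pathlib.Path) -> list:
    response = requests.get(api_base.rstrip("/") + "/v1/models", timeout=10)
    response.raise_for_status()
    # Shape matches the LocalAI example above: [{"id": "...", "object": "model"}, ...]
    models = response.json()["data"]
    cache_path.write_text(json.dumps(models, indent=2))
    return models


if __name__ == "__main__":
    print(cache_models("http://localhost:8080", pathlib.Path("localai-models.json")))
```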
---

Maybe it's OK to hit that endpoint every time the LLM command runs, rather than messing around with caching. Then again, I don't want to hit that endpoint URL on localhost every time I use `llm`.
---

Got this working. In `extra-openai-models.yaml` in the LLM user directory:

```yaml
- model_id: orca-openai-compat
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"
```

Then:

```bash
llm -m orca-openai-compat '3 names for a pet cow'
llm -c '2 more with descriptions'
llm logs -n 1
```

```json
[
  {
    "id": "01h5d4nthj4ncdntz2ap56ffz5",
    "model": "orca-openai-compat",
    "prompt": "2 more with descriptions",
    "system": null,
    "prompt_json": {
      "messages": [
        {
          "role": "user",
          "content": "3 names for a pet cow"
        },
        {
          "role": "assistant",
          "content": " I can do that! Here are three different names for a pet cow: \n1. Milo 2. Daisy 3. Max"
        },
        {
          "role": "user",
          "content": "2 more with descriptions"
        }
      ]
    },
    "options_json": {},
    "response": " Thank you for your prompt service! Here are two more options for a pet cow's name:\n\n1. Lily - She's gentle and kind, just like a lily.\n2. Thunder - He's strong and fierce, just like thunderstorms on a summer day.",
    "response_json": {
      "content": " Thank you for your prompt service! Here are two more options for a pet cow's name:\n\n1. Lily - She's gentle and kind, just like a lily.\n2. Thunder - He's strong and fierce, just like thunderstorms on a summer day.",
      "role": "assistant",
      "finish_reason": "stop",
      "object": "chat.completion.chunk",
      "model": "orca-mini-3b.ggmlv3"
    },
    "conversation_id": "01h5d4my74mqyjxc24fhcf86ry",
    "duration_ms": 8729,
    "datetime_utc": "2023-07-15T16:03:17.655636",
    "conversation_name": "3 names for a pet cow",
    "conversation_model": "orca-openai-compat"
  }
]
```
---

Needs documentation and tests.
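A test for the YAML loading could look something like this hypothetical sketch - it assumes the `openai_models` module resolves `llm.user_dir()` through the `llm` module object, so monkeypatching `llm.user_dir` is enough to point it at a temporary directory:

```python
# Hypothetical pytest sketch for the extra-models loading in the patch above.
import yaml

import llm
from llm.default_plugins import openai_models


def test_registers_extra_openai_models(tmp_path, monkeypatch):
    (tmp_path / "extra-openai-models.yaml").write_text(
        yaml.dump([{
            "model_id": "orca-openai-compat",
            "model_name": "orca-mini-3b.ggmlv3",
            "api_base": "http://localhost:8080",
        }])
    )
    # Point the plugin at the temporary user directory
    monkeypatch.setattr(llm, "user_dir", lambda: tmp_path)
    registered = []
    openai_models.register_models(
        lambda model, aliases=None: registered.append(model)
    )
    assert "orca-openai-compat" in [m.model_id for m in registered]
```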
---

While thinking about:

I realized that there's a limitation hard-coded into LLM here:

llm/llm/default_plugins/openai_models.py, lines 13 to 18 in 3f1388a

What if OpenAI release a new model with a new name, like they did with gpt-4-0613 - at the moment, there's no way to use that in LLM without releasing a new version of the software (or using a custom plugin).
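Roughly what that hard-coded block looks like (reconstructed from the diff context earlier in this thread; the exact aliases are an assumption, and `Chat` is the chat model class defined in the same module):

```python
from llm import hookimpl


@hookimpl
def register_models(register):
    # Fixed list of models - a new OpenAI model name can't be used
    # without shipping a new release of LLM or writing a plugin.
    register(Chat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
    register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
    register(Chat("gpt-4"), aliases=("4", "gpt4"))
    register(Chat("gpt-4-32k"), aliases=("4-32k",))
```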