function_call support for LLMs/llama_cpp_python #397
Replies: 8 comments 3 replies
-
Here is an experiment I just did. The LLM I used is tulu-30b.ggmlv3.q5_1.bin with llama_cpp_python as server.
system:
Step 1:
Note: I tried
Step 2: I call realtime(), get a text back, and insert it into the conversation.
This resulted in this conversation:
Entire flow:
Findings
I first tried with a 13B LLM, but that one did not even request the function call properly as instructed. This approach may not be sufficient to implement function_call.
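For illustration, here is a minimal sketch of that kind of flow against a local llama_cpp_python server. The prompt wording, the realtime() helper, and the default server URL are assumptions, not the exact setup above:

```python
# Minimal sketch of the flow above: ask the local llama_cpp_python server,
# detect the model's request for realtime(), run it, and feed the result back.
import datetime

import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed default server address

def realtime() -> str:
    # Placeholder helper returning the current date/time as text.
    return datetime.datetime.now().isoformat()

def chat(messages: list) -> str:
    resp = requests.post(API_URL, json={"messages": messages})
    return resp.json()["choices"][0]["message"]["content"]

# Step 1: the system prompt tells the model how to "request" the function.
messages = [
    {"role": "system",
     "content": "If you need the current date or time, reply exactly with: CALL realtime()"},
    {"role": "user", "content": "What time is it right now?"},
]
answer = chat(messages)

# Step 2: call realtime() locally and insert the text back into the conversation.
if "realtime()" in answer:
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": "realtime() returned: " + realtime()})
    answer = chat(messages)

print(answer)
```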
-
And another test with
system:
Step 1:
I call current_weather() and give the result back to the model.
Step 2:
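The same pattern, sketched with a weather function; the stub below is a placeholder, not a real weather lookup:

```python
# Placeholder current_weather(); its JSON result is inserted back into the
# conversation as a plain message, mirroring the realtime() flow above.
import json

def current_weather(location: str = "Berlin") -> dict:
    # A real implementation would query a weather API here.
    return {"location": location, "temperature_c": 21, "condition": "sunny"}

follow_up = {
    "role": "user",
    "content": "current_weather() returned: " + json.dumps(current_weather()),
}
```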
-
I was wondering what the status is on this? I've had to implement a work-around to integrate the APIs into something coherent. This led to me creating my own data types, effectively extending the base message type:

```python
from typing import Literal

from typing_extensions import NotRequired, TypedDict


class ChatCompletionMessage(TypedDict):
    """
    Base chat completion message class.

    Attributes:
        role (Literal["assistant", "user", "system"]): The role of the message.
        content (str): The content of the message.
        user (NotRequired[str]): The user associated with the message (optional).
    """
    role: Literal["assistant", "user", "system"]
    content: str
    user: NotRequired[str]


class ChatModelChatCompletion(ChatCompletionMessage):
    """
    Extended chat completion message with additional role options.

    Inherits:
        ChatCompletionMessage: Base chat completion message class.

    Attributes:
        role (Literal["assistant", "user", "system", "function"]): The role of the message.
        content (str): The content of the message.
        function_call (NotRequired[str]): The function call associated with the message (optional).
        function_args (NotRequired[str]): The function arguments associated with the message (optional).
        user (NotRequired[str]): The user associated with the message (optional).
    """
    role: Literal["assistant", "user", "system", "function"]
    content: NotRequired[str]
    function_call: NotRequired[str]
    function_args: NotRequired[str]
    user: NotRequired[str]
```

I've played around with a few, but I just reviewed the updated source and actually felt that it was appropriately named.

```python
class ChatCompletionMessage(TypedDict):
    role: Literal["assistant", "user", "system"]
    content: str
    user: NotRequired[str]


class ChatCompletionFunction(TypedDict):
    name: str
    description: NotRequired[str]
    parameters: Dict[str, Any]  # TODO: make this more specific


class ChatCompletionFunctionCall(TypedDict):
    name: str
```

Maybe something like this:

```python
class ChatCompletionFunction(ChatCompletionMessage):
    """
    Extended chat completion message with additional role options.

    Inherits:
        ChatCompletionMessage: Base chat completion message class.

    Attributes:
        role (Literal["assistant", "user", "system", "function"]): The role of the message.
        content (str): The content of the message.
        function_call (NotRequired[str]): The function call associated with the message (optional).
        function_args (NotRequired[str]): The function arguments associated with the message (optional).
        user (NotRequired[str]): The user associated with the message (optional).
    """
    role: Literal["assistant", "user", "system", "function"]
    content: NotRequired[str]
    function_call: NotRequired[str]
    function_args: NotRequired[str]
    user: NotRequired[str]
```

I ended up treating it like an extension because it was easier to do it this way in practice. It also creates a direct 1:1 compatibility with the OpenAI and llama-cpp-python APIs without messing with anything else. I needed to keep it clean, so I just extract the content.

```python
def _extract_content(self, delta: dict, content: str) -> str:
    """
    Extracts content from the given delta and appends it to the existing content.

    Args:
        delta (dict): The delta object containing new content.
        content (str): The existing content.

    Returns:
        str: The updated content after appending the new token.
    """
    if delta and "content" in delta and delta["content"]:
        token = delta["content"]
        print(token, end="")
        sys.stdout.flush()
        content += token
    return content
```

Then I extract the function call.

```python
def _extract_function_call(
    self,
    delta: dict,
    function_call_name: str,
    function_call_args: str,
) -> Tuple[str, str]:
    """
    Extracts function call information from the given delta and updates the function call name and arguments.

    Args:
        delta (dict): The delta object containing function call information.
        function_call_name (str): The existing function call name.
        function_call_args (str): The existing function call arguments.

    Returns:
        Tuple[str, str]: A tuple containing the updated function call name and arguments.
    """
    if delta and "function_call" in delta and delta["function_call"]:
        function_call = delta["function_call"]
        if not function_call_name:
            function_call_name = function_call.get("name", "")
        function_call_args += str(function_call.get("arguments", ""))
    return function_call_name, function_call_args
```

With these 2 things, I just attempt to discover the finish reason.

```python
def _handle_finish_reason(
    self,
    finish_reason: str,
    function_call_name: str,
    function_call_args: str,
    content: str,
) -> ChatModelChatCompletion:
    """
    Handles the finish reason and returns a ChatModelChatCompletion.

    Args:
        finish_reason (str): The finish reason from the response.
        function_call_name (str): The function call name.
        function_call_args (str): The function call arguments.
        content (str): The generated content.

    Returns:
        ChatModelChatCompletion (Dict[LiteralString, str]): The model's response as a message.
    """
    if finish_reason:
        if finish_reason == "function_call":
            return ChatModelChatCompletion(
                role="function",
                function_call=function_call_name,
                function_args=function_call_args,
            )
        elif finish_reason == "stop":
            print()  # Add newline to model output
            sys.stdout.flush()
            return ChatModelChatCompletion(role="assistant", content=content)
        else:
            # Handle unexpected finish_reason
raise ValueError(f"Warning: Unexpected finish_reason '{finish_reason}'") Then I just piece it all together. def _stream_chat_completion(
self, response_generator: Iterator[ChatCompletionChunk]
) -> ChatModelChatCompletion:
"""
Streams the chat completion response and handles the content and function call information.
Args:
response_generator (Iterator[ChatCompletionChunk]): An iterator of ChatCompletionChunk objects.
Returns:
ChatModelChatCompletion (Dict[LiteralString, str]): The model's response as a message.
"""
function_call_name = None
function_call_args = ""
content = ""
for chunk in response_generator:
delta = chunk["choices"][0]["delta"]
content = self._extract_content(delta, content)
function_call_name, function_call_args = self._extract_function_call(
delta, function_call_name, function_call_args
)
finish_reason = chunk["choices"][0]["finish_reason"]
message = self._handle_finish_reason(
finish_reason, function_call_name, function_call_args, content
)
if message:
            return message
```

I needed the Llama.Cpp API to be compatible, which is why I used those types in particular, and doing it this way made it clear, concise, and straightforward. That way, when function calling became available, I'd be ready. Any feedback on the status of the API would be highly appreciated. I'll be looking into this more today.
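For reference, a minimal usage sketch under assumptions: `ChatModel` is a hypothetical wrapper class that owns the helper methods above, and the model path is a placeholder.

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/your-model.gguf", n_ctx=2048)
model = ChatModel(llm)  # hypothetical wrapper holding the methods shown above

chunk_iterator = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # yields chunks whose choices[0]["delta"] carry the tokens
)

# Accumulates deltas and returns either an "assistant" message (content)
# or a "function" message (function_call + function_args).
message = model._stream_chat_completion(chunk_iterator)
print(message)
```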
-
Not sure if this matters or is helpful, but I created a notebook for sketching out how the High-Level API might be expected to operate. I expect the details to differ though.
-
Any updates on this?
-
Just an update here: I've merged in #711, which allows users to add and extend the behaviour of the chat models. Now that that API is settled, I'm shifting my attention back to this.
-
Some combination of these approaches is necessary. The system prompt may require information on how to interpret, use, and call a function. The completions chain may require including information about the function(s), e.g. a list of function descriptions in JSON format. The functions argument consumes contextual space in the model's sequence, and it may require the use of custom prompts underneath for the model to follow.

Assuming we might need a LoRA adapter for this, combined with a BNF grammar for the JSON format, this might lead to a "hack-y" variation of it. I agree that the models probably require some form of pre-training and/or fine-tuning for it, which is why I'm thinking a LoRA adapter might be a good fit for this scenario. I'm planning on creating a dataset for just this kind of thing. The tricky bit is whether the model invokes a function call automatically or not based on the given criteria.

Regardless, I got a working prototype up and running after experimenting for a few hours with it. I'll post the notebook once I iron it out. No fine-tuning or LoRA adapter needed either; it simply uses the grammar in isolation.
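For the curious, a minimal sketch of the grammar-only idea with llama-cpp-python; the grammar text, prompt, and model path here are illustrative assumptions, not the notebook's contents:

```python
# Constrain the model's output to a small JSON object via a GBNF grammar, so a
# "function call" can be parsed reliably without any fine-tuning.
from llama_cpp import Llama, LlamaGrammar

# Tiny illustrative grammar: a flat JSON object with string keys and values.
GRAMMAR = r'''
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9_ .]* "\""
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="./models/your-model.gguf", n_ctx=2048)
grammar = LlamaGrammar.from_string(GRAMMAR)

output = llm(
    'Call the function current_weather for Berlin. Respond with JSON '
    'containing "name" and "arguments".',
    grammar=grammar,
    max_tokens=128,
)
print(output["choices"][0]["text"])
```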
-
Yup, I was able to get it working and make it completely reproducible on a consistent basis. You can look at the notebook here. I'm still going to work on a Dataset and LoRA adapter for this. I should have it done in the next week or so; I'm thinking a week, so it'll probably be 2 weeks.
-
Regarding #373 - I have been experimenting with stuff like this, with llama-cpp-python, gpt-3/4 and coding "plugins" before `function_call` came out.

function_call (openai only for now)
Only played a few days since it came out.
Use cases:
- `python` function call, which executes any python code, or
- `file_system` function call to allow create, append, delete files, make dirs, delete dirs and scan dirs (this allows to create apps with multiple files within a single chatbot session: "make me a population.csv with a list of countries and their respective population"). Either the `python` function is used to create this population.csv, something like `countries = ["China", "India", ...] ... writing_csv("population.csv",data)`, or `file_system` is used with a single `create_file("population.csv","Country,Population\nChina,....")` (see the sketch below).
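For concreteness, a hedged sketch of what the `functions` definitions for these two plugins could look like in the OpenAI function-calling format; descriptions and parameter names are assumptions:

```python
# OpenAI-style "functions" definitions for the two plugins described above;
# the descriptions and parameter layouts are illustrative.
functions = [
    {
        "name": "python",
        "description": "Execute arbitrary Python code and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to execute."},
            },
            "required": ["code"],
        },
    },
    {
        "name": "file_system",
        "description": "Create, append, delete files; make, delete and scan directories.",
        "parameters": {
            "type": "object",
            "properties": {
                "operation": {
                    "type": "string",
                    "enum": ["create_file", "append_file", "delete_file",
                             "make_dir", "delete_dir", "scan_dir"],
                },
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["operation", "path"],
        },
    },
]
```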
pre- & post-plugin
This is what I experimented with over the past few weeks with llama_cpp_python and gpt-3.5/4: I allowed the plugin to pre- or post-process the content.
Use cases:
- `link_reader`, which recognizes https://.... and replaces the link with a summary of the web-page content (I actually called the LLM here recursively, instructing it "summarize " + webpage_as_text), which allows "tell me about https://openai.com/blog" (sketched below).
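Roughly, such a `link_reader` pre-processor might look like this sketch; the regex, helper names, and the local server URL are assumptions:

```python
# Hedged sketch of a link_reader pre-processor: each URL in the user message
# is replaced by a model-generated summary of the page before the main call.
import re

import requests

URL_RE = re.compile(r"https?://\S+")
API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server

def summarize(text: str) -> str:
    # Recursive LLM call, as described above ("summarize " + webpage_as_text).
    resp = requests.post(
        API_URL,
        json={"messages": [{"role": "user", "content": "summarize " + text}]},
    )
    return resp.json()["choices"][0]["message"]["content"]

def link_reader(user_message: str) -> str:
    def _replace(match: re.Match) -> str:
        webpage_as_text = requests.get(match.group(0), timeout=10).text
        return summarize(webpage_as_text)
    return URL_RE.sub(_replace, user_message)

# link_reader("tell me about https://openai.com/blog") then hands the model a
# message containing the summary instead of the bare link.
```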
Implementation Details:
- via the `system` instruction, e.g. "Whenever you are asked to output a graph, use python and start the code with ```python, only output PNG, don't use .show() or anything else."

When function_call came out, I was able to reuse most of my plugin infrastructure, but now it's obvious plugins/functions can be used:
Implementing function_call for LLMs in general
Based on my experiences with gpt-3.5 and gpt-4 I can say the following:
- the `system` instructions affected the function arguments as well

Essentially, it's like instructing a person with different options to achieve the same thing, but some of the processes cannot be intermixed, in particular the construction of the arguments to pass to the function.
I'm aware that what I describe is quite vague, but I thought it might help to implement function_call for LLMs in general, or llama_cpp_python in particular.

OpenAI plugin detail
Although this is about OpenAI plugins, it may explain a bit how function_call is implemented:
- Users begin a conversation
Here is my guess: part of the extended token length of the recent gpt-3.5/4 update is about supporting multiple function_call, and it is also the reason why not all OpenAI plugins but only 3 at a time can be active: each plugin and each function_call adds instructions to the flow of the chat.
So, to further push this - here are my early thoughts on how to implement it:
- "You are given a functionality which does <insert description>, if you need it, state so by using ..."
- "Always and under any circumstances compose the data in JSON given this layout: <insert parameter description>"
- handle the intermediate state where the `.name` is mentioned, but the `arguments` is empty
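A hedged sketch of how the two prompt ideas above could be composed from function definitions; the wording and helper name are assumptions:

```python
# Turn OpenAI-style function definitions into a system instruction along the
# lines of the two prompts listed above.
import json
from typing import Any, Dict, List

def build_system_prompt(functions: List[Dict[str, Any]]) -> str:
    lines = []
    for fn in functions:
        lines.append(
            f'You are given a functionality "{fn["name"]}" which does '
            f'{fn.get("description", "")}; if you need it, state so by using its name.'
        )
        lines.append(
            "Always and under any circumstances compose the data in JSON "
            "given this layout: " + json.dumps(fn["parameters"])
        )
    return "\n".join(lines)

# Example with a single hypothetical function definition:
print(build_system_prompt([{
    "name": "current_weather",
    "description": "look up the current weather for a location",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}}},
}]))
```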
Instructions & Non-Determinism
In my experiments most models with llama have been very flaky with system instructions, not coming near gpt-3.5 quality, whereas gpt-3.5 and gpt-4 gave very good results. Although gpt-4 has become a bit lazy and stricter and a few times stated "I don't have access to the internet, please access this site yourself.", gpt-3.5 gave me python code which tried to access and process the content of the web-site, clearly being more "helpful".
So, although function_call and the pre-/post-plugins I described are powerful, the non-determinism makes it hard to operate reliably; with further maturing LLMs this might change.