function_call support for LLMs/llama_cpp_python #397

Spiritdude · 2023-06-18T19:51:33Z

Spiritdude
Jun 18, 2023

Regarding #373 - I have been experimenting with stuff like this, with llama-cpp-python, gpt-3/4 and coding "plugins" before function_call came out:

function_call (openai only for now)

Only played a few days since it came out:

user
assistant:requesting function_call
function_call
assistant:final answer

Use cases:

extremely powerful, e.g. you can do a python function call, which executes any python code, or file_system function call to allow create, append, delete files, make dirs, delete dirs and scan dirs (this allows to create apps with multiple files within a single chatbot session: "make me a population.csv with a list of countries and their respective population"
- sometimes python function is used, to create this population.csv, something like countries = ["China", "India", ...] ... writing_csv("population.csv",data)
- other times my file_system is used with a single create_file("population.csv","Country,Population\nChina,....")
- it's truly non-deterministic which function call is used with gpt-3.5 or gpt-4

pre- & post-plugin

This what I experimented the past few weeks with llama_cpp_python and gpt3.5/4:

user
plugin pre-processing prompt
assistant:answer
plugin post-processing answer

I allowed the plugin to pre- or post-processing the content.

Use cases:

pre-processing: link_reader which recognizes https://.... and replaces the link with a summary of the web-page content (I actually called LLM here recursively, instructing it "summarize "+webpage_as_text) which allows "tell me about https://openai.com/blog"
post-processing: my use case has been, execute any ```python code; and I instructed the LLM whenever I ask for a graph, that it should use python to draw the graph and export PNG only, after I executed the python automatically, I retrieve the PNG and display in my terminal (with graphics capability aka sixel) -> "give me a bar graph of population per countries".

Implementation Details:

each plugin gives its own instruction which becomes part of first system instruction, e.g. "Whenever you are asked to output a graph, use python and start the code with ```python, only output PNG, don't use .show() or anything else."
for pre- or post-processing I look for a trigger word (normal substr or regex) and then process the content of the message.

When function_call came out, I was able to reuse most of my plugin infrastructure, but now it's obvious plugins/functions can be used:

pre-processing
co-processing aka function call
post-processing

Implementing function_call for LLMs in general

Based on my experiences with gpt-3.5 and gpt-4 I can say following:

the processing of function calls is done with an LLM, it's non-deterministic
- the data feed back to me for the function isn't always JSON, but sometimes also a string even though I defined a JSON object, a few times system instructions affected the function argument as well

Essentially, it's like instructing a person with different options to achieve the same, but some of the processes cannot be intermixed, in particular the construction of the arguments to pass to the function.

I'm aware this is quite vague what I describe, but I thought it helps to implement function_call for LLMs in general or llama_cpp_python in particular.

OpenAI plugin detail

Although, this is about OpenAI plugin, but it may explain a bit how function_call is implemented:

Users begin a conversation

OpenAI will inject a compact description of your plugin in a message to ChatGPT, invisible to end users. This will include the plugin description, endpoints, and examples.
When a user asks a relevant question, the model may choose to invoke an API call from your plugin if it seems relevant; for POST requests, we require that developers build a user confirmation flow to avoid destruction actions.
The model will incorporate the API call results into its response to the user.
The model might include links returned from the API calls in its response. These will be displayed as rich previews (using the OpenGraph protocol, where we pull the site_name, title, description, image, and url fields).
The model can also format data from your API in markdown and the ChatGPT UI will render the markdown automatically.

Here my guess, part of the extended length of tokens of recently gpt-3.5/4 update is about support multiple function_call, and also the reason why not all OpenAI plugins but only 3 at the same time are active: each plugin and each function_call adds instructions to the flow of the chat.

So, to further push this - here my early thoughts how to implement it:

some kind of notion to call a function which is not used otherwise by user or assistant, and this notion is used to form an system instruction "You are given a functionality which does <insert description>, if you need it, state so by using ..."
the arguments composition is based on JSON data format, and it seems for now only object type, "Always and under any circumstances compose the data in JSON given this layout: <insert parameter description>".
chat.completion without streaming is required (as of 2023/06/18), streaming wasn't supported with function_call (the function name is mentioned, but the arguments is empty)

Instructions & Non-Determinism

In my experiments most models with llama have been very flaky with system instructions not coming near gpt-3.5 quality, whereas gpt-3.5 and gpt-4 gave very good results, although gpt-4 has become a bit lazy and stricter and a few times stated "I don't have access to the internet, please access this site yourself.", whereas gpt-3.5 gave me python code which tried to access and process the content of the web-site, clearly being more "helpful".

So, although function_call and pre-/post-plugins I described are powerful, the non-determinism makes is hard for reliable operations; with further maturing LLMs this might change.

References

https://platform.openai.com/docs/api-reference/chat/create#chat/create-functions
https://github.com/unconv/gpt-pydecorator making it easy to use function_call in python

Spiritdude · 2023-06-20T11:13:21Z

Spiritdude
Jun 20, 2023
Author

Here an experiment I just did, the LLM I used is tulu-30b.ggmlv3.q5_1.bin with llama_cpp_python as server.

system: "If you need current date or time respond with REQ_FUNCTION_CALL:realtime() instead. If you see realtime's response later, consider it in your response."

Step 1:

user: "what date is it?"
assistant: "REQ_FUNCTION_CALL:realtime()"

Note: I tried REQUEST_FUNCTION_CALL instead of REQ_FUNCTION_CALL and the LLM responded with realtime, it took REQUEST_FUNCTION_CALL as request function call, so we likely need wording which cannot be reduced.

Step 2:

I call the realtime() and get a text back, I insert it into the conversation:

but instead to add it, I inserted it above the question, just adding confused the conversation
the request to call realtime() I don't keep in the conversation, keeping it confused the conversation as well

this resulted in this conversation:

assistant: "RESPONSE FUNCTION_CALL of realtime: 2023-06-20T12:48:58.940325"
user: "what date is it?"
assistant: "It's June 20, 2023 at 12:48 PM."
user: "what time is it?"
assistant: "It's 12:48 PM."
user: "what date is today?"
assistant: "Today's date is June 20, 2023."

Entire flow:

   { 
      "role": "system",
      "content": "If you need current date or time respond with REQ_FUNCTION_CALL:realtime() instead. \
         If you see realtime's response later, consider it in your response.\n"
   },
   {
      "role": "assistant",
      "content": "RESPONSE FUNCTION_CALL of realtime: 2023-06-20T12:48:58.940325",
      "name": "realtime"
   },
   {
      "role": "user",
      "content": "what date is it?"
   },
   {
      "role": "assistant",
      "content": "It's June 20, 2023 at 12:48 PM."
   },
   {
      "role": "user",
      "content": "what time is it?"
   },
   {
      "role": "assistant",
      "content": "It's 12:48 PM."
   },
   {
      "role": "user",
      "content": "what date is today?"
   },
   { 
     "role": "assistant",
     "content": "Today's date is June 20, 2023."
   }

Findings

It seems the term realtime in this case was sufficient to tie the response above the question to answer the 1st question of "what date is it?", this may vary for other LLMs. This may not be ideal, if the function name is less informative or unique and is misunderstood as part of the conversation and not considered as instruction.

I first tried with 13B LLM, but that not even requested the function call properly as instruction in the system.

This approach may not be sufficient to implement function_call, or may be even the wrong approach, I just post this as reference.

0 replies

Spiritdude · 2023-06-20T14:25:01Z

Spiritdude
Jun 20, 2023
Author

And another test with current_weather function call using the same wording as above, but adding JSON schema request for location:

system: "If you need current weather information respond with REQ_FUNCTION_CALL:current_weather({\"location\": location}) instead. If you see current_weather's response later, consider it in your response."

Step 1:

user: "what is the weather in Zug, Switzerland?"
assistant: "REQ_FUNCTION_CALL:current_weather({\"location\": \"Zug, Switzerland\"})"

I call current_weather() and giving "It's {condition} at {temperature}C at {location}" like "It's sunny and 20C at Zug, Switzerland"

Step 2:

assistant: "RESPONSE FUNCTION_CALL of current_weather: It's sunny and 20C at Zug, Switzerland"
user: "what is the weather in Zug, Switzerland?"
assistant: "The weather at Zug, Switzerland is sunny with a temperature of 20 degrees Celsius."

0 replies

teleprint-me · 2023-09-15T18:13:40Z

teleprint-me
Sep 15, 2023

@abetlen @Spiritdude

I was wondering what the status is on this?

I've had to implement a work-around to integrate the API's into something coherent. This led to me creating my own data types, effectively extending the ChatCompletionMessage data type.

class ChatCompletionMessage(TypedDict):
    """
    Base chat completion message class.

    Attributes:
        role (Literal["assistant", "user", "system"]): The role of the message.
        content (str): The content of the message.
        user (NotRequired[str]): The user associated with the message (optional).
    """

    role: Literal["assistant", "user", "system"]
    content: str
    user: NotRequired[str]


class ChatModelChatCompletion(ChatCompletionMessage):
    """
    Extended chat completion message with additional role options.

    Inherits:
        ChatCompletionMessage: Base chat completion message class.

    Attributes:
        role (Literal["assistant", "user", "system", "function"]): The role of the message.
        content (str): The content of the message.
        function_call (NotRequired[str]): The function call associated with the message (optional).
        function_args (NotRequired[str]): The function arguments associated with the message (optional).
        user (NotRequired[str]): The user associated with the message (optional).
    """

    role: Literal["assistant", "user", "system", "function"]
    content: NotRequired[str]
    function_call: NotRequired[str]
    function_args: NotRequired[str]
    user: NotRequired[str]

I've played around with a few, but I just reviewed the updated source and actually felt that it was appropriately named.

class ChatCompletionMessage(TypedDict):
    role: Literal["assistant", "user", "system"]
    content: str
    user: NotRequired[str]


class ChatCompletionFunction(TypedDict):
    name: str
    description: NotRequired[str]
    parameters: Dict[str, Any]  # TODO: make this more specific


class ChatCompletionFunctionCall(TypedDict):
    name: str

Maybe ChatCompletionFunction is more appropriate?

class ChatCompletionFunction(ChatCompletionMessage):
    """
    Extended chat completion message with additional role options.

    Inherits:
        ChatCompletionMessage: Base chat completion message class.

    Attributes:
        role (Literal["assistant", "user", "system", "function"]): The role of the message.
        content (str): The content of the message.
        function_call (NotRequired[str]): The function call associated with the message (optional).
        function_args (NotRequired[str]): The function arguments associated with the message (optional).
        user (NotRequired[str]): The user associated with the message (optional).
    """

    role: Literal["assistant", "user", "system", "function"]
    content: NotRequired[str]
    function_call: NotRequired[str]
    function_args: NotRequired[str]
    user: NotRequired[str]

I ended up treating it like an extension because it was easier to do it this way in practice.

It also creates a direct 1:1 compatibility with the OpenAI and llama-cpp-python API's without messing with anything else.

I needed to keep it clean, so I just extract the content.

    def _extract_content(self, delta: dict, content: str) -> str:
        """
        Extracts content from the given delta and appends it to the existing content.

        Args:
            delta (dict): The delta object containing new content.
            content (str): The existing content.

        Returns:
            str: The updated content after appending the new token.
        """
        if delta and "content" in delta and delta["content"]:
            token = delta["content"]
            print(token, end="")
            sys.stdout.flush()
            content += token
        return content

Then I extract the function call.

    def _extract_function_call(
        self,
        delta: dict,
        function_call_name: str,
        function_call_args: str,
    ) -> Tuple[str, str]:
        """
        Extracts function call information from the given delta and updates the function call name and arguments.

        Args:
            delta (dict): The delta object containing function call information.
            function_call_name (str): The existing function call name.
            function_call_args (str): The existing function call arguments.

        Returns:
            Tuple[str, str]: A tuple containing the updated function call name and arguments.
        """
        if delta and "function_call" in delta and delta["function_call"]:
            function_call = delta["function_call"]
            if not function_call_name:
                function_call_name = function_call.get("name", "")
            function_call_args += str(function_call.get("arguments", ""))
        return function_call_name, function_call_args

With these 2 things, I just attempt to discover the finish reason.

    def _handle_finish_reason(
        self,
        finish_reason: str,
        function_call_name: str,
        function_call_args: str,
        content: str,
    ) -> ChatModelChatCompletion:
        """
        Handles the finish reason and returns an ChatModelChatCompletion.

        Args:
            finish_reason (str): The finish reason from the response.
            function_call_name (str): The function call name.
            function_call_args (str): The function call arguments.
            content (str): The generated content.

        Returns:
            ChatModelChatCompletion (Dict[LiteralString, str]): The model's response as a message.
        """
        if finish_reason:
            if finish_reason == "function_call":
                return ChatModelChatCompletion(
                    role="function",
                    function_call=function_call_name,
                    function_args=function_call_args,
                )
            elif finish_reason == "stop":
                print()  # Add newline to model output
                sys.stdout.flush()
                return ChatModelChatCompletion(role="assistant", content=content)
            else:
                # Handle unexpected finish_reason
                raise ValueError(f"Warning: Unexpected finish_reason '{finish_reason}'")

Then I just piece it all together.

    def _stream_chat_completion(
        self, response_generator: Iterator[ChatCompletionChunk]
    ) -> ChatModelChatCompletion:
        """
        Streams the chat completion response and handles the content and function call information.

        Args:
            response_generator (Iterator[ChatCompletionChunk]): An iterator of ChatCompletionChunk objects.

        Returns:
            ChatModelChatCompletion (Dict[LiteralString, str]): The model's response as a message.
        """
        function_call_name = None
        function_call_args = ""
        content = ""

        for chunk in response_generator:
            delta = chunk["choices"][0]["delta"]

            content = self._extract_content(delta, content)
            function_call_name, function_call_args = self._extract_function_call(
                delta, function_call_name, function_call_args
            )

            finish_reason = chunk["choices"][0]["finish_reason"]
            message = self._handle_finish_reason(
                finish_reason, function_call_name, function_call_args, content
            )

            if message:
                return message

I needed the Llama.Cpp API to be compatible which is why I used those types in particular and doing it this way made it clear, concise, and straightforward.

That way when function calling became available, I'd be ready.

Any feedback on the status of the API would be highly appreciated. I'll be looking into this more so today.

0 replies

teleprint-me · 2023-09-15T23:06:06Z

teleprint-me
Sep 15, 2023

Not sure if this matters or is helpful, but I created a notebook for sketching out how the High-Level API might be expected to operate. I expect the details to differ though.

llama_cpp_function_api

0 replies

N3RDIUM · 2023-09-22T13:32:36Z

N3RDIUM
Sep 22, 2023

Any updates on this?
I am working on a project using this library, and it outperforms the other alternatives. I need some way for the LLM to call a function when required.

0 replies

abetlen · 2023-09-30T00:14:09Z

abetlen
Sep 30, 2023
Maintainer

Just an update here, I've merged in #711 which allows users to add and extend the behaviour of the chat models. Now that that api is settled I'm shifting my attention back to this.

1 reply

abetlen Sep 30, 2023
Maintainer

@Spiritdude great points above, some thoughts:

The function_call argument can be used to specify a grammar in the case that it's set to a specific function name.
Wrt to nondeterminism this is correct, just imposing a grammar and some additional rules on schema isn't enough and the prompt itself must be in a format the model was finetuned on. So far functionary is the only model I've been able to find that's explicitly trained on function specifications in it's prompt but if anyone has any other suggestions I would be happy to add those to the llama_chat_format.py file.

teleprint-me · 2023-10-04T01:58:16Z

teleprint-me
Oct 4, 2023

Some combination of these approaches is necessary.

The system prompt may require information on how to interpret, use, and call a function.

The completions chain may require including information about the function(s). e.g. A list of function descriptions in JSON format. e.g. The functions argument consumes contextual space in the models sequence.

It may require the use of custom prompts underneath for the model to follow, e.g. [FUNC] instead of [INST]. It could be as simple as instructing the model in just the right way.

Assuming we might need a LoRA adapter for this combined with BNF for JSON format. This might lead to a "hack-y" variation for it.

I agree that the models probably require some form of pre-training and/or fine-tuning for it. This is why I'm thinking a LoRA adapter might be a good fit for this scenario. I'm planning on creating a dataset for just this kind of thing.

The tricky bit is whether the model invokes a function call automatically or not based on the given criteria.

Regardless, I got a working prototype up and running after experimenting for a few hours with it. I'll post the notebook once I iron it out. No fine-tuning or LoRA adapter needed either. It simply uses the grammar in isolation.

0 replies

teleprint-me · 2023-10-04T02:39:24Z

teleprint-me
Oct 4, 2023

Yup, I was able to get it working and make it completely reproducible on a consistent basis.

You can look at the notebook here.

I'm still going to work on a Dataset and LoRA adapter for this. I should have it done in the next week or so; I'm thinking a week, so it'll probably be 2 weeks.

2 replies

limcheekin Nov 26, 2023

May I know any update?

teleprint-me Nov 26, 2023

@limcheekin Update on what specifically?

Function calling was already added here and here and here.

I'll be looking into this tonight and tomorrow in more detail as I've been spread thin.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

function_call support for LLMs/llama_cpp_python #397

{{title}}

Replies: 8 comments 3 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

function_call support for LLMs/llama_cpp_python #397

function_call (openai only for now)

pre- & post-plugin

Implementing function_call for LLMs in general

OpenAI plugin detail

Users begin a conversation

Instructions & Non-Determinism

References

Replies: 8 comments · 3 replies

Spiritdude Jun 20, 2023 Author

Step 1:

Step 2:

Findings

Spiritdude Jun 20, 2023 Author

Step 1:

Step 2:

abetlen Sep 30, 2023 Maintainer

abetlen Sep 30, 2023 Maintainer

Replies: 8 comments 3 replies

Spiritdude
Jun 20, 2023
Author

Spiritdude
Jun 20, 2023
Author

abetlen
Sep 30, 2023
Maintainer

abetlen Sep 30, 2023
Maintainer