feat(functions): mixed JSON BNF grammars #2328

Merged May 15, 2024 (1 commit)

@mudler mudler commented May 15, 2024

This PR adds new options to control how function calls are extracted from the LLM output, and gives more control over how JSON grammars are used (the options can also be used together).

New YAML settings introduced:

  • grammar_message: when enabled, the generated grammar also accepts plain strings, not only JSON objects. The LLM can then choose between responding freely and responding with JSON constrained by BNF rules (which the parser generates on the fly).
  • grammar_prefix: a string to prepend to the JSON grammar definition.
  • replace_results: a map of strings to replace in the LLM result.

As an example, consider the following settings for Hermes-2-Pro-Mistral, which allow extracting both JSON results coming from the model, and the ones coming from the grammar:

function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar uncomment the lines below
  # Warning: this is relying only on the capability of the
  # LLM model to generate the correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
# Note: make sure to not have </tool_call> as a stopword if disabling grammars

Note: To disable grammars entirely in the example above, uncomment no_grammar and json_regex_match, and make sure </tool_call> is not set as a stopword.
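
As a rough illustration of what `replace_results` does to a model response, here is a minimal Python sketch. It assumes the replacements are literal string substitutions applied in map order; the function name, the sample response, and the `get_weather` tool are hypothetical and not taken from the LocalAI code base.

```python
import json

# Stand-in for the `replace_results` map in the YAML example above.
REPLACE_RESULTS = {
    "<tool_call>": "",  # drop the opening tag injected via grammar_prefix
    "'": '"',           # turn Python-style quotes into JSON quotes
}


def apply_replacements(llm_output: str, replacements: dict[str, str]) -> str:
    """Apply each literal key -> value substitution, in map order (assumption)."""
    for old, new in replacements.items():
        llm_output = llm_output.replace(old, new)
    return llm_output


# Hypothetical raw response; "</tool_call>" is absent because it is a stopword.
raw = "<tool_call>\n{'name': 'get_weather', 'arguments': {'city': 'Rome'}}"
cleaned = apply_replacements(raw, REPLACE_RESULTS)
call = json.loads(cleaned)
print(call["name"], call["arguments"])
```

Without the quote replacement, `json.loads` would reject the single-quoted keys, which is presumably why the example config maps `'` to `"`.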

Implementation details

To achieve this, I've tweaked the existing code that generates BNF grammars from JSON Schema objects on the fly. When grammar_message is set, it now injects a new rule alongside the other options, of the form (pseudocode, simplified):

result = string | JSON

The JSON responses are of the form { "name": "function_name", "arguments": <JSON parameters map> }.

The parser is now more flexible and tolerates non-JSON responses, which are forwarded directly as the LLM result. If the JSON parses correctly, the tool response is appended to the request.
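
The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea (try JSON first, otherwise treat the text as a plain message); the function name and the tuple return shape are assumptions, not the actual Go implementation.

```python
import json


def parse_llm_result(text: str):
    """Return ("tool_call", name, arguments) or ("message", text)."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON at all: forward the text as a normal LLM message.
        return ("message", text)
    # Only accept the {"name": ..., "arguments": {...}} shape as a tool call.
    if (
        isinstance(obj, dict)
        and isinstance(obj.get("name"), str)
        and isinstance(obj.get("arguments"), dict)
    ):
        return ("tool_call", obj["name"], obj["arguments"])
    return ("message", text)


print(parse_llm_result('{"name": "get_weather", "arguments": {"city": "Rome"}}'))
print(parse_llm_result("Hello! How can I help?"))
```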

Notes

This isn't a breaking change: configs need to enable these options explicitly for now. Once this approach proves more stable, I plan to start moving the defaults over to this setup.

A full Hermes example that I've tested locally:

context_size: 4096
f16: true
mmap: true
name: hermes-2-pro-llama-3-8b
parameters:
  model: Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
stopwords:
- <|im_end|>
# Note: comment out the line below if disabling grammars
- </tool_call>

function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar uncomment the lines below
  # Warning: this is relying only on the capability of the
  # LLM model to generate the correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results: 
    "<tool_call>": ""
    "\'": "\""


template:
  chat: |
    {{.Input -}}
    <|im_start|>assistant
  chat_message: |
    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
    {{- if .FunctionCall }}
    <tool_call>
    {{- else if eq .RoleName "tool" }}
    <tool_response>
    {{- end }}
    {{- if .Content}}
    {{.Content }}
    {{- end }}
    {{- if .FunctionCall}}
    {{toJson .FunctionCall}}
    {{- end }}
    {{- if .FunctionCall }}
    </tool_call>
    {{- else if eq .RoleName "tool" }}
    </tool_response>
    {{- end }}<|im_end|>
  completion: |
    {{.Input}}
  function: |
    <|im_start|>system
    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    <tools>
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    </tools>
    Use the following pydantic model json schema for each tool call you will make:
    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
    For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
    <tool_call>
    {'arguments': <args-dict>, 'name': <function-name>}
    </tool_call><|im_end|>
    {{.Input -}}
    <|im_start|>assistant
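
To make the `chat_message` template above easier to follow, here is an approximate Python rendering of the same logic. It is a sketch under assumptions: the whitespace follows my reading of the `{{-` trim markers, `json.dumps` stands in for `toJson`, the trailing newline and the joining of messages are left out, and the function name is hypothetical.

```python
import json

KNOWN_ROLES = {"assistant", "system", "tool", "user"}


def render_chat_message(role, content="", function_call=None):
    """Approximate the ChatML output of the chat_message template."""
    out = "<|im_start|>" + (role if role in KNOWN_ROLES else "")
    # Opening tag: a function call wins over a tool response.
    if function_call:
        out += "\n<tool_call>"
    elif role == "tool":
        out += "\n<tool_response>"
    if content:
        out += "\n" + content
    if function_call:
        out += "\n" + json.dumps(function_call)
    # Closing tag mirrors the opening one.
    if function_call:
        out += "\n</tool_call>"
    elif role == "tool":
        out += "\n</tool_response>"
    return out + "<|im_end|>"


print(render_chat_message("user", "What is the weather in Rome?"))
print(render_chat_message("assistant", function_call={"name": "get_weather", "arguments": {"city": "Rome"}}))
print(render_chat_message("tool", "sunny"))
```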

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

netlify bot commented May 15, 2024

Deploy Preview for localai canceled.

Name Link
🔨 Latest commit 534d9d9
🔍 Latest deploy log https://app.netlify.com/sites/localai/deploys/6644e94aaef8040008ff51df

@mudler mudler added the enhancement New feature or request label May 15, 2024
@mudler mudler changed the title feat(functions): support mixed JSON BNF grammar feat(functions): mixed JSON BNF grammars May 15, 2024
@mudler mudler merged commit beb598e into master May 15, 2024
35 checks passed
@mudler mudler deleted the grammars_return_strings branch May 15, 2024 18:03

mudler commented May 15, 2024

Merging quickly so that anyone who wants to can test this on master images ASAP =)

truecharts-admin referenced this pull request in truecharts/public May 25, 2024
…6.0 by renovate (#22420)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://github.com/mudler/LocalAI) | minor | `v2.15.0-cublas-cuda11-ffmpeg-core` -> `v2.16.0-cublas-cuda11-ffmpeg-core` |
| [docker.io/localai/localai](https://github.com/mudler/LocalAI) | minor | `v2.15.0-cublas-cuda11-core` -> `v2.16.0-cublas-cuda11-core` |
| [docker.io/localai/localai](https://github.com/mudler/LocalAI) | minor | `v2.15.0-cublas-cuda12-ffmpeg-core` -> `v2.16.0-cublas-cuda12-ffmpeg-core` |
| [docker.io/localai/localai](https://github.com/mudler/LocalAI) | minor | `v2.15.0-cublas-cuda12-core` -> `v2.16.0-cublas-cuda12-core` |
| [docker.io/localai/localai](https://github.com/mudler/LocalAI) | minor | `v2.15.0-ffmpeg-core` -> `v2.16.0-ffmpeg-core` |
| [docker.io/localai/localai](https://github.com/mudler/LocalAI) | minor | `v2.15.0` -> `v2.16.0` |
---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

### [`v2.16.0`](https://github.com/mudler/LocalAI/releases/tag/v2.16.0)

[Compare Source](https://github.com/mudler/LocalAI/compare/v2.15.0...v2.16.0)

![local-ai-release-2
16](https://github.com/mudler/LocalAI/assets/2420543/bd3a6ace-8aec-4ac7-b457-b3e8cb5bb29e)

##### Welcome to LocalAI's latest update!

##### 🎉🎉🎉 woot woot! So excited to share this release, a lot of new
features are landing in LocalAI!!!!! 🎉🎉🎉


![](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExZ2cycjRqbXFld2toenpqcjcyN3E3eWw1NHI5cm12Njc3Y2lzZWtyZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/AR92HqL0HcenC/giphy.gif)

##### 🌟  Introducing Distributed Llama.cpp Inferencing

It is now possible to distribute the inferencing workload of llama.cpp
models across different workers!

This feature has landed with
[https://github.com/mudler/LocalAI/pull/2324](https://github.com/mudler/LocalAI/pull/2324)
and is based on the upstream work of
[@&#8203;rgerganov](https://github.com/rgerganov) in
[https://github.com/ggerganov/llama.cpp/pull/6829](https://github.com/ggerganov/llama.cpp/pull/6829).

**How it works:** a front-end server (LocalAI) handles the requests
compatible with the OpenAI API, and workers (llama.cpp) are used to
distribute the workload. This makes it possible to run larger models
split across different nodes!

##### How to use it

To start workers to offload the computation you can run:

    local-ai llamacpp-worker <listening_address> <listening_port>

Alternatively, you can follow the llama.cpp README and build the
rpc-server
(https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md),
which is still compatible with LocalAI.

When starting the LocalAI server, which is going to accept the API
requests, you can set the list of worker addresses with
`LLAMACPP_GRPC_SERVERS`:

```bash
LLAMACPP_GRPC_SERVERS="address1:port,address2:port" local-ai run
```

At this point, the workload hitting the LocalAI server should be
distributed across the nodes!
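
If you generate the `LLAMACPP_GRPC_SERVERS` value from a script, a small pre-flight check can catch typos before the server starts. The helper below is a hypothetical sketch that only assumes the comma-separated `address:port` format shown above; it is not how LocalAI itself parses the variable.

```python
def parse_workers(value: str) -> list[tuple[str, int]]:
    """Split a comma-separated "address:port" list into (host, port) pairs."""
    workers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and stray spaces
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected address:port, got {entry!r}")
        workers.append((host, int(port)))
    return workers


# Example addresses are placeholders.
print(parse_workers("192.168.1.10:50052,192.168.1.11:50052"))
```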

##### 🤖 Peer2Peer llama.cpp

LocalAI is the **first** free, open source AI project offering complete,
decentralized, peer2peer yet private LLM inferencing on top of the
libp2p protocol. There is no "public swarm" to offload the computation
to; instead, it empowers you to build your own cluster of local and
remote machines to distribute LLM computation.


![](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExZTdrZW9rc3hrMWxoZTV1OGo0ajF3d2MwMHFmeXVoMThqOGg1eHR4ZCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/q0KrtRcr10Bhu/giphy.gif)

This feature leverages the ability of llama.cpp to distribute the
workload explained just above and features from one of my other
projects, https://github.com/mudler/edgevpn.

LocalAI builds on top of the two, and allows you to create a private
peer2peer network between nodes without centralizing connections or
manually configuring IP addresses: it unlocks totally decentralized,
private, peer-to-peer inferencing capabilities. It also works across
different NAT-ted networks (it uses DHT and mDNS as discovery
mechanisms).

**How it works:** A pre-shared token can be generated and shared between
workers and the server to form a private, decentralized, p2p network.

You can see the feature in action here:


![output](https://github.com/mudler/LocalAI/assets/2420543/8ca277cf-c208-4562-8929-808b2324b584)

##### How to use it

1.  Start the server with `--p2p`:

```bash
./local-ai run --p2p
# 1:02AM INF loading environment variables from file envFile=.env
# 1:02AM INF Setting logging to info
# 1:02AM INF P2P mode enabled
# 1:02AM INF No token provided, generating one
# 1:02AM INF Generated Token:
# XXXXXXXXXXX
# 1:02AM INF Press a button to proceed
```

A token is displayed; copy it and press Enter.

You can re-use the same token later by restarting the server with
`--p2ptoken` (or `P2P_TOKEN`).

2. Start the workers. You can now copy the local-ai binary to other
hosts and run as many workers as you like with that token:

```bash
TOKEN=XXX ./local-ai p2p-llama-cpp-rpc
# 1:06AM INF loading environment variables from file envFile=.env
# 1:06AM INF Setting logging to info
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:288","message":"connmanager disabled\n"}
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:295","message":" go-libp2p resource manager protection enabled"}
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:409","message":"max connections: 100\n"}
# 1:06AM INF Starting llama-cpp-rpc-server on '127.0.0.1:34371'
# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
# create_backend: using CPU backend
# Starting RPC server on 127.0.0.1:34371, backend memory: 31913 MB
# 2024/05/19 01:06:01 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). # See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
# {"level":"INFO","time":"2024-05-19T01:06:01.805+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWJ7WQAbCWKfJgjw2oMMGGss9diw3Sov5hVWi8t4DMgx92"}
# {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/44931 /ip4/127.0.0.1/udp/33251/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip4/127.0.0.1/udp/35660/quic-v1 /ip4/192.168.68.110/tcp/44931 /ip4/192.168.68.110/udp/33251/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip4/192.168.68.110/udp/35660/quic-v1 /ip6/::1/tcp/41289 /ip6/::1/udp/33160/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip6/::1/udp/35701/quic-v1]"}
# {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
```

(Note that you can also supply the token via args.)

At this point, you should see messages in the server logs stating that
new workers have been found.

3. Now you can start doing inference as usual on the server (the node
used in step 1).

Interested in trying it out? As we are still updating the documentation,
you can read the full instructions in
[https://github.com/mudler/LocalAI/pull/2343](https://github.com/mudler/LocalAI/pull/2343).

##### 📜 Advanced Function calling support with Mixed JSON Grammars

LocalAI gets better at function calling with mixed grammars!

With this release, LocalAI introduces a transformative capability:
support for mixed JSON BNF grammars. You can now specify a grammar that
lets the LLM output both structured JSON and free text.

**How to use it:**

To enable mixed grammars, set `function.grammar.mixed_mode: true` in
the `YAML` configuration file, for example:

```yaml
  function:
    # disable injecting the "answer" tool
    disable_no_action: true

    grammar:
      # This allows the grammar to also return messages
      mixed_mode: true
```

This feature significantly enhances LocalAI's ability to interpret and
manipulate JSON data coming from the LLM through a more flexible and
powerful grammar system. Users can now combine multiple grammar types
within a single JSON structure, allowing for dynamic parsing and
validation scenarios.

Grammars can also be turned off entirely, leaving it to the user to
define how the LLM output is parsed, so that LocalAI can interpret it
correctly and remain compliant with the OpenAI REST spec.

For example, to interpret Hermes results, one can just annotate regexes
in `function.json_regex_match` to extract the LLM response:

```yaml
  function:
    grammar:
      disable: true
    # disable injecting the "answer" tool
    disable_no_action: true
    return_name_in_function_response: true

    json_regex_match:
    - "(?s)<tool_call>(.*?)</tool_call>"
    - "(?s)<tool_call>(.*?)"

    replace_llm_results:
    # Drop the scratchpad content from responses
    - key: "(?s)<scratchpad>.*</scratchpad>"
      value: ""
    replace_function_results:
    # Replace everything that is not JSON array or object, just in case.
    - key: '(?s)^[^{\[]*'
      value: ""
    - key: '(?s)[^}\]]*$'
      value: ""
    # Drop the scratchpad content from responses
    - key: "(?s)<scratchpad>.*</scratchpad>"
      value: ""
```

Note that regexes can still be used when mixed grammars are enabled.
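
To illustrate how a `json_regex_match` pattern pulls the function call out of a response, here is a small Python sketch. It assumes patterns are tried in order and that the first one producing a non-empty capture wins; that ordering, the function name, and the sample response are assumptions for illustration, not LocalAI's actual code. Only the first pattern from the config above is used.

```python
import json
import re

# First pattern from the `json_regex_match` list above.
JSON_REGEX_MATCH = [r"(?s)<tool_call>(.*?)</tool_call>"]


def extract_candidates(llm_output: str, patterns: list[str]) -> list[str]:
    """Return the captured groups of the first pattern that matches."""
    for pattern in patterns:
        matches = [m.strip() for m in re.findall(pattern, llm_output) if m.strip()]
        if matches:
            return matches
    # No pattern matched: fall back to the whole response (assumption).
    return [llm_output]


# Hypothetical model response mixing free text and a tool call.
response = (
    "Let me check that for you.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Rome"}}\n</tool_call>'
)
calls = [json.loads(c) for c in extract_candidates(response, JSON_REGEX_MATCH)]
print(calls)
```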

This is especially important for models which do not support grammars,
such as transformers or OpenVINO models, which can now support function
calling as well. As we update the docs, further documentation can be
found in the PRs listed in the changelog below.

##### 🚀 New Model Additions and Updates


![local-ai-yi-updates](https://github.com/mudler/LocalAI/assets/2420543/5d646703-0c64-4299-b551-a39074f63c2d)

Our model gallery continues to grow with exciting new additions like
Aya-35b, Mistral-0.3, Hermes-Theta and updates to existing models
ensuring they remain at the cutting edge.

This release brings major enhancements to tool calling support.
Besides making the default models in our AIO images more performant,
you can now try an enhanced out-of-the-box experience with function
calling in the Hermes model family
([Hermes-2-Pro-Mistral](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF)
and
[Hermes-2-Theta-Llama-3](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF)).

##### Our LocalAI function model!


![local-ai-functioncall-model](https://github.com/mudler/LocalAI/assets/2420543/b2955459-49b6-4a57-96e8-242966ccef12)

I have fine-tuned a function call model specifically to take full
advantage of LocalAI's grammar support. You can already find it in the
model gallery and on
[huggingface](https://huggingface.co/mudler/LocalAI-Llama3-8b-Function-Call-v0.2).

##### 🔄 Single Binary Release: Simplified Deployment and Management

In our continuous effort to streamline the user experience and
deployment process, LocalAI v2.16.0 proudly introduces a single binary
release. This enhancement, thanks to
[@&#8203;sozercan](https://github.com/sozercan)'s contributions,
consolidates all variants (CUDA and non-cuda releases) and dependencies
into one compact executable file.

This change simplifies the installation and update processes, reduces
compatibility issues, and speeds up the setup for new users and existing
deployments, as binary releases are now more portable than ever!

##### 🔧 Bug Fixes and Improvements

A host of bug fixes have been implemented to ensure smoother operation
and integration. Key fixes include enhancements to the Intel build
process, stability adjustments for setuptools in Python backends, and
critical updates ensuring the successful build of p2p configurations.

##### Migrating Python Backends: From Conda to UV

LocalAI has migrated its Python backends from Conda to UV. This
transition, thanks to [@&#8203;cryptk](https://github.com/cryptk)'s
contributions, enhances the efficiency and scalability of our backend
operations. Users will experience faster setup times and reduced
complexity, streamlining the development process and making it easier to
manage dependencies across different environments.

##### 📣 Let's Make Some Noise!

A gigantic THANK YOU to everyone who’s contributed—your feedback, bug
squashing, and feature suggestions are what make LocalAI shine. To all
our heroes out there supporting other users and sharing their expertise,
you’re the real MVPs!

Remember, LocalAI thrives on community support—not big corporate bucks.
If you love what we're building, show some love! A shoutout on social
(@&#8203;LocalAI_OSS and @&#8203;mudler_it on twitter/X), joining our
sponsors, or simply starring us on GitHub makes all the difference.

Also, if you haven't yet joined our Discord, come on over! Here's the
link: https://discord.gg/uJAeKSAGDy

Thanks a ton, and.. enjoy this release!

##### What's Changed

##### Bug fixes 🐛

- build: do not specify a BUILD_ID by default by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2284](https://github.com/mudler/LocalAI/pull/2284)
- fix: add missing openvino/optimum/etc libraries for Intel, fixes
[#&#8203;2289](https://github.com/mudler/LocalAI/issues/2289) by
[@&#8203;cryptk](https://github.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/2292](https://github.com/mudler/LocalAI/pull/2292)
- add setuptools for openvino by
[@&#8203;fakezeta](https://github.com/fakezeta) in
[https://github.com/mudler/LocalAI/pull/2301](https://github.com/mudler/LocalAI/pull/2301)
- fix: add setuptools to all requirements-intel.txt files for python
backends by [@&#8203;cryptk](https://github.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/2333](https://github.com/mudler/LocalAI/pull/2333)
- ci: correctly build p2p in GO_TAGS by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2369](https://github.com/mudler/LocalAI/pull/2369)
- ci: generate specific image for intel builds by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2374](https://github.com/mudler/LocalAI/pull/2374)
- fix: stablediffusion binary by
[@&#8203;sozercan](https://github.com/sozercan) in
[https://github.com/mudler/LocalAI/pull/2385](https://github.com/mudler/LocalAI/pull/2385)

##### Exciting New Features 🎉

- feat: migrate python backends from conda to uv by
[@&#8203;cryptk](https://github.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/2215](https://github.com/mudler/LocalAI/pull/2215)
- feat: create bash library to handle install/run/test of python
backends by [@&#8203;cryptk](https://github.com/cryptk) in
[https://github.com/mudler/LocalAI/pull/2286](https://github.com/mudler/LocalAI/pull/2286)
- feat(grammar): support models with specific construct by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2291](https://github.com/mudler/LocalAI/pull/2291)
- feat(ui): display number of available models for installation by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2298](https://github.com/mudler/LocalAI/pull/2298)
- feat: auto select llama-cpp cpu variant by
[@&#8203;sozercan](https://github.com/sozercan) in
[https://github.com/mudler/LocalAI/pull/2305](https://github.com/mudler/LocalAI/pull/2305)
- feat(llama.cpp): add `flash_attention` and `no_kv_offloading` by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2310](https://github.com/mudler/LocalAI/pull/2310)
- feat(functions): support models with no grammar and no regex by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2315](https://github.com/mudler/LocalAI/pull/2315)
- feat(functions): allow to set JSON matcher by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2319](https://github.com/mudler/LocalAI/pull/2319)
- feat: auto select llama-cpp cuda runtime by
[@&#8203;sozercan](https://github.com/sozercan) in
[https://github.com/mudler/LocalAI/pull/2306](https://github.com/mudler/LocalAI/pull/2306)
- feat(llama.cpp): add distributed llama.cpp inferencing by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2324](https://github.com/mudler/LocalAI/pull/2324)
- feat(functions): mixed JSON BNF grammars by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2328](https://github.com/mudler/LocalAI/pull/2328)
- feat(functions): simplify parsing, read functions as list by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2340](https://github.com/mudler/LocalAI/pull/2340)
- feat(functions): Enable true regex replacement for the
regexReplacement option by
[@&#8203;lenaxia](https://github.com/lenaxia) in
[https://github.com/mudler/LocalAI/pull/2341](https://github.com/mudler/LocalAI/pull/2341)
- feat(backends): add openvoice backend by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2334](https://github.com/mudler/LocalAI/pull/2334)
- feat(webui): statically embed js/css assets by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2348](https://github.com/mudler/LocalAI/pull/2348)
- feat(functions): allow to use JSONRegexMatch unconditionally by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2349](https://github.com/mudler/LocalAI/pull/2349)
- feat(functions): don't use yaml.MapSlice by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2354](https://github.com/mudler/LocalAI/pull/2354)
- build: add sha by [@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2356](https://github.com/mudler/LocalAI/pull/2356)
- feat(llama.cpp): Totally decentralized, private, distributed, p2p
inference by [@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2343](https://github.com/mudler/LocalAI/pull/2343)
- feat(functions): relax mixedgrammars by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2365](https://github.com/mudler/LocalAI/pull/2365)
- models(gallery): add mistral-0.3 and command-r, update functions by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2388](https://github.com/mudler/LocalAI/pull/2388)

##### 🧠 Models

- models(gallery): add aloe by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2283](https://github.com/mudler/LocalAI/pull/2283)
- models(gallery): add Llama-3-8B-Instruct-abliterated by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2288](https://github.com/mudler/LocalAI/pull/2288)
- models(gallery): add l3-chaoticsoliloquy-v1.5-4x8b by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2295](https://github.com/mudler/LocalAI/pull/2295)
- models(gallery): add jsl-medllama-3-8b-v2.0 by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2296](https://github.com/mudler/LocalAI/pull/2296)
- models(gallery): add llama-3-refueled by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2297](https://github.com/mudler/LocalAI/pull/2297)
- models(gallery): add aura-llama-Abliterated by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2309](https://github.com/mudler/LocalAI/pull/2309)
- models(gallery): add Bunny-llama by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2311](https://github.com/mudler/LocalAI/pull/2311)
- models(gallery): add lumimaidv2 by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2312](https://github.com/mudler/LocalAI/pull/2312)
- models(gallery): add orthocopter by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2313](https://github.com/mudler/LocalAI/pull/2313)
- fix(gallery) Correct llama3-8b-instruct model file by
[@&#8203;tannisroot](https://github.com/tannisroot) in
[https://github.com/mudler/LocalAI/pull/2330](https://github.com/mudler/LocalAI/pull/2330)
- models(gallery): add hermes-2-theta-llama-3-8b by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2331](https://github.com/mudler/LocalAI/pull/2331)
- models(gallery): add yi 6/9b, sqlcoder, sfr-iterative-dpo by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2335](https://github.com/mudler/LocalAI/pull/2335)
- models(gallery): add anita by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2344](https://github.com/mudler/LocalAI/pull/2344)
- models(gallery): add master-yi by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2345](https://github.com/mudler/LocalAI/pull/2345)
- models(gallery): update poppy porpoise mmproj by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2346](https://github.com/mudler/LocalAI/pull/2346)
- models(gallery): add LocalAI-Llama3-8b-Function-Call-v0.2-GGUF by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2355](https://github.com/mudler/LocalAI/pull/2355)
- models(gallery): add stheno by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2358](https://github.com/mudler/LocalAI/pull/2358)
- fix(gallery): checksum Meta-Llama-3-70B-Instruct.Q4\_K_M.gguf -
[#&#8203;2364](https://github.com/mudler/LocalAI/issues/2364) by
[@&#8203;Nold360](https://github.com/Nold360) in
[https://github.com/mudler/LocalAI/pull/2366](https://github.com/mudler/LocalAI/pull/2366)
- models(gallery): add phi-3-medium-4k-instruct by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2367](https://github.com/mudler/LocalAI/pull/2367)
- models(gallery): add hercules and helpingAI by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2376](https://github.com/mudler/LocalAI/pull/2376)
- ci(checksum_checker): do get sha from hf API when available by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2380](https://github.com/mudler/LocalAI/pull/2380)
- models(gallery): ⬆️ update checksum by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2383](https://github.com/mudler/LocalAI/pull/2383)
- models(gallery): ⬆️ update checksum by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2386](https://github.com/mudler/LocalAI/pull/2386)
- models(gallery): add aya-35b by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2391](https://github.com/mudler/LocalAI/pull/2391)

##### 📖 Documentation and examples

- docs: Update semantic-todo/README.md by
[@&#8203;eltociear](https://github.com/eltociear) in
[https://github.com/mudler/LocalAI/pull/2294](https://github.com/mudler/LocalAI/pull/2294)
- Add Home Assistant Integration by
[@&#8203;valentinfrlch](https://github.com/valentinfrlch) in
[https://github.com/mudler/LocalAI/pull/2387](https://github.com/mudler/LocalAI/pull/2387)
- Add warning for running the binary on MacOS by
[@&#8203;mauromorales](https://github.com/mauromorales) in
[https://github.com/mudler/LocalAI/pull/2389](https://github.com/mudler/LocalAI/pull/2389)

##### 👒 Dependencies

- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2281](https://github.com/mudler/LocalAI/pull/2281)
- ⬆️ Update docs version mudler/LocalAI by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2280](https://github.com/mudler/LocalAI/pull/2280)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2285](https://github.com/mudler/LocalAI/pull/2285)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2290](https://github.com/mudler/LocalAI/pull/2290)
- feat(swagger): update swagger by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2302](https://github.com/mudler/LocalAI/pull/2302)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2303](https://github.com/mudler/LocalAI/pull/2303)
- ⬆️ Update ggerganov/whisper.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2317](https://github.com/mudler/LocalAI/pull/2317)
- ⬆️ Update ggerganov/whisper.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2326](https://github.com/mudler/LocalAI/pull/2326)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2316](https://github.com/mudler/LocalAI/pull/2316)
- ⬆️ Update ggerganov/whisper.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2329](https://github.com/mudler/LocalAI/pull/2329)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2337](https://github.com/mudler/LocalAI/pull/2337)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2339](https://github.com/mudler/LocalAI/pull/2339)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2342](https://github.com/mudler/LocalAI/pull/2342)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2351](https://github.com/mudler/LocalAI/pull/2351)
- ⬆️ Update ggerganov/whisper.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2352](https://github.com/mudler/LocalAI/pull/2352)
- dependencies(grpcio): bump to fix CI issues by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2362](https://github.com/mudler/LocalAI/pull/2362)
- deps(llama.cpp): update and adapt API changes by
[@&#8203;mudler](https://github.com/mudler) in
[https://github.com/mudler/LocalAI/pull/2381](https://github.com/mudler/LocalAI/pull/2381)
- ⬆️ Update ggerganov/whisper.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2361](https://github.com/mudler/LocalAI/pull/2361)
- ⬆️ Update go-skynet/go-bert.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/1225](https://github.com/mudler/LocalAI/pull/1225)
- ⬆️ Update ggerganov/llama.cpp by
[@&#8203;localai-bot](https://github.com/localai-bot) in
[https://github.com/mudler/LocalAI/pull/2360](https://github.com/mudler/LocalAI/pull/2360)

##### Other Changes

- refactor: Minor improvements to BackendConfigLoader by
[@&#8203;dave-gray101](https://github.com/dave-gray101) in
[https://github.com/mudler/LocalAI/pull/2353](https://github.com/mudler/LocalAI/pull/2353)

##### New Contributors

- [@&#8203;tannisroot](https://github.com/tannisroot) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/2330](https://github.com/mudler/LocalAI/pull/2330)
- [@&#8203;lenaxia](https://github.com/lenaxia) made their first
contribution in
[https://github.com/mudler/LocalAI/pull/2341](https://github.com/mudler/LocalAI/pull/2341)
- [@&#8203;valentinfrlch](https://github.com/valentinfrlch) made their
first contribution in
[https://github.com/mudler/LocalAI/pull/2387](https://github.com/mudler/LocalAI/pull/2387)

**Full Changelog**:
mudler/LocalAI@v2.15.0...v2.16.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these
updates again.

---


This PR has been generated by [Renovate
Bot](https://github.com/renovatebot/renovate).

@fakezeta
Collaborator

Hi @mudler,

Sorry for the late reply; it has been quite a busy period.
I tested Hermes-2-Theta-Llama-3-8B on the transformers backend with grammar enabled, and I can confirm that function calling works.

Great!!

@interstellarninja

interstellarninja commented Jun 26, 2024

@mudler

We were trying to implement the Hermes format with llama-cpp-python using its chatml-function-calling template. When we chimed in a while back, it was still a WIP.

Today I tried Hermes-Theta-8B with the same template and it seems to just work. There are still slight differences in the tool-call format, which I think could be fixed.

I'm sharing their template in case it is useful for the LocalAI config.

https://github.com/abetlen/llama-cpp-python/blob/04959f1884c8ef93bd5a4aa40ff0accb8438c0c1/llama_cpp/llama_chat_format.py#L3165

Here's the function calling template they are using:

function_calling_template = (
        "{% for message in messages %}"
        "<|im_start|>{{ message.role }}\n"
        # System message
        "{% if message.role == 'system' %}"
        "{{ message.content }}"
        "{% if tool_calls %}"
        "\n\nYou have access to the following functions:\n"
        "{% for tool in tools %}"
        "\nfunctions.{{ tool.function.name }}:\n"
        "{{ tool.function.parameters | tojson }}"
        "\n{% endfor %}"
        "\n\nYou can respond to users messages with either a single message or one or more function calls."
        "\n\nTo respond with a message begin the message with 'message:', use the following format:"
        "\n\nmessage:"
        "\n<message>"
        "\n\nTo respond with one or more function calls begin the message with 'functions.<function_name>:', use the following format:"
        "\n\nfunctions.<function_name>:"
        '\n{ "arg1": "value1", "arg2": "value2" }'
        "\nfunctions.<function_name>:"
        '\n{ "arg1": "value1", "arg2": "value2" }'
        "{% endif %}"
        "<|im_end|>\n"
        "{% endif %}"
        # User message
        "{% if message.role == 'user' %}"
        "{{ message.content }}"
        "<|im_end|>\n"
        "{% endif %}"
        # Assistant message
        "{% if message.role == 'assistant' %}"
        ## Regular message
        "{% if message.content and message.content | length > 0 %}"
        "{% if tool_calls %}"
        "message:\n"
        "{% endif %}"
        "{{ message.content }}"
        "<|im_end|>\n"
        "{% endif %}"
        ## Function calls
        "{% if 'tool_calls' in message %}"
        "{% for tool_call in message.tool_calls %}"
        "functions.{{ tool_call.function.name }}:\n"
        "{{ tool_call.function.arguments }}"
        "{% endfor %}"
        "<|im_end|>\n"
        "{% endif %}"
        "{% endif %}"
        "{% endfor %}"
        "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
    )
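The template above asks the model to answer either with a `message:` prefix or with one or more `functions.<function_name>:` headers, each followed by a JSON object. As a rough illustration of what consuming that output could look like, here is a minimal sketch using only the Python standard library. The helper name `parse_response` and its return shape are assumptions made for this example; it is not the parser used by llama-cpp-python or LocalAI.

```python
import json
import re

# Hypothetical helper (not part of llama-cpp-python or LocalAI): splits a
# completion that follows the prompt format above into either a plain
# message or a list of function calls.
_CALL_HEADER = re.compile(r"^functions\.([\w-]+):[ \t]*$", re.MULTILINE)


def parse_response(text):
    """Return ("message", str) or ("tool_calls", [{"name": ..., "arguments": ...}])."""
    text = text.strip()
    if text.startswith("message:"):
        return "message", text[len("message:"):].strip()

    headers = list(_CALL_HEADER.finditer(text))
    if not headers or headers[0].start() != 0:
        # Neither prefix matched: treat the whole output as free text.
        return "message", text

    calls = []
    for i, header in enumerate(headers):
        # The JSON arguments run from this header to the next one (or to the end).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end].strip()
        # json.loads raises json.JSONDecodeError on malformed arguments.
        calls.append({"name": header.group(1), "arguments": json.loads(body)})
    return "tool_calls", calls
```

Falling back to a plain message when neither prefix is present is a design choice of this sketch; a stricter caller might prefer to raise an error instead.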

Please check the example implementation of Hermes function calling with llama.cpp here:
https://github.com/NousResearch/Hermes-Function-Calling/blob/main/examples/lllama-cpp-multiple-fn.ipynb

you can try it on colab here:
https://colab.research.google.com/drive/10UPSepK_cp-pvJLChW8GihM-ETJaDn1l#scrollTo=TPJ-J9TDeYIw

Labels
enhancement New feature or request