
add support for tools for the ollama provider #662

Closed · wants to merge 33 commits

Conversation

@humcqc (Contributor) commented Jun 10, 2024

Proposal for #305; tested on llama3, it does not work yet with other models.
This is a draft to discuss the approach.
Based on the experimental Python implementation and the discussion here.

It's a way to have tools working until an official Ollama fix is available.

We should discuss whether we want this in langchain4j, quarkus-langchain4j, or both.

@jmartisk @langchain4j @geoand WDYT ?
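For context, the draft works by listing the tool schemas in the system prompt and asking the model to answer with a single JSON object selecting one tool (the full prompt is visible in the request logs further down). A minimal sketch of the output the model is expected to produce for the sendAnEmail tool:

{
  "tool": "sendAnEmail",
  "tool_input": { "content": "text of the email to send" }
}

The extension then parses this object and maps it to a ToolExecutionRequest.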

@geoand (Collaborator) left a comment

This is really interesting.

Just so it's clear - does this work with the latest Ollama version or do we still need to wait for that feature to land?

import io.quarkus.test.QuarkusUnitTest;

@Disabled("Integration tests that need an ollama server running")
public class ToolsTest {
Collaborator:

We generally don't write such tests, but instead use Wiremock (see the OpenAI module for tools related tests)
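For illustration only, a rough sketch of what such a WireMock-based test could look like (the class name, port, and stubbed body are hypothetical, loosely following the pattern used in the OpenAI module):

import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.post;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

class OllamaToolsWireMockTest {

    static WireMockServer server;

    @BeforeAll
    static void setUp() {
        server = new WireMockServer(8089); // hypothetical port
        server.start();
        // Canned Ollama reply whose content is the tool-selection JSON object
        server.stubFor(post(urlEqualTo("/api/chat")).willReturn(aResponse()
                .withHeader("Content-Type", "application/json")
                .withBody("{\"model\":\"llama3\",\"done\":true,\"message\":{\"role\":\"assistant\","
                        + "\"content\":\"{ \\\"tool\\\": \\\"sendAnEmail\\\", \\\"tool_input\\\": { \\\"content\\\": \\\"hi\\\" } }\"}}")));
    }

    @AfterAll
    static void tearDown() {
        server.stop();
    }

    @Test
    void shouldTurnToolSelectionIntoToolExecutionRequest() {
        // Point the Ollama base URL at http://localhost:8089 and assert that
        // the stubbed reply is parsed into a sendAnEmail ToolExecutionRequest.
    }
}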

Comment on lines 120 to 123
return toolSpecifications.stream()
.filter(ts -> ts.name().equals(toolResponse.tool))
.map(ts -> toToolExecutionRequest(toolResponse, ts))
.toList();
Collaborator:

We generally try hard to avoid lambdas in Quarkus code

Collaborator:

why (just curious)?

Collaborator:

When Quarkus started, the team found that lambdas had a small (but non-zero) impact on memory usage.

Mind you, this was on Java 8, so things may have changed substantially since then, but we still try to avoid them unless the alternative is just plain terrible.
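For instance, the stream pipeline commented on above could be written as a plain loop (a sketch reusing the types visible in the diff; java.util.ArrayList import assumed):

List<ToolExecutionRequest> requests = new ArrayList<>();
for (ToolSpecification ts : toolSpecifications) {
    if (ts.name().equals(toolResponse.tool)) {
        requests.add(toToolExecutionRequest(toolResponse, ts));
    }
}
return requests;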

@humcqc (Contributor Author) commented Jun 10, 2024

Just so it's clear - does this work with the latest Ollama version or do we still need to wait for that feature to land?

Yes, it works with the latest Ollama version.

@geoand (Collaborator) commented Jun 10, 2024

Very nice, I'll give it a try tomorrow

@geoand (Collaborator) commented Jun 11, 2024

This is super interesting, but unfortunately it does not work properly :(.

The issue seems to be that Ollama does not understand that the tool has been executed and keeps telling us to re-execute it.
Here is a sample interaction using the email-me-a-poem sample:

1st request:

2024-06-11 09:20:26,952 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Request:
- method: POST
- url: http://localhost:11434/api/chat
- headers: [Accept: application/json], [Content-Type: application/json], [User-Agent: Resteasy Reactive Client], [content-length: 1706]
- body: {
  "model" : "llama3",
  "messages" : [ {
    "role" : "SYSTEM",
    "content" : "You are a professional poet\nYou have access to the following tools:\n\n[ {\n  \"name\" : \"sendAnEmail\",\n  \"description\" : \"send the given content by email\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"content\" : {\n        \"type\" : \"string\"\n      }\n    },\n    \"required\" : [ \"content\" ]\n  }\n}, {\n  \"name\" : \"__conversational_response\",\n  \"description\" : \"Respond conversationally if no other tools should be called for a given query and history.\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"reponse\" : {\n        \"type\" : \"string\",\n        \"description\" : \"Conversational response to the user.\"\n      }\n    },\n    \"required\" : [ \"response\" ]\n  }\n} ]\n\nYou must always select one of the above tools and respond with a JSON object matching the following schema,\nand only this json object:\n{\n  \"tool\": <name of the selected tool>,\n  \"tool_input\": <parameters for the selected tool, matching the tool's JSON schema>\n}\nDo not use other tools than the ones from the list above. Always provide the \"tool_input\" field.\nIf several tools are necessary, answer them sequentially.\n\nWhen the user provides sufficient information, answer with the __conversational_response tool.\n"
  }, {
    "role" : "USER",
    "content" : "Write a poem about Quarkus. The poem should be 4 lines long.\nThen send this poem by email. Your response should include the poem.\n"
  } ],
  "options" : {
    "temperature" : 0.8,
    "top_k" : 40,
    "top_p" : 0.9
  },
  "format" : "json",
  "stream" : false
}

1st response:

2024-06-11 09:20:27,939 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Response:
- status code: 200
- headers: [Content-Type: application/json; charset=utf-8], [Date: Tue, 11 Jun 2024 06:20:27 GMT], [Content-Length: 537]
- body: {"model":"llama3","created_at":"2024-06-11T06:20:27.933221683Z","message":{"role":"assistant","content":"{ \"tool\": \"sendAnEmail\", \"tool_input\": \n  { \"content\": \n    \"In Quarkus, where Java flows free,\nA stream of innovation, for you and me.\nWith microservices, it's a world to see,\nA new way to code, wild and carefree.\" } }\n\n\n\n  \n "},"done_reason":"stop","done":true,"total_duration":980876099,"load_duration":913527,"prompt_eval_count":224,"prompt_eval_duration":127446000,"eval_count":70,"eval_duration":718718000}

After this the extension properly executed the tool:

2024-06-11 09:20:27,993 INFO  [io.qua.lan.sam.EmailService] (executor-thread-1) Sending an email

Then the following is sent to Ollama:

2024-06-11 09:20:28,002 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Request:
- method: POST
- url: http://localhost:11434/api/chat
- headers: [Accept: application/json], [Content-Type: application/json], [User-Agent: Resteasy Reactive Client], [content-length: 1792]
- body: {
  "model" : "llama3",
  "messages" : [ {
    "role" : "SYSTEM",
    "content" : "You are a professional poet\nYou have access to the following tools:\n\n[ {\n  \"name\" : \"sendAnEmail\",\n  \"description\" : \"send the given content by email\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"content\" : {\n        \"type\" : \"string\"\n      }\n    },\n    \"required\" : [ \"content\" ]\n  }\n}, {\n  \"name\" : \"__conversational_response\",\n  \"description\" : \"Respond conversationally if no other tools should be called for a given query and history.\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"reponse\" : {\n        \"type\" : \"string\",\n        \"description\" : \"Conversational response to the user.\"\n      }\n    },\n    \"required\" : [ \"response\" ]\n  }\n} ]\n\nYou must always select one of the above tools and respond with a JSON object matching the following schema,\nand only this json object:\n{\n  \"tool\": <name of the selected tool>,\n  \"tool_input\": <parameters for the selected tool, matching the tool's JSON schema>\n}\nDo not use other tools than the ones from the list above. Always provide the \"tool_input\" field.\nIf several tools are necessary, answer them sequentially.\n\nWhen the user provides sufficient information, answer with the __conversational_response tool.\n"
  }, {
    "role" : "USER",
    "content" : "Write a poem about Quarkus. The poem should be 4 lines long.\nThen send this poem by email. Your response should include the poem.\n"
  }, {
    "role" : "ASSISTANT"
  }, {
    "role" : "USER",
    "content" : "Success"
  } ],
  "options" : {
    "temperature" : 0.8,
    "top_k" : 40,
    "top_p" : 0.9
  },
  "format" : "json",
  "stream" : false
}

The response however is now problematic:

2024-06-11 09:20:28,888 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Response:
- status code: 200
- headers: [Content-Type: application/json; charset=utf-8], [Date: Tue, 11 Jun 2024 06:20:28 GMT], [Content-Length: 548]
- body: {"model":"llama3","created_at":"2024-06-11T06:20:28.887595616Z","message":{"role":"assistant","content":"{ \"tool\": \"sendAnEmail\", \"tool_input\": { \"content\": \"Quarkus, a framework so fine,\nBuilt for Java, with Quarkus divine.\nIt brings us power, and speed to our code,\nAnd makes our apps shine like a star in the road.\n\nBest regards, [Your Name]\" } }"},"done_reason":"stop","done":true,"total_duration":885540061,"load_duration":1477410,"prompt_eval_count":12,"prompt_eval_duration":68617000,"eval_count":70,"eval_duration":677705000}

As you can see, it tells us to execute the tool again... This keeps happening, with the sequence never ending from the Ollama side.

* Whether to enable the experimental tools
*/
@WithDefault("false")
Optional<Boolean> experimentalTools();
Collaborator:

Shouldn't we rather just name it tools and mark it as experimental in a comment, to avoid having to make a breaking change once we no longer consider it experimental?

Collaborator:

I would name it enableTools

Contributor Author:

I think this will stay experimental until Ollama implements the tools feature.
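For reference, with the mapping as written the flag would presumably be switched on via application.properties roughly like this (the quarkus.langchain4j.ollama prefix is an assumption based on the extension's usual config namespace):

# hypothetical property name derived from experimentalTools()
quarkus.langchain4j.ollama.experimental-tools=true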

@humcqc (Contributor Author) commented Jun 11, 2024

As you can see, it tells us to execute the tool again... This keeps happening, with the sequence never ending from the Ollama side.

Yes, the issue with this approach is that the LLM needs to be aware that the tool has been executed. We could use a simplified approach where we just trigger one tool without recursion, or the tools could always answer with a status for the LLM.

I will try to add an example with the sendPoem case.
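One conceivable direction (an illustration only, not this PR's final format) would be to make the tool-result message sent back to the model explicit about which tool ran, instead of the bare "Success" seen in the second request above:

{
  "role" : "USER",
  "content" : "Tool 'sendAnEmail' was executed successfully. Do not call it again; answer with __conversational_response."
}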

@humcqc (Contributor Author) commented Jun 12, 2024

I've updated the prompt and added some tool history to the user messages, but I haven't found a good way to avoid selecting the same tool twice. Perhaps @langchain4j, @jmartisk, or @geoand can help here?

@geoand (Collaborator) commented Jun 13, 2024

By selecting twice, do you mean the tool gets executed twice?

@humcqc (Contributor Author) commented Jun 13, 2024

By selecting twice, do you mean the tool gets executed twice?

yes

@geoand (Collaborator) commented Jun 13, 2024

We can't really do much here; the LLM is supposed to decide which tools need to be executed, as complex workflows may require multiple tool invocations. OpenAI handles this seamlessly.

@humcqc (Contributor Author) commented Jun 13, 2024

We can't really do much here; the LLM is supposed to decide which tools need to be executed, as complex workflows may require multiple tool invocations. OpenAI handles this seamlessly.

Yes, but it's weird that this one https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-4cad3d1a7b72dca01c9cf8f6019dfdc9c8949b729fdafe2cbda381631db6f88bR34 seems to work correctly even though it is more complex than the sendPoem case.

I think I'm missing the correct inputs/prompt to tell the LLM that the action has been executed.

@geoand (Collaborator) commented Jun 13, 2024

In that case, I would turn on logging of requests and responses and compare the one that works with the one that does not.

@geoand (Collaborator) commented Jun 13, 2024

By the way, I want to clarify that if we can get this to work properly, it's a no-brainer for inclusion :)

@humcqc (Contributor Author) commented Jun 15, 2024

New approach: ask the LLM to create a list of tools to execute, then respond using the previous results.
This seems to work with llama3, but not yet with other models.
Tests in https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-d06a2b262b5211fac51ddebbe50152fd0ea4e93e0ee0ff5f6e764eb5d649827c

It needs some modification in core: https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-2dd3bec40934ad6d175f6f14dad1af0e11c234cf5fec69739a89460d472ab55b
I added support for using previous tool results as input to subsequent tools and to the final response.

I need to check the broken OpenAI tests, but they don't pass on my machine.

Still in progress, but the main part could be done in langchain4j and then used by the Ollama models from both langchain4j and quarkus-langchain4j.
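Purely as an illustration of the idea (the real format lives in the linked diff, so these field names are hypothetical): instead of one tool per round trip, the model would emit the whole plan at once, e.g.:

{
  "tools" : [
    { "tool" : "sendAnEmail", "tool_input" : { "content" : "..." } },
    { "tool" : "__conversational_response", "tool_input" : { "response" : "..." } }
  ]
}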

@humcqc (Contributor Author) commented Jun 16, 2024

https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-2dd3bec40934ad6d175f6f14dad1af0e11c234cf5fec69739a89460d472ab55bR235

In order to replace AI responses containing variables with function results, I've changed the ordering of the chat memory.
With my changes, we add the function result first and then the AI response.
But in the tests you expect the AI response first and then the function results.

WDYT? Can I change the message order in the tests, or should I keep the existing order and adapt the tool executor part?
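To make the two orderings concrete (one reading of the above, using langchain4j's message types, contents elided):

Order expected by the existing tests:  AiMessage(tool request), then ToolExecutionResultMessage(result)
Order produced by this PR:             ToolExecutionResultMessage(result), then AiMessage(tool request)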

@humcqc (Contributor Author) commented Jul 15, 2024

I've finally come back to the full implementation in quarkus-langchain4j; I'll discuss later with langchain4j whether we want to port it.
Still in progress, but it can already be tested as an experimental feature -> https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-79b70eb6ea261b73ffd9dd46e5d3beb3260d1d3b37319a49fe5e37fe2278d401

Still need to:

  • make it work with more models (mistral, qwen2, phi3)
  • find a better way to enable it
  • perhaps remove ExperimentalSequentialToolsDelegate, as it is less efficient than ExperimentalParallelToolsDelegate

If someone could test it and give me some feedback to help finalize it, that would be great.

@geoand (Collaborator) commented Jul 16, 2024

If someone could test it and give me some feedback to help finalize it, that would be great.

I will try it soon

@geoand (Collaborator) commented Jul 16, 2024

Thanks, I'll have another look tomorrow.

One thing I can say right now is that when everything is ready, we will need the commits squashed and the PR rebased onto the latest main (if it's not already)

@humcqc (Contributor Author) commented Jul 16, 2024

Thanks, I'll have another look tomorrow.

One thing I can say right now is that when everything is ready, we will need the commits squashed and the PR rebased onto the latest main (if it's not already)

Sure, this is just the draft PR of a pretty complex feature. I'm still thinking about some enhancements, and perhaps another way to register it, such as a dedicated ChatLanguageModel instead of an option, or even a specific extension. When it's finalized I will create a new PR based on the latest main with a clean git log.

I'm already using it in my project to see if it covers my use cases, and it seems OK, but it should be tested against other use cases.

@humcqc (Contributor Author) commented Jul 24, 2024

Hi @geoand,
It seems Ollama tools are close to being ready.
In that case, I could keep my experimental model in my own project if you think it's too experimental :) It allows getting the answer in one call to the LLM instead of multiple calls.

But I would need this PR's changes to the core part for my model to work.

WDYT ?

@geoand (Collaborator) commented Jul 25, 2024

If Ollama is close to releasing official support for tools, it's probably best to wait

@geoand (Collaborator) commented Jul 26, 2024

@humcqc Ollama 0.3.0 was released and contains tools support!

#783 is the change that is needed to bring it in. Compared to this change, it's much simpler, so I hope you don't mind if we close this PR in favor of the other one.
I would, however, like to thank you very much for your work on this!
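For comparison, with native support the tools are passed as a structured field on /api/chat rather than through the prompt. A minimal sketch of the native request shape introduced in Ollama 0.3.0 (values illustrative):

{
  "model" : "llama3.1",
  "messages" : [ { "role" : "user", "content" : "Send a poem by email" } ],
  "tools" : [ {
    "type" : "function",
    "function" : {
      "name" : "sendAnEmail",
      "description" : "send the given content by email",
      "parameters" : {
        "type" : "object",
        "properties" : { "content" : { "type" : "string" } },
        "required" : [ "content" ]
      }
    }
  } ]
}

The model then answers with a structured message.tool_calls array instead of free-form JSON in the content.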

@geoand geoand closed this Jul 26, 2024
@humcqc (Contributor Author) commented Jul 26, 2024 via email

@geoand (Collaborator) commented Jul 26, 2024

👍🏼
