
add support for tools for the ollama provider #662

Closed · wants to merge 33 commits

Conversation

@humcqc (Contributor) commented Jun 10, 2024

Proposal for #305; tested on llama3, it does not work yet with other models.
This is a draft to discuss the approach.
Based on the experimental Python implementation and the discussion here.

It's a way to have tools working until an official Ollama fix is available.

We should discuss whether we want this in langchain4j, quarkus-langchain4j, or both.

@jmartisk @langchain4j @geoand WDYT ?
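For context, the draft works by listing the tool schemas in the system prompt and asking the model to answer with a single JSON object selecting one tool (the full prompt is visible in the request logs further down). A minimal sketch of the output the model is expected to produce for the sendAnEmail tool:

{
  "tool": "sendAnEmail",
  "tool_input": { "content": "text of the email to send" }
}

The extension then parses this object and maps it to a ToolExecutionRequest.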

@geoand (Collaborator) left a comment

This is really interesting.

Just so it's clear - does this work with the latest Ollama version or do we still need to wait for that feature to land?

import io.quarkus.test.QuarkusUnitTest;

@Disabled("Integration tests that need an ollama server running")
public class ToolsTest {
Collaborator:

We generally don't write such tests, but instead use Wiremock (see the OpenAI module for tools related tests)
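For illustration only, a rough sketch of what such a WireMock-based test could look like (the class name, port, and stubbed body are hypothetical, loosely following the pattern used in the OpenAI module):

import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.post;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

class OllamaToolsWireMockTest {

    static WireMockServer server;

    @BeforeAll
    static void setUp() {
        server = new WireMockServer(8089); // hypothetical port
        server.start();
        // Canned Ollama reply whose content is the tool-selection JSON object
        server.stubFor(post(urlEqualTo("/api/chat")).willReturn(aResponse()
                .withHeader("Content-Type", "application/json")
                .withBody("{\"model\":\"llama3\",\"done\":true,\"message\":{\"role\":\"assistant\","
                        + "\"content\":\"{ \\\"tool\\\": \\\"sendAnEmail\\\", \\\"tool_input\\\": { \\\"content\\\": \\\"hi\\\" } }\"}}")));
    }

    @AfterAll
    static void tearDown() {
        server.stop();
    }

    @Test
    void shouldTurnToolSelectionIntoToolExecutionRequest() {
        // Point the Ollama base URL at http://localhost:8089 and assert that
        // the stubbed reply is parsed into a sendAnEmail ToolExecutionRequest.
    }
}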

Comment on lines 120 to 123
return toolSpecifications.stream()
.filter(ts -> ts.name().equals(toolResponse.tool))
.map(ts -> toToolExecutionRequest(toolResponse, ts))
.toList();
Collaborator:

We generally try hard to avoid lambdas in Quarkus code

Collaborator:

why (just curious)?

Collaborator:

When Quarkus started, the team found that lambdas had a small (but non-zero) impact on memory usage.

Mind you, this was on Java 8, so things may have changed substantially since then, but we still try to avoid them unless the alternative is just plain terrible.
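For instance, the stream pipeline commented on above could be written as a plain loop (a sketch reusing the types visible in the diff; java.util.ArrayList import assumed):

List<ToolExecutionRequest> requests = new ArrayList<>();
for (ToolSpecification ts : toolSpecifications) {
    if (ts.name().equals(toolResponse.tool)) {
        requests.add(toToolExecutionRequest(toolResponse, ts));
    }
}
return requests;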

@humcqc (Contributor Author) commented Jun 10, 2024

Just so it's clear - does this work with the latest Ollama version or do we still need to wait for that feature to land?

Yes, it works with the latest Ollama version.

@geoand (Collaborator) commented Jun 10, 2024

Very nice, I'll give it a try tomorrow

@geoand (Collaborator) commented Jun 11, 2024

This is super interesting, but unfortunately it does not work properly :(.

The issue seems to be that Ollama does not understand that the tool has been executed and keeps telling us to re-execute it.
Here is a sample interaction using the email-me-a-poem sample:

1st request:

2024-06-11 09:20:26,952 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Request:
- method: POST
- url: http://localhost:11434/api/chat
- headers: [Accept: application/json], [Content-Type: application/json], [User-Agent: Resteasy Reactive Client], [content-length: 1706]
- body: {
  "model" : "llama3",
  "messages" : [ {
    "role" : "SYSTEM",
    "content" : "You are a professional poet\nYou have access to the following tools:\n\n[ {\n  \"name\" : \"sendAnEmail\",\n  \"description\" : \"send the given content by email\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"content\" : {\n        \"type\" : \"string\"\n      }\n    },\n    \"required\" : [ \"content\" ]\n  }\n}, {\n  \"name\" : \"__conversational_response\",\n  \"description\" : \"Respond conversationally if no other tools should be called for a given query and history.\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"reponse\" : {\n        \"type\" : \"string\",\n        \"description\" : \"Conversational response to the user.\"\n      }\n    },\n    \"required\" : [ \"response\" ]\n  }\n} ]\n\nYou must always select one of the above tools and respond with a JSON object matching the following schema,\nand only this json object:\n{\n  \"tool\": <name of the selected tool>,\n  \"tool_input\": <parameters for the selected tool, matching the tool's JSON schema>\n}\nDo not use other tools than the ones from the list above. Always provide the \"tool_input\" field.\nIf several tools are necessary, answer them sequentially.\n\nWhen the user provides sufficient information, answer with the __conversational_response tool.\n"
  }, {
    "role" : "USER",
    "content" : "Write a poem about Quarkus. The poem should be 4 lines long.\nThen send this poem by email. Your response should include the poem.\n"
  } ],
  "options" : {
    "temperature" : 0.8,
    "top_k" : 40,
    "top_p" : 0.9
  },
  "format" : "json",
  "stream" : false
}

1st response:

2024-06-11 09:20:27,939 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Response:
- status code: 200
- headers: [Content-Type: application/json; charset=utf-8], [Date: Tue, 11 Jun 2024 06:20:27 GMT], [Content-Length: 537]
- body: {"model":"llama3","created_at":"2024-06-11T06:20:27.933221683Z","message":{"role":"assistant","content":"{ \"tool\": \"sendAnEmail\", \"tool_input\": \n  { \"content\": \n    \"In Quarkus, where Java flows free,\nA stream of innovation, for you and me.\nWith microservices, it's a world to see,\nA new way to code, wild and carefree.\" } }\n\n\n\n  \n "},"done_reason":"stop","done":true,"total_duration":980876099,"load_duration":913527,"prompt_eval_count":224,"prompt_eval_duration":127446000,"eval_count":70,"eval_duration":718718000}

After this the extension properly executed the tool:

2024-06-11 09:20:27,993 INFO  [io.qua.lan.sam.EmailService] (executor-thread-1) Sending an email

Then the following is sent to Ollama:

2024-06-11 09:20:28,002 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Request:
- method: POST
- url: http://localhost:11434/api/chat
- headers: [Accept: application/json], [Content-Type: application/json], [User-Agent: Resteasy Reactive Client], [content-length: 1792]
- body: {
  "model" : "llama3",
  "messages" : [ {
    "role" : "SYSTEM",
    "content" : "You are a professional poet\nYou have access to the following tools:\n\n[ {\n  \"name\" : \"sendAnEmail\",\n  \"description\" : \"send the given content by email\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"content\" : {\n        \"type\" : \"string\"\n      }\n    },\n    \"required\" : [ \"content\" ]\n  }\n}, {\n  \"name\" : \"__conversational_response\",\n  \"description\" : \"Respond conversationally if no other tools should be called for a given query and history.\",\n  \"parameters\" : {\n    \"type\" : \"object\",\n    \"properties\" : {\n      \"reponse\" : {\n        \"type\" : \"string\",\n        \"description\" : \"Conversational response to the user.\"\n      }\n    },\n    \"required\" : [ \"response\" ]\n  }\n} ]\n\nYou must always select one of the above tools and respond with a JSON object matching the following schema,\nand only this json object:\n{\n  \"tool\": <name of the selected tool>,\n  \"tool_input\": <parameters for the selected tool, matching the tool's JSON schema>\n}\nDo not use other tools than the ones from the list above. Always provide the \"tool_input\" field.\nIf several tools are necessary, answer them sequentially.\n\nWhen the user provides sufficient information, answer with the __conversational_response tool.\n"
  }, {
    "role" : "USER",
    "content" : "Write a poem about Quarkus. The poem should be 4 lines long.\nThen send this poem by email. Your response should include the poem.\n"
  }, {
    "role" : "ASSISTANT"
  }, {
    "role" : "USER",
    "content" : "Success"
  } ],
  "options" : {
    "temperature" : 0.8,
    "top_k" : 40,
    "top_p" : 0.9
  },
  "format" : "json",
  "stream" : false
}

The response however is now problematic:

2024-06-11 09:20:28,888 INFO  [io.qua.lan.oll.OllamaRestApi$OllamaLogger] (vert.x-eventloop-thread-2) Response:
- status code: 200
- headers: [Content-Type: application/json; charset=utf-8], [Date: Tue, 11 Jun 2024 06:20:28 GMT], [Content-Length: 548]
- body: {"model":"llama3","created_at":"2024-06-11T06:20:28.887595616Z","message":{"role":"assistant","content":"{ \"tool\": \"sendAnEmail\", \"tool_input\": { \"content\": \"Quarkus, a framework so fine,\nBuilt for Java, with Quarkus divine.\nIt brings us power, and speed to our code,\nAnd makes our apps shine like a star in the road.\n\nBest regards, [Your Name]\" } }"},"done_reason":"stop","done":true,"total_duration":885540061,"load_duration":1477410,"prompt_eval_count":12,"prompt_eval_duration":68617000,"eval_count":70,"eval_duration":677705000}

As you can see, it tells us to execute the tool again... This keeps happening, with the sequence never ending from the Ollama side.

* Whether to enable the experimental tools
*/
@WithDefault("false")
Optional<Boolean> experimentalTools();
Collaborator:

Shouldn't we rather just name it tools and mark it as experimental in a comment, to avoid having to make a breaking change once we no longer consider it experimental?

Collaborator:

I would name it enableTools

Contributor Author:

I think this will stay experimental until Ollama implements the tools feature.
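For reference, with the mapping as written the flag would presumably be switched on via application.properties roughly like this (the quarkus.langchain4j.ollama prefix is an assumption based on the extension's usual config namespace):

# hypothetical property name derived from experimentalTools()
quarkus.langchain4j.ollama.experimental-tools=true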

@humcqc (Contributor Author) commented Jun 11, 2024

As you can see, it tells us to execute the tool again... This keeps happening, with the sequence never ending from the Ollama side.

Yes, the issue with this approach is that the LLM needs to be aware that the tool has been executed. We could use a simplified approach where we just trigger one tool without recursion, or the tools could always answer with a status for the LLM.

I will try to add an example with the sendPoem case.
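One conceivable direction (an illustration only, not this PR's final format) would be to make the tool-result message sent back to the model explicit about which tool ran, instead of the bare "Success" seen in the second request above:

{
  "role" : "USER",
  "content" : "Tool 'sendAnEmail' was executed successfully. Do not call it again; answer with __conversational_response."
}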

@humcqc (Contributor Author) commented Jun 12, 2024

I've updated the prompt and added some tool history to the user messages, but I haven't found a good way to avoid selecting the same tool twice. Perhaps @langchain4j, @jmartisk, or @geoand can help here?

@geoand (Collaborator) commented Jun 13, 2024

By selecting twice, do you mean the tool gets executed twice?

@humcqc (Contributor Author) commented Jun 13, 2024

By selecting twice, do you mean the tool gets executed twice?

yes

@geoand (Collaborator) commented Jun 13, 2024

We can't really do much here; the LLM is supposed to decide which tools need to be executed, as complex workflows may require multiple tool invocations. OpenAI handles this seamlessly.

@humcqc (Contributor Author) commented Jun 13, 2024

We can't really do much here; the LLM is supposed to decide which tools need to be executed, as complex workflows may require multiple tool invocations. OpenAI handles this seamlessly.

Yes, but it's weird that this one https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-4cad3d1a7b72dca01c9cf8f6019dfdc9c8949b729fdafe2cbda381631db6f88bR34 seems to work correctly even though it is more complex than the sendPoem case.

I think I'm missing the correct inputs/prompt to tell the LLM that the action has been executed.

@geoand (Collaborator) commented Jun 13, 2024

In that case, I would turn on logging of requests and responses and compare the one that works with the one that does not.

@geoand (Collaborator) commented Jun 13, 2024

By the way, I want to clarify that if we can get this to work properly, it's a no-brainer for inclusion :)

@humcqc (Contributor Author) commented Jun 15, 2024

New approach: ask the LLM to create a list of tools to execute, then respond using the previous results.
This seems to work with llama3, but not yet with other models.
Tests in https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-d06a2b262b5211fac51ddebbe50152fd0ea4e93e0ee0ff5f6e764eb5d649827c

It needs some modification in core: https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-2dd3bec40934ad6d175f6f14dad1af0e11c234cf5fec69739a89460d472ab55b
I added support for using previous tool results as input to subsequent tools and to the final response.

I need to check the broken OpenAI tests, but they don't pass on my machine.

Still in progress, but the main part could be done in langchain4j and then used by the Ollama models from both langchain4j and quarkus-langchain4j.
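Purely as an illustration of the idea (the real format lives in the linked diff, so these field names are hypothetical): instead of one tool per round trip, the model would emit the whole plan at once, e.g.:

{
  "tools" : [
    { "tool" : "sendAnEmail", "tool_input" : { "content" : "..." } },
    { "tool" : "__conversational_response", "tool_input" : { "response" : "..." } }
  ]
}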

@humcqc (Contributor Author) commented Jun 16, 2024

https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-2dd3bec40934ad6d175f6f14dad1af0e11c234cf5fec69739a89460d472ab55bR235

In order to replace AI responses containing variables with function results, I've changed the ordering of the chat memory.
With my changes, we add the function result first and then the AI response.
But in the tests you expect the AI response first and then the function results.

WDYT? Can I change the message order in the tests, or should I keep the existing order and adapt the tool executor part?
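To make the two orderings concrete (one reading of the above, using langchain4j's message types, contents elided):

Order expected by the existing tests:  AiMessage(tool request), then ToolExecutionResultMessage(result)
Order produced by this PR:             ToolExecutionResultMessage(result), then AiMessage(tool request)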

@humcqc (Contributor Author) commented Jul 15, 2024

I've finally come back to the full implementation in quarkus-langchain4j; I'll discuss later with langchain4j whether we want to port it.
Still in progress, but it can already be tested as an experimental feature -> https://github.com/quarkiverse/quarkus-langchain4j/pull/662/files#diff-79b70eb6ea261b73ffd9dd46e5d3beb3260d1d3b37319a49fe5e37fe2278d401

Still need to:

  • make it work with more models (mistral, qwen2, phi3)
  • find a better way to enable it
  • perhaps remove ExperimentalSequentialToolsDelegate, as it is less efficient than ExperimentalParallelToolsDelegate

If someone could test it and give me some feedback to help finalize it, that would be great.

@geoand (Collaborator) commented Jul 16, 2024

If someone could test it and give me some feedback to help finalize it, that would be great.

I will try it soon

@geoand (Collaborator) commented Jul 16, 2024

Thanks, I'll have another look tomorrow.

One thing I can say right now is that when everything is ready, we will need the commits squashed and the PR rebased onto the latest main (if it's not already)

@humcqc (Contributor Author) commented Jul 16, 2024

Thanks, I'll have another look tomorrow.

One thing I can say right now is that when everything is ready, we will need the commits squashed and the PR rebased onto the latest main (if it's not already)

Sure, this is just the draft PR of a pretty complex feature. I'm still thinking about some enhancements, and perhaps another way to register it, such as a dedicated ChatLanguageModel instead of an option, or even a specific extension. When it's finalized I will create a new PR based on the latest main with a clean git log.

I'm already using it in my project to see if it covers my use cases, and it seems OK, but it should be tested against other use cases.

@humcqc (Contributor Author) commented Jul 24, 2024

Hi @geoand,
It seems Ollama tools are close to being ready.
In that case, I could keep my experimental model in my own project if you think it's too experimental :) It allows getting the answer in one call to the LLM instead of multiple calls.

But I would need this PR's changes to the core part for my model to work.

WDYT ?

@geoand (Collaborator) commented Jul 25, 2024

If Ollama is close to releasing official support for tools, it's probably best to wait

@geoand (Collaborator) commented Jul 26, 2024

@humcqc Ollama 0.3.0 was released and contains tools support!

#783 is the change that is needed to bring it in. Compared to this change, it's much simpler, so I hope you don't mind if we close this PR in favor of the other one.
I would, however, like to thank you very much for your work on this!
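For comparison, with native support the tools are passed as a structured field on /api/chat rather than through the prompt. A minimal sketch of the native request shape introduced in Ollama 0.3.0 (values illustrative):

{
  "model" : "llama3.1",
  "messages" : [ { "role" : "user", "content" : "Send a poem by email" } ],
  "tools" : [ {
    "type" : "function",
    "function" : {
      "name" : "sendAnEmail",
      "description" : "send the given content by email",
      "parameters" : {
        "type" : "object",
        "properties" : { "content" : { "type" : "string" } },
        "required" : [ "content" ]
      }
    }
  } ]
}

The model then answers with a structured message.tool_calls array instead of free-form JSON in the content.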

@geoand geoand closed this Jul 26, 2024
@humcqc (Contributor Author) commented Jul 26, 2024 via email

@geoand (Collaborator) commented Jul 26, 2024

👍🏼
