
Introduce a new foundation for Dev Services like experience #557

Merged
merged 1 commit into main from #505 on May 13, 2024

Conversation

geoand
Collaborator

@geoand geoand commented May 9, 2024

The basic idea behind this is that the LLM model being run and the API being exposed are not always tightly coupled - for example, the Mistral models can be run in Ollama, not just in Mistral's SaaS offering.
This PR sets the foundation for having an inference server (for now only Ollama) be able to run inference for the LLM models the user configured.
In the Ollama implementation we are able to instruct Ollama to pull the selected model name (if it exists and if it is not already present) and make the necessary configuration behind the scenes so that each configured LangChain4j chat model will use the inference server.

It is important to note that for this to work, the configuration of the model to be used must now become a build-time property (which probably makes sense regardless).
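
To make this concrete, a minimal sketch of what the user-facing configuration might look like, using the model-id property discussed later in this thread (the model name here is only an example):

# build-time property selecting the model; behind the scenes the extension asks
# Ollama to pull it (if not already present) and points the chat model at the
# local Ollama server
quarkus.langchain4j.ollama.chat-model.model-id=phi3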

The following remains to be done:

  • Use the same idea for the rest of the LangChain4j models (like embedding models)
  • Use Ollama's OpenAI compatibility mode to support the OpenAI extension
  • Update the documentation

The way the PR has been structured, other inference servers could be added in the future with minimal changes (the most significant of which would be a way to resolve the conflict where multiple servers can serve the same model).
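
One possible shape for that conflict resolution, purely as an illustration and not something this PR implements (the provider-selection property below is an assumption, not verified against this change):

# hypothetical: if more than one inference server could serve the configured model,
# an explicit provider selection could disambiguate
quarkus.langchain4j.chat-model.provider=ollama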

@geoand geoand requested a review from a team as a code owner May 9, 2024 12:42
@geoand geoand marked this pull request as draft May 9, 2024 12:42
@geoand geoand force-pushed the #505 branch 8 times, most recently from 007f9e1 to e8deb32 on May 9, 2024 15:10
@geoand geoand marked this pull request as ready for review May 9, 2024 15:11
@geoand
Collaborator Author

geoand commented May 9, 2024

I am going to mark this as ready for review as I believe that it's good to go as is and the OpenAI compatibility (which is very limited in Ollama currently anyway) can be introduced in a follow up.

@maxandersen
Member

So the idea is to allow using Ollama, Podman AI Lab, AI Studio, etc. to "host" the same (or closest approximation) model?

And not necessarily tied to a container runtime (docker/podman) ?

Definitely +1 :)

@geoand
Collaborator Author

geoand commented May 10, 2024

Another interesting fact is that InstructLab supports serving the trained models using an OpenAI compatible API.

@geoand
Collaborator Author

geoand commented May 13, 2024

@cescoffier have you had a chance to test this one?
I'm really interested in your experience with it as well (not least because you opened the original issue).

@cescoffier
Collaborator

I'm not sure how I should be using it.

I tried with Ollama (which works OOTB for me - nothing to do), but the dev service was ignored.

Also, it seems that there is some API mismatch:

Ollama is using http://localhost:53977/v1/api/chat while I would have expected: http://localhost:53977/v1/chat/completions

@geoand
Collaborator Author

geoand commented May 13, 2024

You need to update the version if you are using the samples

@geoand
Collaborator Author

geoand commented May 13, 2024

Ollama is using http://localhost:53977/v1/api/chat while I would have expected: http://localhost:53977/v1/chat/completions

For the time being, I have not used the OpenAI compatibility mode, for the reasons I explained above.
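
For context, a rough sketch of what that follow-up could look like once Ollama's OpenAI compatibility mode is wired in: pointing the OpenAI extension at the local Ollama server. The property values below are assumptions for illustration, not what this PR currently does:

# hypothetical follow-up (not part of this PR): route the OpenAI extension
# through Ollama's OpenAI-compatible endpoint
quarkus.langchain4j.openai.base-url=http://localhost:11434/v1
quarkus.langchain4j.openai.chat-model.model-name=phi3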

@cescoffier
Collaborator

If I use:

quarkus.langchain4j.ollama.chat-model.temperature=0.8
quarkus.langchain4j.openai.timeout=60s

it works, but it was working already (the model is already pulled and the server is running). What change should I expect?

@geoand
Collaborator Author

geoand commented May 13, 2024

You can try to use something like quarkus.langchain4j.ollama.chat-model.model-id=phi3 in order to work with a new model (the default is llama3 which I guess you already have locally)

@cescoffier
Collaborator

Ah ok, it pulls the model, that's what I was missing!

@geoand
Collaborator Author

geoand commented May 13, 2024

Yeah, and it's the dev experience one expects where you only need to configure what is necessary (unlike the existing dev service where you need multiple things configured)

@cescoffier
Collaborator

Hum, something seems to be broken.

It pulls the model, but then the application was not calling it - actually nothing was called:

Listening for transport dt_socket at address: 5005


[Quarkus ASCII banner]
2024-05-13 15:58:38,675 WARN [io.qua.config] (Quarkus Main Thread) Unrecognized configuration key "quarkus.langchain4j.openai.timeout" was provided; it will be ignored; verify that the dependency extension for this configuration is set or that you did not make a typo
2024-05-13 15:58:39,043 INFO [io.quarkus] (Quarkus Main Thread) quarkus-langchain4j-sample-review-triage 1.0-SNAPSHOT on JVM (powered by Quarkus 3.8.2) started in 72.372s. Listening on: http://localhost:8080
2024-05-13 15:58:39,044 INFO [io.quarkus] (Quarkus Main Thread) Profile dev activated. Live Coding activated.
2024-05-13 15:58:39,044 INFO [io.quarkus] (Quarkus Main Thread) Installed features: [cdi, langchain4j, langchain4j-ollama, qute, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, smallrye-context-propagation, vertex]

Tried to invoke the model ....

2024-05-13 16:00:59,858 INFO [io.quarkus] (Shutdown thread) quarkus-langchain4j-sample-review-triage stopped in 0.008s

Worked with a restart.

@cescoffier
Collaborator

BTW, when I said it worked, it is a bit weird. The response was the following:

- body: {"model":"phi3","created_at":"2024-05-13T14:01:17.342015Z","message":{"role":"assistant","content":"{\n  \"evaluation\": \"POSITIVE\",\n  \"message\": \"Thank you for your kind words!\"\n}\n\nTo determine the sentiment analysis and language identification programmatically, one would typically use a natural language processing library capable of multi-language support, such as `spaCy`, with additional preprocessing to handle transliterations or direct translation equivalents. However, since this is an example response format without actual code execution, I've manually categorized the sentiment and provided an appropriate message in English for clarity."},"done_reason":"stop","done":true,"total_duration":6125478875,"load_duration":2639789458,"prompt_eval_count":348,"prompt_eval_duration":881733000,"eval_count":112,"eval_duration":2595494000}

@geoand
Collaborator Author

geoand commented May 13, 2024

That's an issue with the model though, no?

@cescoffier
Collaborator

@geoand Removed llama3 and it re-pulled it and then it worked perfectly!

@geoand
Collaborator Author

geoand commented May 13, 2024

If things work as expected for you, I would like to get this in so I can proceed to improve on it later without having the PR go stale (as it has a lot of small changes).

@geoand geoand merged commit edce572 into main May 13, 2024
12 checks passed
@geoand geoand deleted the #505 branch May 13, 2024 18:12