
Resilience and AI Service #748

Closed
cescoffier opened this issue Jul 16, 2024 · 6 comments · Fixed by #764

Comments

@cescoffier
Collaborator

cescoffier commented Jul 16, 2024

This issue discusses resilience in AI Services and its impact on the memory/context.

Context:

I'm using Granite 7B Instruct, which has a relatively small context window (2048 tokens), and my prompt (user message) is relatively large.
I was using @Retry on the AI Service method, as the model occasionally misbehaves and retrying improves reliability (response time is not a factor in my context).

My AI Service calls happen during HTTP request processing, so they are part of the request scope associated with the HTTP request.
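
For illustration, here is a minimal sketch of that setup. The service interface, method, and prompt are made up for this example; @RegisterAiService comes from quarkus-langchain4j and @Retry from MicroProfile Fault Tolerance:

```java
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import org.eclipse.microprofile.faulttolerance.Retry;

@RegisterAiService
public interface TriageService { // hypothetical service, for illustration only

    // Retry the whole AI Service method when the model misbehaves.
    // Every attempt goes through the chat-memory machinery again.
    @Retry(maxRetries = 3)
    @UserMessage("Classify the following ticket: {ticket}")
    String triage(String ticket);
}
```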

Problem:

On each retry, the context grows: the user message is appended again, so the accumulated context eventually exceeds the model's context size.

Let's try to describe it:

HTTP request ->
       - AI Service call - context=[user message] -> Failure
       - AI Service call (retry) - context=[user message, user message] -> Failure
       - AI Service call (retry) - context=[user message, user message, user message] -> Failure, because the context size is exceeded (unrecoverable)
       - AI Service call (retry) - context=[user message, user message, user message, user message] -> Failure, because the context size is exceeded (useless, as it's not recoverable)

While my issue arose with @Retry, the same can happen with @CircuitBreaker and other fault-tolerance strategies.

Some ideas:

  • We could imagine a way to manipulate the context when a retry is executed, so that the user message is not appended multiple times. However, SmallRye Fault Tolerance does not have this capability yet. We could inject an ID into the call to detect a retry (see the sketch after this list).
  • We could handle retries in the chat client. The problem is that we would also need rate limiting and circuit breakers, which would duplicate a lot of complex code.
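
A minimal sketch of that ID idea, purely hypothetical (nothing like this exists in SmallRye Fault Tolerance or quarkus-langchain4j; the class and method names are made up):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: remembers, per memory id, which logical invocation
// last appended its user message, so a retry of the same invocation can
// skip the append instead of duplicating the message.
class RetryAwareMemoryGuard {

    private final Map<Object, UUID> lastAppended = new ConcurrentHashMap<>();

    // Returns true if this invocation's user message still needs to be
    // appended; false if an earlier attempt of the same invocation
    // already appended it (i.e. we are in a retry).
    boolean shouldAppend(Object memoryId, UUID invocationId) {
        return !invocationId.equals(lastAppended.put(memoryId, invocationId));
    }
}
```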
@geoand
Collaborator

geoand commented Jul 16, 2024

We could handle retry in the chat client. The problem is that we would also need rate limiting and circuit breakers, which would duplicate a lot of complex code.

I am wondering if we can, behind the scenes, "move" the resilience declared on the AiService to the underlying client...

@geoand
Collaborator

geoand commented Jul 16, 2024

From the Zulip discussion, another interesting idea is smallrye/smallrye-fault-tolerance#259

cescoffier added a commit to cescoffier/quarkus-openshift-workshop that referenced this issue Jul 16, 2024
- Rewrite the prompt to work with Granite 7B Instruct
- Add a retry strategy (however, we are hitting quarkiverse/quarkus-langchain4j#748)
- Add a readme with the instructions to run the application locally
@maxandersen
Member

Shouldn't the state used to make a call avoid being mutated before the call has completed? Wouldn't that avoid the "growing"?

@geoand
Collaborator

geoand commented Jul 16, 2024

So you are essentially proposing that the chat memory only be added to when the call succeeds, right?

That could potentially work...
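
As a rough illustration of that direction, a hypothetical wrapper around langchain4j's ChatMemory (the wrapper and its method names are invented for this sketch; only the ChatMemory and ChatMessage types are real):

```java
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.memory.ChatMemory;

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stage messages and only commit them to the real
// memory once the call has succeeded, so a failed attempt leaves nothing
// behind for the retry to duplicate.
class CommitOnSuccessMemory {

    private final ChatMemory delegate;
    private final List<ChatMessage> staged = new ArrayList<>();

    CommitOnSuccessMemory(ChatMemory delegate) {
        this.delegate = delegate;
    }

    void stage(ChatMessage message) {
        staged.add(message); // invisible to retries until committed
    }

    void commit() { // call after the model call succeeded
        staged.forEach(delegate::add);
        staged.clear();
    }

    void rollback() { // call on failure so the next attempt starts clean
        staged.clear();
    }
}
```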

@geoand
Collaborator

geoand commented Jul 16, 2024

That could potentially work...

It's actually a lot trickier than I thought, because implementing an AI service can involve multiple API calls, each of which may add to (and even remove from) the memory.

@geoand
Collaborator

geoand commented Jul 23, 2024

#764 fixes this

@geoand geoand closed this as completed in 91553d7 Jul 23, 2024
geoand added a commit that referenced this issue Jul 23, 2024
Ensure that @Retry works properly with chat memory