
Resilience and AI Service #748

Closed
cescoffier opened this issue Jul 16, 2024 · 6 comments · Fixed by #764

Comments

@cescoffier
Collaborator

cescoffier commented Jul 16, 2024

This issue discusses resilience in AI Services and its impact on the memory/context.

Context:

I'm using Granite 7B Instruct, which has a relatively small context window (2048 tokens), and my prompt (user message) is relatively large.
I was using @Retry on the AI Service method, as the model occasionally misbehaves and retrying improves reliability (response time is not a factor in my context).

My AI Service calls happen during HTTP request processing, so they are part of the request scope associated with the HTTP request.
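
For illustration, here is a minimal sketch of that setup. The service interface, method, and prompt are made up for this example; @RegisterAiService comes from quarkus-langchain4j and @Retry from MicroProfile Fault Tolerance:

```java
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
import org.eclipse.microprofile.faulttolerance.Retry;

@RegisterAiService
public interface TriageService { // hypothetical service, for illustration only

    // Retry the whole AI Service method when the model misbehaves.
    // Every attempt goes through the chat-memory machinery again.
    @Retry(maxRetries = 3)
    @UserMessage("Classify the following ticket: {ticket}")
    String triage(String ticket);
}
```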

Problem:

On each retry, the context grows: the user message is appended again, so the accumulated context eventually exceeds the model's context size.

Let's try to describe it:

HTTP request ->
       - AI Service call - context=[user message] -> Failure
       - AI Service call (retry) - context=[user message, user message] -> Failure
       - AI Service call (retry) - context=[user message, user message, user message] -> Failure, because the context size is exceeded (unrecoverable)
       - AI Service call (retry) - context=[user message, user message, user message, user message] -> Failure, because the context size is exceeded (useless, as it's not recoverable)

While my issue arose with @Retry, the same can happen with @CircuitBreaker and other fault-tolerance strategies.

Some ideas:

  • We could imagine a way to manipulate the context when a retry is executed, so that the user message is not appended multiple times. However, SmallRye Fault Tolerance does not have this capability yet. We could inject an ID into the call to detect a retry (see the sketch after this list).
  • We could handle retries in the chat client. The problem is that we would also need rate limiting and circuit breakers, which would duplicate a lot of complex code.
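
A minimal sketch of that ID idea, purely hypothetical (nothing like this exists in SmallRye Fault Tolerance or quarkus-langchain4j; the class and method names are made up):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: remembers, per memory id, which logical invocation
// last appended its user message, so a retry of the same invocation can
// skip the append instead of duplicating the message.
class RetryAwareMemoryGuard {

    private final Map<Object, UUID> lastAppended = new ConcurrentHashMap<>();

    // Returns true if this invocation's user message still needs to be
    // appended; false if an earlier attempt of the same invocation
    // already appended it (i.e. we are in a retry).
    boolean shouldAppend(Object memoryId, UUID invocationId) {
        return !invocationId.equals(lastAppended.put(memoryId, invocationId));
    }
}
```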
@geoand
Collaborator

geoand commented Jul 16, 2024

We could handle retry in the chat client. The problem is that we would also need rate limiting and circuit breakers, which would duplicate a lot of complex code.

I am wondering if we can, behind the scenes, "move" the resilience declared on the AiService to the underlying client...

@geoand
Collaborator

geoand commented Jul 16, 2024

From the Zulip discussion, another interesting idea is smallrye/smallrye-fault-tolerance#259

cescoffier added a commit to cescoffier/quarkus-openshift-workshop that referenced this issue Jul 16, 2024
- Rewrite the prompt to work with Granite 7B Instruct
- Add a retry strategy (however, we are hitting quarkiverse/quarkus-langchain4j#748)
- Add a readme with the instructions to run the application locally
@maxandersen
Member

Shouldn't the state used to make a call avoid being mutated before the call has completed? Wouldn't that avoid the "growing"?

@geoand
Collaborator

geoand commented Jul 16, 2024

So you are essentially proposing that the chat memory only be added to when the call succeeds, right?

That could potentially work...
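
As a rough illustration of that direction, a hypothetical wrapper around langchain4j's ChatMemory (the wrapper and its method names are invented for this sketch; only the ChatMemory and ChatMessage types are real):

```java
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.memory.ChatMemory;

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stage messages and only commit them to the real
// memory once the call has succeeded, so a failed attempt leaves nothing
// behind for the retry to duplicate.
class CommitOnSuccessMemory {

    private final ChatMemory delegate;
    private final List<ChatMessage> staged = new ArrayList<>();

    CommitOnSuccessMemory(ChatMemory delegate) {
        this.delegate = delegate;
    }

    void stage(ChatMessage message) {
        staged.add(message); // invisible to retries until committed
    }

    void commit() { // call after the model call succeeded
        staged.forEach(delegate::add);
        staged.clear();
    }

    void rollback() { // call on failure so the next attempt starts clean
        staged.clear();
    }
}
```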

@geoand
Collaborator

geoand commented Jul 16, 2024

That could potentially work...

It's actually a lot trickier than I thought, because implementing an AI service can involve multiple API calls, each of which may add to (and even remove from) the memory.

@geoand
Collaborator

geoand commented Jul 23, 2024

#764 fixes this

@geoand geoand closed this as completed in 91553d7 Jul 23, 2024
geoand added a commit that referenced this issue Jul 23, 2024
Ensure that @Retry works properly with chat memory