Implement semantic llm response caching #7048
Replies: 3 comments 1 reply
-
Related discussion here: langchain-ai/langchain#27548

How are you thinking about implementing it? https://v03.api.js.langchain.com/classes/_langchain_core.caches_base.BaseCache.html#lookup

The BaseCache abstraction was designed before chat models were a thing (it's from the days of string-in, string-out LLMs). As a result you don't have access to the chat history in the abstraction, only to some serialized representation of it.

If I were building this feature for my own production use case, I'd probably build it with LangGraph and chat models these days and just use a vector store lookup on the content of the human message.
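A rough sketch of that suggested approach, caching at the application level rather than through `BaseCache`. `MemoryVectorStore` and `OpenAIEmbeddings` are stand-ins for whatever store and embeddings you would actually use, and the 0.9 similarity threshold and metadata shape are illustrative assumptions, not a settled design:

```ts
// Hedged sketch: embed the latest human message and reuse a prior response
// when a previously seen prompt is semantically close enough.
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const SIMILARITY_THRESHOLD = 0.9; // assumption: cosine-similarity cutoff for a "hit"
const store = new MemoryVectorStore(new OpenAIEmbeddings());
const model = new ChatOpenAI({ model: "gpt-4o-mini" });

async function cachedInvoke(humanMessage: string): Promise<string> {
  // Look up the single nearest cached prompt along with its similarity score.
  const [match] = await store.similaritySearchWithScore(humanMessage, 1);
  if (match && match[1] >= SIMILARITY_THRESHOLD) {
    return match[0].metadata.response as string; // semantic cache hit
  }
  // Cache miss: call the model and store the prompt/response pair for next time.
  const response = await model.invoke(humanMessage);
  const text = response.content as string;
  await store.addDocuments([
    new Document({ pageContent: humanMessage, metadata: { response: text } }),
  ]);
  return text;
}
```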
-
Hi @jacoblee93! Do you think we could turn this into an issue and start working on it?
-
Hey @jacoblee93, just checking in: would you mind sharing your thoughts on this proposal? Thanks!
-
Feature request
It is common to cache LLM responses by their prompt, reusing a stored response when the same prompt is issued again. However, two prompts may be worded differently yet carry identical semantic meaning; an exact-match cache treats them as different. There is currently no implementation of a semantic LLM cache in LangChainJS.
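To illustrate the gap, here is a minimal sketch of today's exact-match behaviour, assuming the existing `InMemoryCache` from `@langchain/core/caches` (the model name is an arbitrary example):

```ts
// Only byte-identical prompts hit the existing cache.
import { InMemoryCache } from "@langchain/core/caches";
import { ChatOpenAI } from "@langchain/openai";

const cache = new InMemoryCache();
const model = new ChatOpenAI({ model: "gpt-4o-mini", cache });

await model.invoke("Tell me a joke"); // miss -> calls the API, stores the response
await model.invoke("Tell me a joke"); // hit  -> identical string, served from cache
await model.invoke("Give me a joke"); // miss -> same meaning, different string
```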
Motivation
Semantic caching increases the cache hit rate and is valuable for certain use cases. Although it is not useful or safe for long chats, it can save time on first-interaction prompts. For example:
USER1
Prompt: "Tell me a joke"
Response: "Why did the chicken cross the road..."
USER2
Prompt: "Give me a joke"
Response: the cached response from the previous user can be reused, saving wait time.
Proposal (If applicable)
In the Python version this functionality already exists; it can be used with the Upstash or Redis vector stores. No equivalent exists in LangChainJS.
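For discussion purposes, a hedged sketch of what a `BaseCache`-compatible semantic cache could look like in LangChainJS, loosely mirroring the Python Redis/Upstash semantic caches. The in-memory vector store backend, the 0.9 threshold, and keeping generations directly in document metadata are illustrative assumptions, not a proposed final API:

```ts
import { BaseCache } from "@langchain/core/caches";
import { Generation } from "@langchain/core/outputs";
import { Document } from "@langchain/core/documents";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// Illustrative only: a real backend (Redis, Upstash, etc.) would need proper
// serialization of the generations instead of in-memory metadata objects.
class SemanticCache extends BaseCache {
  constructor(
    private store = new MemoryVectorStore(new OpenAIEmbeddings()),
    private threshold = 0.9 // assumed cosine-similarity cutoff for a "hit"
  ) {
    super();
  }

  async lookup(prompt: string, llmKey: string): Promise<Generation[] | null> {
    // Nearest stored prompt plus its similarity score (higher = closer).
    const [match] = await this.store.similaritySearchWithScore(prompt, 1);
    if (match && match[1] >= this.threshold && match[0].metadata.llmKey === llmKey) {
      return match[0].metadata.generations as Generation[];
    }
    return null;
  }

  async update(prompt: string, llmKey: string, value: Generation[]): Promise<void> {
    await this.store.addDocuments([
      new Document({ pageContent: prompt, metadata: { llmKey, generations: value } }),
    ]);
  }
}
```

A model would then opt in the same way the existing exact-match caches are wired in, e.g. `new ChatOpenAI({ cache: new SemanticCache() })`.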