Implement semantic llm response caching #7048
Replies: 3 comments 1 reply
-
Related discussion here: langchain-ai/langchain#27548

How are you thinking about implementing it? https://v03.api.js.langchain.com/classes/_langchain_core.caches_base.BaseCache.html#lookup

The BaseCache abstraction was designed before chat models were a thing (it's from the days of string-in, string-out LLMs). As a result you don't have access to the chat history in the abstraction, only to some serialized representation of it.

If I were building this feature for my own production use case, I'd probably build it with LangGraph and chat models these days and just use a vector store lookup on the content of the human message.
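A rough sketch of that suggested approach, caching at the application level rather than through `BaseCache`. `MemoryVectorStore` and `OpenAIEmbeddings` are stand-ins for whatever store and embeddings you would actually use, and the 0.9 similarity threshold and metadata shape are illustrative assumptions, not a settled design:

```ts
// Hedged sketch: embed the latest human message and reuse a prior response
// when a previously seen prompt is semantically close enough.
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const SIMILARITY_THRESHOLD = 0.9; // assumption: cosine-similarity cutoff for a "hit"
const store = new MemoryVectorStore(new OpenAIEmbeddings());
const model = new ChatOpenAI({ model: "gpt-4o-mini" });

async function cachedInvoke(humanMessage: string): Promise<string> {
  // Look up the single nearest cached prompt along with its similarity score.
  const [match] = await store.similaritySearchWithScore(humanMessage, 1);
  if (match && match[1] >= SIMILARITY_THRESHOLD) {
    return match[0].metadata.response as string; // semantic cache hit
  }
  // Cache miss: call the model and store the prompt/response pair for next time.
  const response = await model.invoke(humanMessage);
  const text = response.content as string;
  await store.addDocuments([
    new Document({ pageContent: humanMessage, metadata: { response: text } }),
  ]);
  return text;
}
```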
-
Hi @jacoblee93! Do you think we could turn this into an issue and start working on it?
-
Hey @jacoblee93, just checking in: would you mind sharing your thoughts on this proposal? Thanks!
-
Feature request
It is common to cache LLM responses by their prompt, reusing a stored response when the same prompt is issued again. However, two prompts may be worded differently yet carry identical semantic meaning; an exact-match cache treats them as different. There is currently no implementation of a semantic LLM cache in LangChainJS.
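To illustrate the gap, here is a minimal sketch of today's exact-match behaviour, assuming the existing `InMemoryCache` from `@langchain/core/caches` (the model name is an arbitrary example):

```ts
// Only byte-identical prompts hit the existing cache.
import { InMemoryCache } from "@langchain/core/caches";
import { ChatOpenAI } from "@langchain/openai";

const cache = new InMemoryCache();
const model = new ChatOpenAI({ model: "gpt-4o-mini", cache });

await model.invoke("Tell me a joke"); // miss -> calls the API, stores the response
await model.invoke("Tell me a joke"); // hit  -> identical string, served from cache
await model.invoke("Give me a joke"); // miss -> same meaning, different string
```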
Motivation
Semantic caching increases the cache hit rate and is valuable for certain use cases. Although it is not useful or safe for long chats, it can save time on first-interaction prompts. For example:
USER1
Prompt: "Tell me a joke"
Response: "Why did the chicken cross the road..."
USER2
Prompt: "Give me a joke"
Response: the cached response from the previous user can be reused, saving wait time.
Proposal (If applicable)
In the Python version this functionality already exists; it can be used with the Upstash or Redis vector stores. No equivalent exists in LangChainJS.
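For discussion purposes, a hedged sketch of what a `BaseCache`-compatible semantic cache could look like in LangChainJS, loosely mirroring the Python Redis/Upstash semantic caches. The in-memory vector store backend, the 0.9 threshold, and keeping generations directly in document metadata are illustrative assumptions, not a proposed final API:

```ts
import { BaseCache } from "@langchain/core/caches";
import { Generation } from "@langchain/core/outputs";
import { Document } from "@langchain/core/documents";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// Illustrative only: a real backend (Redis, Upstash, etc.) would need proper
// serialization of the generations instead of in-memory metadata objects.
class SemanticCache extends BaseCache {
  constructor(
    private store = new MemoryVectorStore(new OpenAIEmbeddings()),
    private threshold = 0.9 // assumed cosine-similarity cutoff for a "hit"
  ) {
    super();
  }

  async lookup(prompt: string, llmKey: string): Promise<Generation[] | null> {
    // Nearest stored prompt plus its similarity score (higher = closer).
    const [match] = await this.store.similaritySearchWithScore(prompt, 1);
    if (match && match[1] >= this.threshold && match[0].metadata.llmKey === llmKey) {
      return match[0].metadata.generations as Generation[];
    }
    return null;
  }

  async update(prompt: string, llmKey: string, value: Generation[]): Promise<void> {
    await this.store.addDocuments([
      new Document({ pageContent: prompt, metadata: { llmKey, generations: value } }),
    ]);
  }
}
```

A model would then opt in the same way the existing exact-match caches are wired in, e.g. `new ChatOpenAI({ cache: new SemanticCache() })`.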