Fix #7765: Fix exceed context length error #8530
Merged
Description
I was getting context-window-exceeded errors (see #7765) after a few messages. After some discussion, it seemed like the `ChatMemoryBuffer` is meant to guard against this, and it does start to forget some history after a while. But a problem occurs if the system prompt (in the prefix messages) is quite long: together with the chat history, we can still exceed the context window. I should also add that `messages_to_prompts` always expects alternating `user` and `assistant` interactions, so I added a line so that when we forget chat history we delete messages in pairs, preserving the alternation. Otherwise it is also possible to get errors if, after forgetting history, the chat history starts with `assistant` messages.

In this PR, I changed the `ChatMemoryBuffer.get()` method to accept an `initial_token_count` argument, which represents an amount of the context window to keep reserved; it is added in each time we assess the token count of the chat history. I also changed the `context` and `simple` chat engines so that we first compute the number of tokens in the prefix messages and pass that into `ChatMemoryBuffer.get()`, ensuring the resulting `all_messages` does not exceed the context length. Happy to add some tests if deemed necessary.
Fixes #7765 (this is currently closed, but I think it's still an open issue actually - see #7765 (comment))
Suggested Checklist:
- `make format; make lint` to appease the lint gods