Unexpected behavior in ChatSession.ChatAsync methods #261
This kind of confusion with ChatSession/History/various executors is actually exactly what got me started contributing to LLamaSharp! If you're interested in making any PRs to improve the current behaviour (even if it requires breaking changes) I'd be very interested to review them. In the longer term I think we're going to need to do a complete redesign of the higher level features at some point. To me it's very confusing to have 3 types of executors (two of which seem very similar) and the ChatSession built on top of that (which seems fairly similar again). To kick off that discussion: what do you think would be the "ideal" high-level API for LLamaSharp (completely ignoring all backwards compatibility problems for now)?
Good, I'm not alone in my confusion :) I do have some ideas regarding the ChatSession design. It's good to know that you are not averse to breaking changes, that will make it a lot easier to bring this part of LLamaSharp forward. Long-term it would be great to have the following:
Just off the top of my head. There's probably more that can be added, but I'd have to fiddle around and spend some time thinking about what makes sense to expose to the user. Short-term, I think the change I linked in my fork would greatly improve the utility of the ChatSession (make it usable at all really). Maybe this could be merged soon?
If you want to open up a PR with this change I'll be happy to review it :)
This one is tricky, because there are several different "layers" you can do it at. You might want to rebuild a new chunk of text (e.g. keep prompt, summarise the rest). Or you might want to go a level "lower" and just rearrange tokens (e.g. keep prompt tokens and most recent output tokens). Or you might want to do some magic with shifting around the KV cache.
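To illustrate the token-level variant, something along these lines could work; the types and names below are just a sketch of the idea (keep the prompt tokens plus the most recent tokens), not actual LLamaSharp API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a token-level truncation strategy: keep the original prompt
// tokens plus the most recent tokens, dropping the middle of the history.
// All type and method names here are illustrative, not LLamaSharp's API.
public static class ContextTruncation
{
    public static IReadOnlyList<int> KeepPromptAndRecent(
        IReadOnlyList<int> promptTokens,   // tokens of the original system/prompt text
        IReadOnlyList<int> historyTokens,  // tokens added since the prompt
        int maxTokens)                     // budget, e.g. context size minus headroom
    {
        var budgetForHistory = maxTokens - promptTokens.Count;
        if (budgetForHistory <= 0)
            throw new ArgumentException("Prompt alone exceeds the token budget.");

        // Keep only the most recent history tokens that still fit.
        var recent = historyTokens
            .Skip(Math.Max(0, historyTokens.Count - budgetForHistory))
            .ToList();

        return promptTokens.Concat(recent).ToList();
    }
}
```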
Looks like there's some discussion of embedding jinja templates into gguf (here).
If you come up with anything interesting: I opened a discussion the other day around thoughts for an entirely new executor in LLamaSharp. I'd love to hear your thoughts.
I've been trying to create a chatbot with the ChatSession class and an IHistoryTransform implementation in order to use the correct template for the LLM I am testing LLamaSharp with. I ran into unexpected behavior and could not get good responses from the model.
The only way I got a decent response was when I sent the whole history as a prompt with the correct formatting. Saving and loading the session would lead to an ever-growing context and a significant slowdown.
I noticed that the way the ChatSession class uses the IHistoryTransform interface leads to these unexpected results. When the ChatSession class was refactored to use IHistoryTransform, the behavior changed dramatically: the text in the prompt argument is now parsed and the prompt (which may be the whole history) is added to the session's history.
It's a little hard to untangle. I forked the repo and changed the behavior to be more straightforward: the overload that accepts a string prompt manages the internal history for the user, while the overload that accepts a ChatHistory argument lets the user manage the history themselves.
See here: master...philippjbauer:LLamaSharp:master
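To make the intended split concrete, here's a rough sketch of the two overloads' responsibilities; the types are simplified stand-ins rather than the real LLamaSharp or fork code, it's only meant to show who owns the history:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Minimal sketch of the proposed split between the two ChatAsync overloads.
// ChatSessionSketch/ChatHistorySketch are stand-ins, not the real types.
public class ChatSessionSketch
{
    private readonly ChatHistorySketch _internalHistory = new();

    // String overload: the session owns the history. The prompt is appended
    // as a user message and the reply as an assistant message.
    public async Task<string> ChatAsync(string prompt)
    {
        _internalHistory.AddUserMessage(prompt);
        var reply = await InferAsync(_internalHistory);
        _internalHistory.AddAssistantMessage(reply);
        return reply;
    }

    // ChatHistory overload: the caller owns the history and appends both
    // the prompt and the reply themselves.
    public Task<string> ChatAsync(ChatHistorySketch history)
        => InferAsync(history);

    private Task<string> InferAsync(ChatHistorySketch history)
    {
        // Placeholder for templating the history and running the executor.
        return Task.FromResult("...");
    }
}

public class ChatHistorySketch
{
    public List<(string Role, string Text)> Messages { get; } = new();
    public void AddUserMessage(string text) => Messages.Add(("user", text));
    public void AddAssistantMessage(string text) => Messages.Add(("assistant", text));
}
```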
I see many more improvements here, especially in how the IHistoryTransform interface could be changed to accommodate better templating support, for example making use of public properties for role tokens and end tokens so that these tokens can be replaced internally before the generated message is added to the history, instead of relying on the inferenceParam.AntiPrompt property.
If this is missing the mark completely let me know, but I couldn't figure out how this should be used after reading the documentation and looking at the code.
Edit: A little bit lengthy but here is my IHistoryTransform implementation:
The FullTextToHistory method is a crutch to load the history I save alongside the session data. I think this should be handled by the ChatSession class and saved as a JSON-formatted file or similar for easy portability when necessary.
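For illustration, a transform along these lines is roughly what I have in mind, with public role/end token properties and a FullTextToHistory-style parse; the template strings, types and shapes here are just assumptions for the sketch, not the actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative sketch of a history transform with public role/end token
// properties and a method that parses saved full text back into a history.
// The shapes below are simplified stand-ins, not LLamaSharp's interfaces.
public class TemplatedHistoryTransform
{
    // Exposed so callers (or the session) can strip/replace these tokens
    // without relying on anti-prompt matching.
    public string UserToken { get; set; } = "<|user|>";
    public string AssistantToken { get; set; } = "<|assistant|>";
    public string EndToken { get; set; } = "<|end|>";

    // Render a history into a single prompt string for the executor.
    public string HistoryToText(IEnumerable<(string Role, string Text)> history)
    {
        var sb = new StringBuilder();
        foreach (var (role, text) in history)
        {
            var roleToken = role == "user" ? UserToken : AssistantToken;
            sb.Append(roleToken).Append('\n').Append(text).Append(EndToken).Append('\n');
        }
        sb.Append(AssistantToken).Append('\n'); // cue the model to answer
        return sb.ToString();
    }

    // Parse a previously saved full transcript back into (role, text) pairs.
    public List<(string Role, string Text)> FullTextToHistory(string fullText)
    {
        var history = new List<(string Role, string Text)>();
        foreach (var block in fullText.Split(new[] { EndToken }, StringSplitOptions.RemoveEmptyEntries))
        {
            var trimmed = block.Trim();
            if (trimmed.StartsWith(UserToken))
                history.Add(("user", trimmed.Substring(UserToken.Length).Trim()));
            else if (trimmed.StartsWith(AssistantToken))
                history.Add(("assistant", trimmed.Substring(AssistantToken.Length).Trim()));
        }
        return history;
    }
}
```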