Unexpected behavior in ChatSession.ChatAsync methods #261

Open
philippjbauer opened this issue Nov 7, 2023 · 3 comments

Labels
bug Something isn't working

Comments

philippjbauer (Contributor) commented Nov 7, 2023

I've been trying to create a chatbot with the ChatSession class and an IHistoryTransform implementation, in order to use the correct template for the LLM I am testing LLamaSharp with. I ran into unexpected behavior and could not get good responses from the model.

The only way I got a decent response was when I sent the whole history as a prompt with the correct formatting. Saving and loading the session would lead to an ever-growing context and significant slowdown.

I noticed that the use of the IHistoryTransform interface in the ChatSession class is what leads to these unexpected results. When the ChatSession class was refactored to use the IHistoryTransform interface, the behavior changed dramatically: the text in the prompt argument is now parsed, and the prompt (which may be the whole history) is added into the history of the session.

It's a little hard to untangle. I forked the repo and changed the behavior to be more straightforward: the overload that accepts a string prompt manages the internal history for the user, while the overload that accepts a ChatHistory argument lets the user manage the history themselves.

See here: master...philippjbauer:LLamaSharp:master
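
For illustration, this is roughly how the two overloads would be used after that change (a sketch only; signatures are simplified, and I'm assuming ChatAsync streams tokens as IAsyncEnumerable<string>, with session and inferenceParams already constructed):

using System;
using LLama;
using LLama.Common;

// Overload 1: the session owns the history; the caller passes only the new user message.
await foreach (var token in session.ChatAsync("What is a span in C#?", inferenceParams))
	Console.Write(token);

// Overload 2: the caller owns the history and passes the full ChatHistory explicitly.
var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a helpful assistant.");
history.AddMessage(AuthorRole.User, "What is a span in C#?");

await foreach (var token in session.ChatAsync(history, inferenceParams))
	Console.Write(token);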

I see many more possible improvements here, especially in how the IHistoryTransform interface could be changed to accommodate better templating support: for example, making use of public properties for role tokens and end tokens, so that these tokens can be replaced internally before the generated message is added to the history instead of relying on the InferenceParams.AntiPrompts property.

If this is missing the mark completely let me know, but I couldn't figure out how this should be used after reading the documentation and looking at the code.

Edit: It's a little lengthy, but here is my IHistoryTransform implementation:

using System;
using System.Linq;
using System.Text.RegularExpressions;
using LLama.Abstractions;
using LLama.Common;

public class HistoryTransform : IHistoryTransform
{
	private readonly string _nl = Environment.NewLine;
	private string _roleTemplate = "<|{0}|>";
	private string _endToken = "</s>";
	// \r?\n so the pattern also matches when Environment.NewLine is "\r\n" (Windows).
	private string _regexPattern = @"(?:<\|([a-z]+)\|>\r?\n)([^<]*)(?:</s>)";

	public string RoleTemplate => _roleTemplate;
	public string EndToken => _endToken;

	public HistoryTransform(
		string? roleTemplate = null,
		string? endToken = null,
		string? regexPattern = null)
	{
		_roleTemplate = roleTemplate ?? _roleTemplate;
		_endToken = endToken ?? _endToken;
		_regexPattern = regexPattern ?? _regexPattern;
	}

	// Renders the whole history as prompt text and appends the role header
	// for whichever side speaks next. Assumes the history is non-empty.
	public string HistoryToText(ChatHistory history)
	{
		string text = "";

		foreach (var message in history.Messages)
		{
			text += FormatMessage(message);
		}

		AuthorRole nextRole = history.Messages.Last().AuthorRole == AuthorRole.User
			? AuthorRole.Assistant
			: AuthorRole.User;

		text += $"{FormatRole(nextRole)}{_nl}";

		return text;
	}

	// Parses a single generated message out of the model output; only the
	// first match is used. FullTextToHistory handles multi-message text.
	public ChatHistory TextToHistory(AuthorRole role, string text)
	{
		ChatHistory history = new();

		Match match = Regex.Match(text, _regexPattern);
		string message = match.Groups[2].Value.Trim();

		history.AddMessage(role, message);

		return history;
	}

	// Not part of the interface; used in the program to load saved history.
	public ChatHistory FullTextToHistory(string text)
	{
		ChatHistory history = new();

		MatchCollection matches = Regex.Matches(text, _regexPattern);

		foreach (Match match in matches.Cast<Match>())
		{
			AuthorRole role = match.Groups[1].Value switch
			{
				"system" => AuthorRole.System,
				"user" => AuthorRole.User,
				"assistant" => AuthorRole.Assistant,
				_ => AuthorRole.System
			};

			string message = match.Groups[2].Value.Trim();

			history.AddMessage(role, message);
		}

		return history;
	}

	public string FormatRole(AuthorRole role)
	{
		return string.Format(_roleTemplate, role.ToString().ToLower());
	}

	// Strips stray end tokens from the content, then re-appends exactly one.
	public string FormatMessage(ChatHistory.Message message)
	{
		return $"{FormatRole(message.AuthorRole)}{_nl}{message.Content.Replace(_endToken, "")}{_endToken}{_nl}";
	}
}

The FullTextToHistory method is a crutch to load the history I save alongside the session data. I think this should be handled by the ChatSession class and saved as a JSON-formatted file or similar, for easy portability when necessary.
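
As a sketch of what that persistence could look like (the helper below is hypothetical, not part of LLamaSharp; it round-trips through a plain DTO so the file format stays stable across library versions):

using System;
using System.IO;
using System.Linq;
using System.Text.Json;
using LLama.Common;

public record ChatMessageDto(string Role, string Content);

public static class ChatHistoryPersistence
{
	// Hypothetical helper: writes the history as an indented JSON array of role/content pairs.
	public static void Save(ChatHistory history, string path)
	{
		var dtos = history.Messages.Select(m => new ChatMessageDto(m.AuthorRole.ToString(), m.Content));
		File.WriteAllText(path, JsonSerializer.Serialize(dtos, new JsonSerializerOptions { WriteIndented = true }));
	}

	// Rebuilds a ChatHistory from the saved JSON.
	public static ChatHistory Load(string path)
	{
		var history = new ChatHistory();
		var dtos = JsonSerializer.Deserialize<ChatMessageDto[]>(File.ReadAllText(path)) ?? Array.Empty<ChatMessageDto>();
		foreach (var dto in dtos)
			history.AddMessage(Enum.Parse<AuthorRole>(dto.Role), dto.Content);
		return history;
	}
}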

martindevans (Member) commented Nov 7, 2023

This kind of confusion with ChatSession/History/various executors is actually exactly what got me started contributing to LLamaSharp! If you're interested in making any PRs to improve the current behaviour (even if it requires breaking changes), I'd be very interested to review them.

In the longer term I think we're going to need to do a complete redesign of the higher level features at some point. To me it's very confusing to have 3 types of executors (two of which seem very similar) and the ChatSession built on top of that (which seems fairly similar again).

To kick off that discussion: what do you think would be the "ideal" high-level API for LLamaSharp (completely ignoring all backwards compatibility problems for now)?

philippjbauer (Contributor, Author)

Good, I'm not alone in my confusion :)

I do have some ideas regarding the ChatSession design. It's good to know that you are not averse to breaking changes, that will make it a lot easier to bring this part of LLamaSharp forward.

Long-term it would be great to have the following:

  • Context Overflow Policy (see LM Studio; a sketch follows this list)
    • Stop at limit
    • Keep system prompt and first user message, truncate the middle
    • Maintain a rolling window and truncate the past (I'd say keep the system prompt as well, though)
  • Robust templating configuration
    • Reading the template from GGUF (I think they may be saved in there nowadays?)
  • Saving the session with history restore
    • History stored as a portable JSON structure
    • Inference parameters saved as a portable structure
  • ChatAsync overloads that allow the system to handle the history and the Context Overflow Policy, and that allow the user to handle the history themselves (exposing the lower-level method, so to speak)
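
To make the overflow policies concrete, here is a minimal sketch of what they could look like. Every type and method below is hypothetical (nothing here exists in LLamaSharp), and a real implementation would budget tokens rather than counting messages:

using System.Linq;
using LLama.Common;

public enum ContextOverflowPolicy
{
	StopAtLimit,     // stop generating once the context is full
	TruncateMiddle,  // keep system prompt + first user message, drop the middle
	RollingWindow,   // keep system prompt + the most recent messages
}

public static class ContextOverflow
{
	// Message-count limit is a stand-in; a real version would count tokens.
	public static ChatHistory Apply(ChatHistory history, ContextOverflowPolicy policy, int maxMessages)
	{
		var messages = history.Messages;
		if (messages.Count <= maxMessages || policy == ContextOverflowPolicy.StopAtLimit)
			return history;

		var trimmed = new ChatHistory();

		if (policy == ContextOverflowPolicy.TruncateMiddle)
		{
			// Keep system prompt (index 0) and first user message (index 1) ...
			trimmed.AddMessage(messages[0].AuthorRole, messages[0].Content);
			trimmed.AddMessage(messages[1].AuthorRole, messages[1].Content);
			// ... then fill the remaining budget with the newest messages.
			foreach (var m in messages.Skip(messages.Count - (maxMessages - 2)))
				trimmed.AddMessage(m.AuthorRole, m.Content);
		}
		else // RollingWindow
		{
			// Keep the system prompt plus a rolling window of the newest messages.
			trimmed.AddMessage(messages[0].AuthorRole, messages[0].Content);
			foreach (var m in messages.Skip(messages.Count - (maxMessages - 1)))
				trimmed.AddMessage(m.AuthorRole, m.Content);
		}

		return trimmed;
	}
}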

Just off the top of my head. There's probably more that can be added, but I'd have to fiddle around and spend some time thinking about what makes sense to expose to the user.

Short-term, I think the change I linked in my fork would greatly improve the utility of the ChatSession (make it usable at all really). Maybe this could be merged soon?

martindevans (Member)

Short-term, I think the change I linked in my fork would greatly improve the utility of the ChatSession (make it usable at all really). Maybe this could be merged soon?

If you want to open up a PR with this change I'll be happy to review it :)

Long-term it would be great to have the following...

Context Overflow Policy (see LM Studio)

This one is tricky, because there are several different "layers" you can do it at. You might want to rebuild a new chunk of text (e.g. keep the prompt, summarise the rest). Or you might want to go a level "lower" and just rearrange tokens (e.g. keep the prompt tokens and the most recent output tokens). Or you might want to do some magic with shifting around the KV cache.
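
For the token-level variant, a rough sketch of the "keep prompt tokens plus most recent tokens" idea (a hypothetical helper operating on a plain token list; real code would also have to shift or rebuild the KV cache to match):

using System.Collections.Generic;
using System.Linq;

public static class TokenTruncation
{
	// Keeps the first promptTokenCount tokens (the fixed prompt) and as many
	// of the most recent tokens as still fit within the context size.
	public static List<int> Truncate(IReadOnlyList<int> tokens, int promptTokenCount, int contextSize)
	{
		if (tokens.Count <= contextSize)
			return tokens.ToList();

		var kept = new List<int>(contextSize);
		kept.AddRange(tokens.Take(promptTokenCount));                                // fixed prompt head
		kept.AddRange(tokens.Skip(tokens.Count - (contextSize - promptTokenCount))); // most recent tail
		return kept;
	}
}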

Robust templating configuration

Looks like there's some discussion of embedding jinja templates into gguf (here).
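
If that lands, a session could read the template straight from the model's metadata instead of requiring a hand-written IHistoryTransform. A hypothetical sketch, assuming the key ends up as llama.cpp's "tokenizer.chat_template" and that the Metadata accessor below exists:

// Hypothetical: pull the chat template out of the GGUF metadata, if present.
// "model.Metadata" is an assumed accessor returning the GGUF key/value pairs.
IReadOnlyDictionary<string, string> metadata = model.Metadata;
if (metadata.TryGetValue("tokenizer.chat_template", out var template))
{
	// Hand the jinja template to whatever templating engine the session uses.
}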

Just off the top of my head. There's probably more that can be added, but I'd have to fiddle around and spend some time thinking about what makes sense to expose to the user.

If you come up with anything interesting: I opened a discussion the other day around thoughts for an entirely new executor in LLamaSharp. I'd love to hear your thoughts.
