
Frequent errors with webgui llm answers when json decoding fails. #177

Closed

oderwat opened this issue Oct 29, 2023 · 16 comments

@oderwat
Contributor

oderwat commented Oct 29, 2023

I found that most longer answers from the webgui backend will fail to JSON decode. From the data I get to see, it looks as if the LLM returned an incomplete message. I wonder if one could make the LLM "continue" the message when decoding fails. Maybe also try to repair a JSON answer if it is "just missing" the proper ending? I also wonder if one could give the backend a grammar or a maximum new-token length, or how to set any parameters at all. Is that done inside webgui in that case?

When using "--model dolphin-2.1-mistral-7b" with the same model loaded into webgui, I also get this kind of malformed JSON when trying to store into memory:

> Enter your message: my name is Horst and I am 42 years old. save that in your memory.

Exception: Failed to decode JSON from LLM output:
{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Got it! Your age and nationality are now saved in my memory."
  }
}
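
A minimal sketch of the "repair a JSON answer if it is just missing the proper ending" idea (this is not actual MemGPT code, and it only covers the case where the closing braces were cut off):

import json

def repair_truncated_json(raw: str, max_missing_braces: int = 4) -> dict:
    # Best-effort repair: re-try parsing with up to max_missing_braces
    # closing braces appended. Truncation inside a string value is not handled.
    raw = raw.strip()
    for extra in range(max_missing_braces + 1):
        try:
            return json.loads(raw + "}" * extra)
        except json.JSONDecodeError:
            continue
    raise ValueError("could not repair LLM output into valid JSON")
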
@oderwat oderwat changed the title Frequent errors with textgen llm answers when json decoding fails. Frequent errors with webgui llm answers when json decoding fails. Oct 29, 2023
@Phiapsi

Phiapsi commented Oct 29, 2023

Same with LM Studio and also the dolphin-2.1-mistral-7b model.

@oderwat
Contributor Author

oderwat commented Oct 29, 2023

To answer my own question partially:

I found where the parameters for the query are set. So I tried changing memgpt/local_llm/webui/settings.py and added "max_new_tokens": 1000, on a new line before the closing }, and this decreased the problem with cut-off JSON replies quite a bit.

I also tried adding "ban_eos_token": True but removed it again, as my experience with webui is that it may then answer with garbage.
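
For reference, a sketch of what the edited settings dict could look like (the dict name and the surrounding keys here are assumptions; only the "max_new_tokens" line reflects the change described above):

# memgpt/local_llm/webui/settings.py (illustrative)
SIMPLE = {
    "stopping_strings": ["\nUSER:", "\nASSISTANT:"],  # assumed existing entry
    "truncation_length": 4096,                        # assumed existing entry
    # "ban_eos_token": True,  # tried and removed again; can lead to garbage output
    "max_new_tokens": 1000,   # added: reduces cut-off JSON replies
}
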

Maybe adding a grammar_string could be pretty interesting, but I am not very sophisticated with that. I just played with grammars a bit in Faraday and found it incredibly hard to make them work in any meaningful way. But maybe somebody knows more about the needed grammar and how to define it.

Besides possibly fixing some of the JSON parsing problems, one may also want to change some of the parameters to shape the model's answers.

P.S.: I also think the program should just tell you that it failed and not crash when JSON parsing fails or no answer is generated.

@cpacker
Collaborator

cpacker commented Oct 30, 2023

@oderwat @Phiapsi

Just to clarify what's going on here (just in case it's not clear, sorry if I'm explaining what you already know):

Exception: Failed to decode JSON from LLM output:
{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Got it! Your age and nationality are now saved in my memory."
  }
}

The raw output string from the LLM is two JSON objects, whereas it should just be one:

{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}

We could potentially get around this by running a JSON parser that checks for matching open/close braces ({ / }) and truncates the string when it finds the match (so that only the first JSON object is extracted).
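
A rough sketch of that brace-matching idea (a hypothetical helper, not the code that was later merged):

def extract_first_json_object(raw: str) -> str:
    # Return the substring covering the first balanced {...} object in raw,
    # ignoring braces that appear inside string literals.
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in LLM output")
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(raw[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return raw[start:i + 1]  # truncate at the matching close brace
    raise ValueError("unbalanced braces in LLM output")

Running json.loads() on the returned substring would then pick up only the core_memory_append call and drop the trailing send_message object.
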

Also, just looking at this output it seems like there's another error (a semantic one) - the bot should have added request_heartbeat: true as a parameter to the core_memory_append function. A more MemGPT-specific parsing hack would be to check whether core_memory_X is run and the bot added another call after it (indicating it's trying to chain functions), in which case we can manually add request_heartbeat: true to the first call.
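
Sketched out, that MemGPT-specific heuristic could look something like this (the call format and field names are assumptions based on the example output above):

def maybe_force_heartbeat(calls: list[dict]) -> list[dict]:
    # If a core_memory_* call is chained with a later call, force a heartbeat
    # on it so the agent gets control back to actually run the follow-up.
    if len(calls) > 1 and calls[0].get("function", "").startswith("core_memory_"):
        calls[0].setdefault("params", {})["request_heartbeat"] = True
    return calls
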

tl;dr: there are lots of MemGPT-specific parsing hacks we can use here to make MemGPT perform better

@vivi
Contributor

vivi commented Oct 30, 2023

Another JSON decoding error from bebo on Discord: https://pastebin.com/nJeAxHvZ

@Rivelyn

Rivelyn commented Oct 31, 2023

I am using LM Studio and have tried every model under the sun, and I always get Exception: Failed to decode JSON from LLM output every time I try to run MemGPT. I have no problems having a chat in LM Studio with any of the models. Currently I am testing TheBloke - zephyr beta 7B q5_k_m gguf. It seems to be a very capable model when chatting in LM Studio. But once again, as soon as I start my LM Studio server and run through the setup using Anaconda Prompt, I set all of my variables with
set OPENAI_API_BASE=http://localhost:1234
set BACKEND_TYPE=lmstudio
python main.py --no_verify

It thinks for several minutes and then always returns the same result: Exception: Failed to decode JSON from LLM output.

I can include all of the information from my LM Studio and from MemGPT, but I am not sure exactly what is needed to assess the issue.

I am running a Windows laptop, an Aspire 5; it is not great for this, but I just need a bit more patience waiting for the responses, which is not a problem for me.

@cpacker
Collaborator

cpacker commented Nov 1, 2023

@Rivelyn were you able to get this working?

This seems like it might be a model error, but it's really hard to check without exactly matching the model settings we've tested. If you're on Discord, ping me and I'll help you get set up.

If you're not on Discord, can you try running a dolphin-2.1 model instead and report back?

E.g. try to make your LM Studio setup look exactly like this (maybe with a lower quantization if the model doesn't fit on your computer):
(screenshot of the suggested LM Studio model settings)

@Rivelyn

Rivelyn commented Nov 1, 2023

@cpacker I pinged you and @vivi on Discord with the progress (sort of) I made this morning; maybe not progress, but at least something different.

@Rivelyn

Rivelyn commented Nov 1, 2023

For anyone who might be reading this here and not on Discord, I will update what has happened so far. As of yet I have not changed to dolphin-2.1. I can, but I have been impressed with the Zephyr model so far, so I was hoping to get it operational.

LM Studio updated to yesterday's version, 0.2.8.
Model: TheBloke/zephyr-7B-beta-GGUF/zephyr-7b-beta.Q5_K_M.gguf.
Server model setting 'zephyr'.
MemGPT commands:
set OPENAI_API_BASE=http://localhost:1234
set BACKEND_TYPE=lmstudio
python -m memgpt --model zephyr-7B-beta --persona sam

Thinking ran for a while, then either LM Studio or MemGPT went into a loop on startup and finally gave me this error: Exception: Hit first message retry limit (10)

My environment is Windows 11 on an Aspire 5 16GB laptop. I have no GPU, and the chat with the Zephyr model is good, just for anyone wondering. I actually tried the larger version of Zephyr and didn't really notice any difference in my testing. Using Anaconda I created my environment and cloned the repo, then set my commands as above.

@Drake-AI
Contributor

Drake-AI commented Nov 2, 2023

(quoting @cpacker's comment from Oct 30 above)

What about a stop sequence like "\n }\n}"? It works for me in my tests with dolphin and openchat3.5 using koboldcpp.
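
As a sketch of how that could be passed to a koboldcpp-style endpoint (the URL and the payload field names here are assumptions; check them against the koboldcpp API version you are running):

import requests

payload = {
    "prompt": "...",                # the full MemGPT prompt string
    "max_length": 512,              # cap on new tokens
    "stop_sequence": ["\n }\n}"],   # stop right after the first object closes
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
text = resp.json()["results"][0]["text"]
# Depending on the backend, the stop text itself may be omitted from the output,
# so the closing braces might need to be re-appended before JSON parsing.
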

@cpacker
Collaborator

cpacker commented Nov 3, 2023

Hey @oderwat @Rivelyn @Drake-AI

We just added some extra JSON parsing code in #269, which hopefully should fix many of the common issues like double-JSON and run-on JSON (e.g. because of missed stop tokens).

In the PR's top comment you can see an example of how the new parsing hacks prevent common bad JSON outputs that people have been experiencing. Hope this helps!

Closing this particular issue because it's about double-JSON, which should be fixed now, but feel free to re-open or continue commenting.

@cpacker cpacker closed this as completed Nov 3, 2023
@oderwat
Contributor Author

oderwat commented Nov 3, 2023

FYI: The original issue was not about the double JSON, but about the JSON data ending prematurely, which was fixed for me by #187 / #182.

The second problem with Dolphin did not happen for me when I used the default wrapper instead.

TY

@Drake-AI
Contributor

Drake-AI commented Nov 3, 2023

Looks like the new PR solves a few things, good job. For dolphin I am using a modified wrapper, just like the airoboros one but with some things changed, and it works well.

@oderwat
Contributor Author

oderwat commented Nov 3, 2023

@Drake-AI I guess you do not use the original dolphin wrapper either. I am on legacy code and don't plan to change to the current main, because it still seems to break my models. They just act very differently, and I have no fun debugging that.

@Drake-AI
Contributor

Drake-AI commented Nov 3, 2023

@oderwat That's weird; the model's behavior depends on the prompt and the parameters sent to webui, so it should not change if the prompt is the same. I tried both and never noticed that, but I use koboldcpp as the endpoint; it is pretty much like webui but just for GGUF models.

For dolphin I use a modified wrapper based on the airoboros wrapper; it works well, even with inner thoughts, and with the new version of dolphin, 2.2.1. But now I'm trying openchat3.5, and so far it is better than dolphin, but it needs a custom wrapper too.

@oderwat
Contributor Author

oderwat commented Nov 3, 2023

@Drake-AI I am not sure what is happening with the new code. It ignored all persona information. It also seemed that the inner dialog is not working the same way. I got a lot of replies that had basically a very good answer as the inner dialog, but the actual message was a bad version of that. But that may already be fixed, or it was user error. I just don't have the time to check it out again. My PRs for the "retry / rethink / rewrite / dump message" functionalities are also just "stuck". My timezone PR got a "do it yourself, we don't care". I have more important stuff to do.

@Drake-AI
Contributor

Drake-AI commented Nov 3, 2023

@oderwat It may be fixed; try again when you have time. I tried your retry command and it works well, thanks. For the timezone I modified it to just use datetime.now(), so it takes the time from the system.
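
A tiny sketch of that change (the function name and time format are illustrative; the actual spot patched in MemGPT is not shown in this thread):

from datetime import datetime

def get_local_time() -> str:
    # Use the system's local time instead of a fixed timezone
    return datetime.now().strftime("%Y-%m-%d %I:%M:%S %p")
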
