
Frequent errors with webgui llm answers when json decoding fails. #177

Closed

oderwat opened this issue Oct 29, 2023 · 16 comments

@oderwat
Contributor

oderwat commented Oct 29, 2023

I found that most longer answers from the webgui backend will fail to JSON decode. From the data I get to see, it looks as if the LLM returned an incomplete message. I wonder if one could make the LLM "continue" the message when decoding fails. Maybe also try to repair a JSON answer if it is "just missing" the proper ending? I also wonder if one could give the backend a grammar or a maximum new-token length, or how to set any parameters at all. Is that done inside webgui in that case?

When using "--model dolphin-2.1-mistral-7b" with the same model loaded into webgui, I also get this kind of malformed JSON when trying to store into memory:

> Enter your message: my name is Horst and I am 42 years old. save that in your memory.

Exception: Failed to decode JSON from LLM output:
{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Got it! Your age and nationality are now saved in my memory."
  }
}
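
A minimal sketch of the "repair a JSON answer if it is just missing the proper ending" idea (this is not actual MemGPT code, and it only covers the case where the closing braces were cut off):

import json

def repair_truncated_json(raw: str, max_missing_braces: int = 4) -> dict:
    # Best-effort repair: re-try parsing with up to max_missing_braces
    # closing braces appended. Truncation inside a string value is not handled.
    raw = raw.strip()
    for extra in range(max_missing_braces + 1):
        try:
            return json.loads(raw + "}" * extra)
        except json.JSONDecodeError:
            continue
    raise ValueError("could not repair LLM output into valid JSON")
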
@oderwat oderwat changed the title Frequent errors with textgen llm answers when json decoding fails. Frequent errors with webgui llm answers when json decoding fails. Oct 29, 2023
@Phiapsi

Phiapsi commented Oct 29, 2023

Same with LM Studio and also the dolphin-2.1-mistral-7b model.

@oderwat
Contributor Author

oderwat commented Oct 29, 2023

To answer my own question partially:

I found where the parameters for the query are set. So I tried changing memgpt/local_llm/webui/settings.py and added "max_new_tokens": 1000, on a new line before the closing }, and this decreased the problem with cut-off JSON replies quite a bit.

I also tried adding "ban_eos_token": True but removed it again, as my experience with webui is that it may then answer with garbage.
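
For reference, a sketch of what the edited settings dict could look like (the dict name and the surrounding keys here are assumptions; only the "max_new_tokens" line reflects the change described above):

# memgpt/local_llm/webui/settings.py (illustrative)
SIMPLE = {
    "stopping_strings": ["\nUSER:", "\nASSISTANT:"],  # assumed existing entry
    "truncation_length": 4096,                        # assumed existing entry
    # "ban_eos_token": True,  # tried and removed again; can lead to garbage output
    "max_new_tokens": 1000,   # added: reduces cut-off JSON replies
}
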

Maybe adding a grammar_string could be pretty interesting, but I am not very sophisticated with that. I just played with grammars a bit in Faraday and found it incredibly hard to make them work in any meaningful way. But maybe somebody knows more about the needed grammar and how to define it.

Besides possibly fixing some of the JSON parsing problems, one may also want to change some of the parameters to shape the model's answers.

P.S.: I also think the program should just tell you that it failed and not crash when JSON parsing fails or no answer is generated.

@cpacker
Collaborator

cpacker commented Oct 30, 2023

@oderwat @Phiapsi

Just to clarify what's going on here (just in case it's not clear, sorry if I'm explaining what you already know):

Exception: Failed to decode JSON from LLM output:
{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Got it! Your age and nationality are now saved in my memory."
  }
}

The raw output string from the LLM is two JSON objects, whereas it should just be one:

{
  "function": "core_memory_append",
  "params": {
    "name": "human",
    "content": "Horst, 42 years old, from Germany."
  }
}

We could potentially get around this by running a JSON parser that checks for matching open/close braces ({ / }) and truncates the string when it finds the match (so that only the first JSON object is extracted).
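
A rough sketch of that brace-matching idea (a hypothetical helper, not the code that was later merged):

def extract_first_json_object(raw: str) -> str:
    # Return the substring covering the first balanced {...} object in raw,
    # ignoring braces that appear inside string literals.
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in LLM output")
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(raw[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return raw[start:i + 1]  # truncate at the matching close brace
    raise ValueError("unbalanced braces in LLM output")

Running json.loads() on the returned substring would then pick up only the core_memory_append call and drop the trailing send_message object.
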

Also, just looking at this output it seems like there's another error (a semantic one) - the bot should have added request_heartbeat: true as a parameter to the core_memory_append function. A more MemGPT-specific parsing hack would be to check whether core_memory_X is run and the bot added another call after it (indicating it's trying to chain functions), in which case we can manually add request_heartbeat: true to the first call.
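
Sketched out, that MemGPT-specific heuristic could look something like this (the call format and field names are assumptions based on the example output above):

def maybe_force_heartbeat(calls: list[dict]) -> list[dict]:
    # If a core_memory_* call is chained with a later call, force a heartbeat
    # on it so the agent gets control back to actually run the follow-up.
    if len(calls) > 1 and calls[0].get("function", "").startswith("core_memory_"):
        calls[0].setdefault("params", {})["request_heartbeat"] = True
    return calls
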

tl;dr: there are lots of MemGPT-specific parsing hacks we can use here to make MemGPT perform better

@vivi
Contributor

vivi commented Oct 30, 2023

Another JSON decoding error from bebo on Discord: https://pastebin.com/nJeAxHvZ

@Rivelyn

Rivelyn commented Oct 31, 2023

I am using LM Studio and have tried every model under the sun, and I always get Exception: Failed to decode JSON from LLM output every time I try to run MemGPT. I have no problems having a chat in LM Studio with any of the models. Currently I am testing TheBloke - zephyr beta 7B q5_k_m gguf. It seems to be a very capable model when chatting in LM Studio. But once again, as soon as I start my LM Studio server and run through the setup using Anaconda Prompt, I set all of my variables with
set OPENAI_API_BASE=http://localhost:1234
set BACKEND_TYPE=lmstudio
python main.py --no_verify

It thinks for several minutes and then always returns the same result: Exception: Failed to decode JSON from LLM output.

I can include all of the information from my LM Studio and from MemGPT, but I am not sure exactly what is needed to assess the issue.

I am running a Windows laptop, an Aspire 5; it is not great for this, but I just need a bit more patience waiting for the responses, which is not a problem for me.

@cpacker
Collaborator

cpacker commented Nov 1, 2023

@Rivelyn were you able to get this working?

This seems like it might be a model error, but it's really hard to check without exactly matching the model settings we've tested. If you're on Discord, ping me and I'll help you get set up.

If you're not on Discord, can you try running a dolphin-2.1 model instead and report back?

E.g. try to make your LM Studio setup look exactly like this (maybe with a lower quantization if the model doesn't fit on your computer):
(screenshot of the suggested LM Studio model settings)

@Rivelyn

Rivelyn commented Nov 1, 2023

@cpacker I pinged you and @vivi on Discord with the progress (sort of) I made this morning; maybe not progress, but at least something different.

@Rivelyn

Rivelyn commented Nov 1, 2023

For anyone who might be reading this here and not on Discord, I will update what has happened so far. As of yet I have not changed to dolphin-2.1. I can, but I have been impressed with the Zephyr model so far, so I was hoping to get it operational.

LM Studio updated to yesterday's version, 0.2.8.
Model: TheBloke/zephyr-7B-beta-GGUF/zephyr-7b-beta.Q5_K_M.gguf.
Server model setting 'zephyr'.
MemGPT commands:
set OPENAI_API_BASE=http://localhost:1234
set BACKEND_TYPE=lmstudio
python -m memgpt --model zephyr-7B-beta --persona sam

Thinking ran for a while, then either LM Studio or MemGPT went into a loop on startup and finally gave me this error: Exception: Hit first message retry limit (10)

My environment is Windows 11 on an Aspire 5 16GB laptop. I have no GPU, and the chat with the Zephyr model is good, just for anyone wondering. I actually tried the larger version of Zephyr and didn't really notice any difference in my testing. Using Anaconda I created my environment and cloned the repo, then set my commands as above.

@Drake-AI
Contributor

Drake-AI commented Nov 2, 2023

(quoting @cpacker's comment from Oct 30 above)

What about a stop sequence like "\n }\n}"? It works for me in my tests with dolphin and openchat3.5 using koboldcpp.
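
As a sketch of how that could be passed to a koboldcpp-style endpoint (the URL and the payload field names here are assumptions; check them against the koboldcpp API version you are running):

import requests

payload = {
    "prompt": "...",                # the full MemGPT prompt string
    "max_length": 512,              # cap on new tokens
    "stop_sequence": ["\n }\n}"],   # stop right after the first object closes
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
text = resp.json()["results"][0]["text"]
# Depending on the backend, the stop text itself may be omitted from the output,
# so the closing braces might need to be re-appended before JSON parsing.
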

@cpacker
Collaborator

cpacker commented Nov 3, 2023

Hey @oderwat @Rivelyn @Drake-AI

We just added some extra JSON parsing code in #269, which hopefully should fix many of the common issues like double-JSON and run-on JSON (e.g. because of missed stop tokens).

In the PR's top comment you can see an example of how the new parsing hacks prevent common bad JSON outputs that people have been experiencing. Hope this helps!

Closing this particular issue because it's about double-JSON, which should be fixed now, but feel free to re-open or continue commenting.

@cpacker cpacker closed this as completed Nov 3, 2023
@oderwat
Contributor Author

oderwat commented Nov 3, 2023

FYI: The original issue was not about the double JSON, but about the JSON data ending prematurely, which was fixed for me by #187 / #182.

The second problem with Dolphin did not happen for me when I used the default wrapper instead.

TY

@Drake-AI
Contributor

Drake-AI commented Nov 3, 2023

Looks like the new PR solves a few things, good job. For dolphin I am using a modified wrapper, just like the airoboros one but with some things changed, and it works well.

@oderwat
Contributor Author

oderwat commented Nov 3, 2023

@Drake-AI I guess you do not use the original dolphin wrapper either. I am on legacy code and don't plan to change to the current main, because it still seems to break my models. They just act very differently, and I have no fun debugging that.

@Drake-AI
Contributor

Drake-AI commented Nov 3, 2023

@oderwat That's weird; the model's behavior depends on the prompt and the parameters sent to webui, so it should not change if the prompt is the same. I tried both and never noticed that, but I use koboldcpp as the endpoint; it is pretty much like webui but just for GGUF models.

For dolphin I use a modified wrapper based on the airoboros wrapper; it works well, even with inner thoughts, and with the new version of dolphin, 2.2.1. But now I'm trying openchat3.5, and so far it is better than dolphin, but it needs a custom wrapper too.

@oderwat
Contributor Author

oderwat commented Nov 3, 2023

@Drake-AI I am not sure what is happening with the new code. It ignored all persona information. It also seemed that the inner dialog is not working the same way. I got a lot of replies that had basically a very good answer as the inner dialog, but the actual message was a bad version of that. But that may already be fixed, or it was user error. I just don't have the time to check it out again. My PRs for the "retry / rethink / rewrite / dump message" functionalities are also just "stuck". My timezone PR got a "do it yourself, we don't care". I have more important stuff to do.

@Drake-AI
Contributor

Drake-AI commented Nov 3, 2023

@oderwat It may be fixed; try again when you have time. I tried your retry command and it works well, thanks. For the timezone I modified it to just use datetime.now(), so it takes the time from the system.
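
A tiny sketch of that change (the function name and time format are illustrative; the actual spot patched in MemGPT is not shown in this thread):

from datetime import datetime

def get_local_time() -> str:
    # Use the system's local time instead of a fixed timezone
    return datetime.now().strftime("%Y-%m-%d %I:%M:%S %p")
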
