
llama.cpp assertion fails: "non-causal attention requires n_ubatch >= n_tokens" #2375

Closed
wazoox opened this issue May 24, 2024 · 8 comments · Fixed by #2383
Labels
backend (gpt4all-backend issues) · bug (Something isn't working) · embedding

Comments

@wazoox

wazoox commented May 24, 2024

the bug:

Going to Settings > Local Docs and pointing to a folder containing a few PDFs; when clicking "Add", GPT4All crashes. It then crashes at startup until I delete the localdocs_v1.db file.

"Local Docs" used to work on this machine with GPT4All 2.7.3.

GPT4All works fine if I reset all settings but don't set up any Local Docs.

If I set up an empty folder as a Local docs, it works; however as soon as I drop a PDF into this folder, GPT4All crashes.

After a restart, GPT4All crashes once if there's any new PDF in the Local Docs, then runs the second time. However, if I ask specific questions related to the Local Docs, it doesn't seem to use them.

configuration:

Running GPT4All 2.8.0 on macOS Monterey 12.7.5 (Mac Pro Intel, 32 GB RAM).

installed local models

Meta-Llama-3-8B-Instruct.Q4_0.gguf
Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf
all-MiniLM-L6-v2-f16.gguf
all-MiniLM-L6-v2.gguf2.f16.gguf
mistral-7b-instruct-v0.1.Q4_0.gguf
mistral-7b-openorca.Q4_0.gguf
mistral-7b-openorca.gguf2.Q4_0.gguf

debugging

Unfortunately, the first few crashes opened a debugging window with stack traces, but that doesn't happen anymore for some reason.

@wazoox wazoox added bug-unconfirmed chat gpt4all-chat issues labels May 24, 2024
@chrisbarrera
Contributor

I can replicate this problem (as I was testing for someone else's different localdocs crashing problem).
Crashed Thread: 6 embedding

Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: Namespace SIGNAL, Code 6 Abort trap: 6

Some stack trace context - let me know if you want the complete one.
Thread 6 Crashed:: embedding
0 libsystem_kernel.dylib 0x19d966a60 __pthread_kill + 8
1 libsystem_pthread.dylib 0x19d99ec20 pthread_kill + 288
2 libsystem_c.dylib 0x19d8aba20 abort + 180
3 libllamamodel-mainline-metal.dylib 0x118d8280c llama_decode.cold.3 + 88
4 libllamamodel-mainline-metal.dylib 0x118cd9e28 llama_decode + 8480
5 libllamamodel-mainline-metal.dylib 0x118c32f50 LLamaModel::embedInternal(std::__1::vector<

@wazoox
Author

wazoox commented May 24, 2024

I can replicate this problem (as I was testing for someone else's different localdocs crashing problem). Crashed Thread: 6 embedding

Can I try something to help? I don't know why the crash report window doesn't open anymore...

@chrisbarrera
Contributor

(wazoox) Sorry, I was directing my comment to the devs. However, I am sure any way you can help would be appreciated.
(for the devs) I continued to look at this; it appears to be calling abort in a GGML_ASSERT in llama_decode_internal.

Right before it crashed it logged this to file:
[Warning] (Fri May 24 14:06:04 2024): Populating font family aliases took 45 ms. Replace uses of missing font family "MyCustomFont, Sans-serif" with one that exists to avoid this cost.
[Warning] (Fri May 24 14:06:04 2024): ERROR: could not load hnswlib index: Index seems to be corrupted or unsupported
[Warning] (Fri May 24 14:06:04 2024): ERROR: Could not load embeddings
[Debug] (Fri May 24 14:06:04 2024): deserializing chat "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-09da435f-1b9d-46b5-8a80-0a3eaa5b8c14.chat"
[Debug] (Fri May 24 14:06:04 2024): deserializing chat "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-1bf5c61c-f823-4c9a-86c2-8aba984de1c2.chat"
[Warning] (Fri May 24 14:06:04 2024): ERROR: Couldn't deserialize chat from file: "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-1bf5c61c-f823-4c9a-86c2-8aba984de1c2.chat"
[Debug] (Fri May 24 14:06:04 2024): deserializing chats took: 0 ms

Not sure of the relationship to the above, but it may be helpful.

@dianamJLAB

If this adds any context: I'm also on macOS, Monterey 12.6.9, M1 Max, running on CPU.
GPT4All v 2.8.0
Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf

This happens when I use localdocs.

If I use a prompt that deliberately has no similarity to any of the content in the PDFs in the localdocs, all works as expected.

If I use a prompt that matches content in the localdocs GPT4All crashes.

Crash Thread 8
Exception Type: EXC_BAD_ACCESS

...
Thread 8 Crashed:: e0cd5225-60e7-462c-b112-eabcf338d216
0   ???                                                  0x0 ???
1   libllamamodel-mainline-cpu.dylib               0x1156b6728 ggml_graph_compute_thread + 896
2   libllamamodel-mainline-cpu.dylib               0x1156b6314 ggml_graph_compute + 248
3   libllamamodel-mainline-cpu.dylib               0x1156e5924 ggml_backend_cpu_graph_compute + 112
4   libllamamodel-mainline-cpu.dylib               0x1156e4a6c ggml_backend_sched_graph_compute_async + 788
5   libllamamodel-mainline-cpu.dylib               0x11572d9d4 llama_decode + 5676
6   libllamamodel-mainline-cpu.dylib               0x11568d7d8 LLamaModel::evalTokens(LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator > const&) const + 264
7   libllamamodel-mainline-cpu.dylib               0x115696bec LLModel::decodePrompt(std::__1::function<bool (int)>, std::__1::function<bool (int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)>, std::__1::function<bool (bool)>, LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator >) + 876
8   libllamamodel-mainline-cpu.dylib               0x1156956f8 LLModel::prompt(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::function<bool (int)>, std::__1::function<bool (int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)>, std::__1::function<bool (bool)>, LLModel::PromptContext&, bool, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >) + 3920
9   gpt4all                                      0x102957b70 ChatLLM::promptInternal(QList const&, QString const&, QString const&, int, int, float, float, float, int, float, int) + 1032
10  gpt4all                                      0x102957010 ChatLLM::prompt(QList const&, QString const&) + 276
11  QtCore                                       0x10634a9e0 QObject::event(QEvent*) + 612
12  QtCore                                       0x106309298 QCoreApplicationPrivate::notify_helper(QObject*, QEvent*) + 384
13  QtCore                                       0x106308e18 QCoreApplication::notifyInternal2(QObject*, QEvent*) + 292
14  QtCore                                       0x10630a0c8 QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) + 1428
15  QtCore                                       0x106474158 QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) + 84
16  QtCore                                       0x10631241c QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) + 532
17  QtCore                                       0x1063fbb60 QThread::exec() + 280
18  QtCore                                       0x106478334 0x1062a0000 + 1934132
19  libsystem_pthread.dylib                      0x1aace826c _pthread_start + 148
20  libsystem_pthread.dylib                      0x1aace308c thread_start + 8

@cebtenzzre cebtenzzre added macos bug Something isn't working and removed bug-unconfirmed labels May 28, 2024
@dianamJLAB

UPDATE: I installed version 2.7.3 macOS and repeated the steps: same model Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf, same localdocs directory, same SBERT model for localdocs. And GPT4All does NOT crash. Seems to be working as expected. This does appear to be a v 2.8.0 issue. (although I have not yet tried with v 2.7.4)

@cebtenzzre
Member

I found an issue with embedInternal that could be related, but since I haven't seen this particular crash I'm not sure that it's the same.

@chrisbarrera Could you run GPT4All from a terminal (/Applications/gpt4all/bin/gpt4all.app/Contents/MacOS/gpt4all) and post the output before the crash? GGML_ASSERT prints an error to stderr that should help narrow down what is going on.


Here are the steps I followed to try and replicate the issue on macOS Sonoma 14.4.1:

  • Move/rename ~/.config/gpt4all.io and ~/Library/Application Support/nomic.ai
  • Install GPT4All v2.8.0 from https://gpt4all.io/ (online installer)
  • Download SBert from the models page
  • Download this pdf to ~/localdocs
  • Go to Settings > LocalDocs, set new collection name to localdocs and the path to /Users/jared/localdocs
  • Click "Add"

After this I also tried:

  • Download Llama 3 8B Instruct
  • DB icon > enable the collection "localdocs"
  • Ask "What is Nomic Embed?"

For me, GPT4All does not crash. What are you doing differently?

@cebtenzzre cebtenzzre added the need-info Further information from issue author is requested label May 28, 2024
@chrisbarrera
Contributor

chrisbarrera commented May 28, 2024

The problem went away when I removed the localdocs* DB files under 2.8.0 and recreated the localdocs DBs, but I could replicate it by removing them again, rerunning under 2.7.5 to recreate the DBs, and then switching back to 2.8.0.

Here is the output generated as I recreate the crash by attempting to add a new collection to localdocs under 2.8.0:
embedInternal: warning: chunking tokenized text at index 0 into zero tokens
GGML_ASSERT: /Users/atreat/dev/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:11355: (cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens"
zsh: abort ./gpt4all

@cebtenzzre cebtenzzre changed the title GPT4All 2.8.0 client crashes instantly when adding a populated "Local docs" folder llama.cpp assertion fails: "non-causal attention requires n_ubatch >= n_tokens" May 28, 2024
@cebtenzzre cebtenzzre added backend gpt4all-backend issues embedding and removed chat gpt4all-chat issues need-info Further information from issue author is requested macos labels May 28, 2024
@cebtenzzre
Member

I can reproduce the assertion failure from the python bindings:

>>> from gpt4all import Embed4All
>>> x = Embed4All('nomic-embed-text-v1.f16.gguf')
>>> x.embed('a ' * 513)
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:11355: (cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens"

Looks like we are not correctly setting n_ubatch after the llama.cpp update from the CUDA PR.
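
For what it's worth, here is a minimal sketch of the likely fix direction rather than the actual patch in #2383: with current upstream llama.cpp, a context used with a non-causal (embedding) model needs its physical micro-batch size n_ubatch to be at least as large as its logical batch size n_batch, otherwise llama_decode trips exactly this assert. The function and field names (llama_context_default_params, n_batch, n_ubatch, llama_new_context_with_model) come from upstream llama.h; the helper name and its n_ctx_tokens parameter are only illustrative.

#include "llama.h"

// Illustrative helper (assumed names): build a context for an embedding model,
// keeping n_ubatch in sync with n_batch so a full prompt fits in one micro-batch.
static llama_context * make_embedding_context(llama_model * model, uint32_t n_ctx_tokens) {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx    = n_ctx_tokens;
    cparams.n_batch  = n_ctx_tokens;  // logical batch: the whole prompt submitted at once
    cparams.n_ubatch = n_ctx_tokens;  // physical micro-batch: must cover the full batch
                                      // for non-causal attention, or the assert fires
    return llama_new_context_with_model(model, cparams);
}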
