
llama.cpp assertion fails: "non-causal attention requires n_ubatch >= n_tokens" #2375

Closed
wazoox opened this issue May 24, 2024 · 8 comments · Fixed by #2383
Labels
backend (gpt4all-backend issues) · bug (Something isn't working) · embedding

Comments

@wazoox

wazoox commented May 24, 2024

the bug:

Going to Settings > Local Docs and pointing to a folder containing a few PDFs; when clicking "Add", GPT4All crashes. It then crashes at startup until I delete the localdocs_v1.db file.

"Local Docs" used to work on this machine with GPT4All 2.7.3.

GPT4All works fine if I reset all settings but don't set up any Local Docs.

If I set up an empty folder as a Local docs, it works; however as soon as I drop a PDF into this folder, GPT4All crashes.

After a restart, GPT4All crashes once if there's any new PDF in the Local Docs, then runs the second time. However, if I ask specific questions related to the Local Docs, it doesn't seem to use them.

configuration:

Running GPT4All 2.8.0 on macOS Monterey 12.7.5 (Mac Pro Intel, 32 GB RAM).

installed local models

Meta-Llama-3-8B-Instruct.Q4_0.gguf
Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf
all-MiniLM-L6-v2-f16.gguf
all-MiniLM-L6-v2.gguf2.f16.gguf
mistral-7b-instruct-v0.1.Q4_0.gguf
mistral-7b-openorca.Q4_0.gguf
mistral-7b-openorca.gguf2.Q4_0.gguf

debugging

Unfortunately, the first few crashes opened a debugging window with stack traces, but that doesn't happen anymore for some reason.

@wazoox wazoox added bug-unconfirmed chat gpt4all-chat issues labels May 24, 2024
@chrisbarrera
Contributor

I can replicate this problem (as I was testing for someone else's different localdocs crashing problem).
Crashed Thread: 6 embedding

Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: Namespace SIGNAL, Code 6 Abort trap: 6

Some stack trace context - let me know if you want the complete one.
Thread 6 Crashed:: embedding
0 libsystem_kernel.dylib 0x19d966a60 __pthread_kill + 8
1 libsystem_pthread.dylib 0x19d99ec20 pthread_kill + 288
2 libsystem_c.dylib 0x19d8aba20 abort + 180
3 libllamamodel-mainline-metal.dylib 0x118d8280c llama_decode.cold.3 + 88
4 libllamamodel-mainline-metal.dylib 0x118cd9e28 llama_decode + 8480
5 libllamamodel-mainline-metal.dylib 0x118c32f50 LLamaModel::embedInternal(std::__1::vector<

@wazoox
Author

wazoox commented May 24, 2024

I can replicate this problem (as I was testing for someone else's different localdocs crashing problem). Crashed Thread: 6 embedding

Can I try something to help? I don't know why the crash report window doesn't open anymore...

@chrisbarrera
Contributor

(wazoox) Sorry, I was directing my comment to the devs. However, I am sure any way you can help would be appreciated.
(for the devs) I continued to look at this; it appears to be calling abort in a GGML_ASSERT in llama_decode_internal.

Right before it crashed it logged this to file:
[Warning] (Fri May 24 14:06:04 2024): Populating font family aliases took 45 ms. Replace uses of missing font family "MyCustomFont, Sans-serif" with one that exists to avoid this cost.
[Warning] (Fri May 24 14:06:04 2024): ERROR: could not load hnswlib index: Index seems to be corrupted or unsupported
[Warning] (Fri May 24 14:06:04 2024): ERROR: Could not load embeddings
[Debug] (Fri May 24 14:06:04 2024): deserializing chat "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-09da435f-1b9d-46b5-8a80-0a3eaa5b8c14.chat"
[Debug] (Fri May 24 14:06:04 2024): deserializing chat "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-1bf5c61c-f823-4c9a-86c2-8aba984de1c2.chat"
[Warning] (Fri May 24 14:06:04 2024): ERROR: Couldn't deserialize chat from file: "/Users/cb/Library/Application Support/nomic.ai/GPT4All//gpt4all-1bf5c61c-f823-4c9a-86c2-8aba984de1c2.chat"
[Debug] (Fri May 24 14:06:04 2024): deserializing chats took: 0 ms

Not sure of the relationship to the above, but it may be helpful.

@dianamJLAB

If this adds any context: I'm also on macOS, Monterey 12.6.9, M1 Max, running on CPU.
GPT4All v 2.8.0
Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf

This happens when I use localdocs.

If I use a prompt that deliberately has no similarity to any of the content in the PDFs in the localdocs, all works as expected.

If I use a prompt that matches content in the localdocs GPT4All crashes.

Crash Thread 8
Exception Type: EXC_BAD_ACCESS

...
Thread 8 Crashed:: e0cd5225-60e7-462c-b112-eabcf338d216
0   ???                                                  0x0 ???
1   libllamamodel-mainline-cpu.dylib               0x1156b6728 ggml_graph_compute_thread + 896
2   libllamamodel-mainline-cpu.dylib               0x1156b6314 ggml_graph_compute + 248
3   libllamamodel-mainline-cpu.dylib               0x1156e5924 ggml_backend_cpu_graph_compute + 112
4   libllamamodel-mainline-cpu.dylib               0x1156e4a6c ggml_backend_sched_graph_compute_async + 788
5   libllamamodel-mainline-cpu.dylib               0x11572d9d4 llama_decode + 5676
6   libllamamodel-mainline-cpu.dylib               0x11568d7d8 LLamaModel::evalTokens(LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator > const&) const + 264
7   libllamamodel-mainline-cpu.dylib               0x115696bec LLModel::decodePrompt(std::__1::function<bool (int)>, std::__1::function<bool (int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)>, std::__1::function<bool (bool)>, LLModel::PromptContext&, std::__1::vector<int, std::__1::allocator >) + 876
8   libllamamodel-mainline-cpu.dylib               0x1156956f8 LLModel::prompt(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::function<bool (int)>, std::__1::function<bool (int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)>, std::__1::function<bool (bool)>, LLModel::PromptContext&, bool, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >) + 3920
9   gpt4all                                      0x102957b70 ChatLLM::promptInternal(QList const&, QString const&, QString const&, int, int, float, float, float, int, float, int) + 1032
10  gpt4all                                      0x102957010 ChatLLM::prompt(QList const&, QString const&) + 276
11  QtCore                                       0x10634a9e0 QObject::event(QEvent*) + 612
12  QtCore                                       0x106309298 QCoreApplicationPrivate::notify_helper(QObject*, QEvent*) + 384
13  QtCore                                       0x106308e18 QCoreApplication::notifyInternal2(QObject*, QEvent*) + 292
14  QtCore                                       0x10630a0c8 QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) + 1428
15  QtCore                                       0x106474158 QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) + 84
16  QtCore                                       0x10631241c QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) + 532
17  QtCore                                       0x1063fbb60 QThread::exec() + 280
18  QtCore                                       0x106478334 0x1062a0000 + 1934132
19  libsystem_pthread.dylib                      0x1aace826c _pthread_start + 148
20  libsystem_pthread.dylib                      0x1aace308c thread_start + 8

@cebtenzzre cebtenzzre added macos bug Something isn't working and removed bug-unconfirmed labels May 28, 2024
@dianamJLAB

UPDATE: I installed version 2.7.3 macOS and repeated the steps: same model Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf, same localdocs directory, same SBERT model for localdocs. And GPT4All does NOT crash. Seems to be working as expected. This does appear to be a v 2.8.0 issue. (although I have not yet tried with v 2.7.4)

@cebtenzzre
Member

I found an issue with embedInternal that could be related, but since I haven't seen this particular crash I'm not sure that it's the same.

@chrisbarrera Could you run GPT4All from a terminal (/Applications/gpt4all/bin/gpt4all.app/Contents/MacOS/gpt4all) and post the output before the crash? GGML_ASSERT prints an error to stderr that should help narrow down what is going on.


Here are the steps I followed to try and replicate the issue on macOS Sonoma 14.4.1:

  • Move/rename ~/.config/gpt4all.io and ~/Library/Application Support/nomic.ai
  • Install GPT4All v2.8.0 from https://gpt4all.io/ (online installer)
  • Download SBert from the models page
  • Download this pdf to ~/localdocs
  • Go to Settings > LocalDocs, set new collection name to localdocs and the path to /Users/jared/localdocs
  • Click "Add"

After this I also tried:

  • Download Llama 3 8B Instruct
  • DB icon > enable the collection "localdocs"
  • Ask "What is Nomic Embed?"

For me, GPT4All does not crash. What are you doing differently?

@cebtenzzre cebtenzzre added the need-info Further information from issue author is requested label May 28, 2024
@chrisbarrera
Contributor

chrisbarrera commented May 28, 2024

The problem went away when I removed the localdocs* DB files under 2.8.0 and recreated the localdocs DBs, but I could replicate it by removing them again, rerunning under 2.7.5 to recreate the DBs, and then switching back to 2.8.0.

Here is the output generated as I recreate the crash by attempting to add a new collection to localdocs under 2.8.0:
embedInternal: warning: chunking tokenized text at index 0 into zero tokens
GGML_ASSERT: /Users/atreat/dev/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:11355: (cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens"
zsh: abort ./gpt4all

@cebtenzzre cebtenzzre changed the title GPT4All 2.8.0 client crashes instantly when adding a populated "Local docs" folder llama.cpp assertion fails: "non-causal attention requires n_ubatch >= n_tokens" May 28, 2024
@cebtenzzre cebtenzzre added backend gpt4all-backend issues embedding and removed chat gpt4all-chat issues need-info Further information from issue author is requested macos labels May 28, 2024
@cebtenzzre
Member

I can reproduce the assertion failure from the python bindings:

>>> from gpt4all import Embed4All
>>> x = Embed4All('nomic-embed-text-v1.f16.gguf')
>>> x.embed('a ' * 513)
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:11355: (cparams.causal_attn || cparams.n_ubatch >= n_tokens_all) && "non-causal attention requires n_ubatch >= n_tokens"

Looks like we are not correctly setting n_ubatch after the llama.cpp update from the CUDA PR.
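
For what it's worth, here is a minimal sketch of the likely fix direction rather than the actual patch in #2383: with current upstream llama.cpp, a context used with a non-causal (embedding) model needs its physical micro-batch size n_ubatch to be at least as large as its logical batch size n_batch, otherwise llama_decode trips exactly this assert. The function and field names (llama_context_default_params, n_batch, n_ubatch, llama_new_context_with_model) come from upstream llama.h; the helper name and its n_ctx_tokens parameter are only illustrative.

#include "llama.h"

// Illustrative helper (assumed names): build a context for an embedding model,
// keeping n_ubatch in sync with n_batch so a full prompt fits in one micro-batch.
static llama_context * make_embedding_context(llama_model * model, uint32_t n_ctx_tokens) {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx    = n_ctx_tokens;
    cparams.n_batch  = n_ctx_tokens;  // logical batch: the whole prompt submitted at once
    cparams.n_ubatch = n_ctx_tokens;  // physical micro-batch: must cover the full batch
                                      // for non-causal attention, or the assert fires
    return llama_new_context_with_model(model, cparams);
}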
