llamamodel: fix embedding crash for >512 tokens after #2310 #2383
n_ubatch defaults to 512, and as of the latest llama.cpp you cannot pass more than n_ubatch tokens to a BERT-style embedding model without hitting an assertion failure.
Tested with this Python code:
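(The original script is not included in this description; below is a minimal sketch of the kind of reproduction involved, assuming the gpt4all Python bindings' `Embed4All` API. The default model and the repeated-token input are illustrative stand-ins, not the original test input.)

```python
from gpt4all import Embed4All

# Roughly 600+ tokens of input: comfortably past the default n_ubatch of 512.
# (Illustrative stand-in for the original test input.)
text = "hello " * 600

embedder = Embed4All()  # defaults to a BERT-style embedding model
embedding = embedder.embed(text)  # hits the GGML_ASSERT without this PR
print(len(embedding))
```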
Without this PR, it crashes with an assertion failure (including in release builds, since it's a GGML_ASSERT). With this PR, it succeeds.
Broken by #2310 because of ggerganov/llama.cpp#6017
Fix based on ggerganov/llama.cpp#6296
Fixes #2375