Bug: llama-server crash with --embeddings
#9978
Comments
This is likely due to an overflow in the position embeddings when exceeding the maximum sequence length supported by the model. Limiting the context size to the model's maximum sequence length (which in this case appears to be 1024) should avoid the crash.
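A minimal sketch of that workaround, reusing the model and flags from the command in the report below; the 1024 figure is the limit suggested in this comment, and the model's config.json may specify a smaller value:

```sh
# Cap both the context size (-c) and the physical batch size (-ub) at the
# model's maximum sequence length, so positions never exceed what the
# position embeddings were trained for.
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 \
    -ngl 100 -c 1024 -ub 1024 --embeddings --pooling cls
```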
@slaren Thanks! Earlier versions did not have this problem. It appeared recently.
The check was added in #9354. Earlier versions didn't crash, but the results were not correct either.
@slaren I set the context size to 1024, but it still crashes. Can it be changed to just a warning, keeping the old behavior? The program crashes and the service becomes unavailable.
The context size might actually be 512 for this model. See https://huggingface.co/BAAI/bge-large-zh-v1.5/blob/main/config.json#L23
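For reference, that limit can be read straight from the model's config.json; a quick check, assuming a local copy of the file and jq installed:

```sh
# Print the model's maximum sequence length (512 for bge-large-zh-v1.5,
# per the config.json linked above).
jq '.max_position_embeddings' config.json
```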
@ggerganov Okay, I'll try it later. BTW: is it possible to prevent the program from crashing when the input text is too long?
It will not crash if you set the correct context size.
It doesn't seem to crash anymore. I'll continue to observe for a while. @ggerganov If the context size exceeds the allowed value, could it be automatically set to max_position_embeddings with a warning, to avoid a crash?
No, in some cases it can make sense to set a large context size. The default value of …
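The server won't clamp the value itself, but a launcher script can. A hypothetical sketch, assuming the Hugging Face config.json sits next to the GGUF file and jq is available; the script name and layout are illustrative only:

```sh
#!/bin/sh
# start-embedding-server.sh (hypothetical): derive -c/-ub from the model's
# training limit so the server is never started past it.
MAX_POS=$(jq '.max_position_embeddings' config.json)
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 \
    -ngl 100 -c "$MAX_POS" -ub "$MAX_POS" --embeddings --pooling cls
```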
@ggerganov Hello, after a day of testing it still crashed with the same error as before.
Should be fixed in #10030
@ggerganov Thanks, I probably won't have time to test until next week. I rolled back to an early-July build and it didn't crash anymore, so it's not #9354's problem.
Note that ignoring the crash will most definitely lead the model to produce garbage. The problem is not the bounds check; the problem is the overflow.
What happened?
After starting the server with the following command, it occasionally crashes suddenly while running.
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 -c 8192 --samplers temperature;top_p --embeddings -ub 8192 --pooling cls
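Per the discussion above, a corrected invocation would cap -c and -ub at the model's 512-token limit. The --samplers flag is dropped here, since sampling does not apply to embeddings (and the unquoted ; would be split by the shell anyway):

```sh
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 \
    -ngl 100 -c 512 -ub 512 --embeddings --pooling cls
```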
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 1: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 2: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 3: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
version: 3945 (45f0976)
built with cc (Debian 10.2.1-6) 10.2.1 20210110 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output