Bug: llama-server crash with --embeddings #9978

Closed
mokeyish opened this issue Oct 21, 2024 · 13 comments · Fixed by #10030
Labels
bug (Something isn't working), critical severity (Used to report critical severity bugs in llama.cpp, e.g. Crashing, Corrupted, Dataloss)

Comments


mokeyish commented Oct 21, 2024

What happened?

After starting with the following command, it will occasionally crash suddenly while running.

llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 -c 8192 --samplers temperature;top_p --embeddings -ub 8192 --pooling cls

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 1: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 2: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 3: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
version: 3945 (45f0976)
built with cc (Debian 10.2.1-6) 10.2.1 20210110 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f210ada7787 in __GI___wait4 (pid=3074567, stat_loc=0x7fffc0b6d3a4, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007f210ada7787 in __GI___wait4 (pid=3074567, stat_loc=0x7fffc0b6d3a4, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
27      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x00007f210b21b638 in ggml_abort () from /home/user/llama.cpp/build/ggml/src/libggml.so
#2  0x00007f210b21f700 in ggml_compute_forward_get_rows () from /home/user/llama.cpp/build/ggml/src/libggml.so
#3  0x00007f210b24d0a2 in ggml_graph_compute_thread.isra () from /home/user/llama.cpp/build/ggml/src/libggml.so
#4  0x00007f210b250cf6 in ggml_graph_compute () from /home/user/llama.cpp/build/ggml/src/libggml.so
#5  0x00007f210b25caf3 in ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) () from /home/user/llama.cpp/build/ggml/src/libggml.so
#6  0x00007f210b261d75 in ggml_backend_sched_graph_compute_async () from /home/user/llama.cpp/build/ggml/src/libggml.so
#7  0x00007f2120eb15c2 in llama_decode () from /home/user/llama.cpp/build/src/libllama.so
#8  0x000055776938aa04 in server_context::update_slots() ()
#9  0x000055776936d7e1 in server_queue::start_loop() ()
#10 0x0000557769324981 in main ()
[Inferior 1 (process 2694414) detached]
mokeyish added the bug-unconfirmed and critical severity labels on Oct 21, 2024
slaren added the bug label and removed the bug-unconfirmed label on Oct 21, 2024
slaren (Collaborator) commented Oct 21, 2024

This is likely due to an overflow in the positions embeddings when exceeding the maximum sequence length supported by the model. Limiting the context size to the max sequence length supported by the model (which in this case seems to be 1024) should avoid the crash.
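For example, something like the following adaptation of the command from the report might work (the exact value depends on the model; reducing -ub alongside -c is an assumption here, since a batch larger than the context is not useful for embeddings):

llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 -c 1024 -ub 1024 --embeddings --pooling cls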

mokeyish (Author):

@slaren Thanks! Earlier versions did not have this problem. It appeared recently.

slaren (Collaborator) commented Oct 22, 2024

The check was added in #9354. Earlier versions didn't crash, but the results were not correct either.

mokeyish (Author):

@slaren I set the context size to 1024, and it still crashes. Could this be changed to just a warning, the same as before? The program crashes and the service becomes unavailable.

ggerganov (Owner):

> Limiting the context size to the max sequence length supported by the model (which in this case seems to be 1024) should avoid the crash.

The context size might actually be 512 for this model. See the max_position_embeddings parameter in the config:

https://huggingface.co/BAAI/bge-large-zh-v1.5/blob/main/config.json#L23
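For reference, the relevant entry in that config.json (as of the linked revision) reads:

"max_position_embeddings": 512,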

mokeyish (Author):

@ggerganov Okay, I'll try it later.

BTW, is it possible to prevent the program from crashing when the input text is too long?

ggerganov (Owner):

It will not crash if you set the correct context size.

mokeyish (Author):

> It will not crash if you set the correct context size.

It doesn't seem to crash anymore. I'll continue to observe for a while.

@ggerganov If the context size exceeds the allowed value, could it be automatically clamped to max_position_embeddings with a warning, instead of crashing?

ggerganov (Owner):

> If the context size exceeds the allowed value, could it be automatically clamped to max_position_embeddings with a warning, instead of crashing?

No, in some cases it can make sense to set a large --ctx-size.

The default value of --ctx-size is 0, which uses the model's default training context if you don't specify it explicitly. So if you want to use max_position_embeddings as the context size, simply do not pass the -c argument.
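For example, a possible adaptation of the original command would be to drop -c entirely so the server falls back to the model's training context; -ub is dropped here as well, on the assumption that the default batch size is sufficient once the context is limited to the training length:

llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 --embeddings --pooling cls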

mokeyish (Author):

> It will not crash if you set the correct context size.

@ggerganov Hello, after a day of testing, it still crashed with the same error as before.

ggerganov (Owner):

Should be fixed in #10030

mokeyish (Author):

> Should be fixed in #10030

@ggerganov Thanks, I probably won't have time to test until next week. I rolled back to a version from early July and it didn't crash anymore, so it's not a problem introduced by #9354.

slaren (Collaborator) commented Oct 24, 2024

Note that ignoring the crash will most definitely lead the model to produce garbage. The problem is not the bounds check; the problem is the overflow.
