Bug: llama-server crash with --embeddings
#9978
Comments
This is likely due to an overflow in the position embeddings when exceeding the maximum sequence length supported by the model. Limiting the context size to the model's maximum sequence length (which in this case appears to be 1024) should avoid the crash.
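A minimal sketch of that workaround, reusing the model and flags from the command in the report below; the 1024 figure is the limit suggested in this comment, and the model's config.json may specify a smaller value:

```sh
# Cap both the context size (-c) and the physical batch size (-ub) at the
# model's maximum sequence length, so positions never exceed what the
# position embeddings were trained for.
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 \
    -ngl 100 -c 1024 -ub 1024 --embeddings --pooling cls
```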
@slaren Thanks! Earlier versions did not have this problem. It appeared recently.
The check was added in #9354. Earlier versions didn't crash, but the results were not correct either.
@slaren I set the context size to 1024, but it still crashes. Can it be changed to just a warning, keeping the old behavior? The program crashes and the service becomes unavailable.
The context size might actually be 512 for this model. See https://huggingface.co/BAAI/bge-large-zh-v1.5/blob/main/config.json#L23
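For reference, that limit can be read straight from the model's config.json; a quick check, assuming a local copy of the file and jq installed:

```sh
# Print the model's maximum sequence length (512 for bge-large-zh-v1.5,
# per the config.json linked above).
jq '.max_position_embeddings' config.json
```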
@ggerganov Okay, I'll try it later. BTW: is it possible to prevent the program from crashing when the input text is too long?
It will not crash if you set the correct context size.
It doesn't seem to crash anymore. I'll continue to observe for a while. @ggerganov If the context size exceeds the allowed value, could it be automatically set to max_position_embeddings with a warning, to avoid a crash?
No, in some cases it can make sense to set a large context size. The default value of …
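The server won't clamp the value itself, but a launcher script can. A hypothetical sketch, assuming the Hugging Face config.json sits next to the GGUF file and jq is available; the script name and layout are illustrative only:

```sh
#!/bin/sh
# start-embedding-server.sh (hypothetical): derive -c/-ub from the model's
# training limit so the server is never started past it.
MAX_POS=$(jq '.max_position_embeddings' config.json)
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 \
    -ngl 100 -c "$MAX_POS" -ub "$MAX_POS" --embeddings --pooling cls
```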
@ggerganov Hello, after a day of testing it still crashed with the same error as before.
Should be fixed in #10030
@ggerganov Thanks, I probably won't have time to test until next week. I rolled back to an early-July build and it didn't crash anymore, so it's not #9354's problem.
Note that ignoring the crash will most definitely lead the model to produce garbage. The problem is not the bounds check; the problem is the overflow.
What happened?
After starting the server with the following command, it occasionally crashes suddenly while running.
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 -ngl 100 -c 8192 --samplers temperature;top_p --embeddings -ub 8192 --pooling cls
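Per the discussion above, a corrected invocation would cap -c and -ub at the model's 512-token limit. The --samplers flag is dropped here, since sampling does not apply to embeddings (and the unquoted ; would be split by the shell anyway):

```sh
llama-server -m ./bge-large-zh-v1.5 --port 3358 -a emb@bge-large-zh-v1.5 \
    -ngl 100 -c 512 -ub 512 --embeddings --pooling cls
```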
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 1: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 2: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
Device 3: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes
version: 3945 (45f0976)
built with cc (Debian 10.2.1-6) 10.2.1 20210110 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output