Why does the model get stuck halfway through startup? #106
Comments
Hmm, this does happen sometimes. Pressing Enter once will sometimes resolve it.
After updating llama.cpp it does run now, but generation is extremely slow, maybe 5-10 minutes per character. Is this normal? For example, here is the output after 20 minutes of running: system_info: n_threads = 80 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | == Running in interactive mode. ==
Below is an instruction that describes a task. Write a response that appropriately completes the request.
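The system_info line above also shows BLAS = 0, meaning this build has no BLAS backend. One hedged option for speeding up prompt evaluation is rebuilding llama.cpp against OpenBLAS; the LLAMA_OPENBLAS flag below is assumed from llama.cpp's Makefile of this period:

    # Rebuild llama.cpp with an OpenBLAS backend (inference stays CPU-only).
    # LLAMA_OPENBLAS=1 is an assumption based on the Makefile of that era.
    make clean
    make LLAMA_OPENBLAS=1

After a successful rebuild, the system_info line should report BLAS = 1.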
Take a look at #51.
I tried that, but it is still very slow. Could it be that the program is not using my GPU, and that is why it is slow?
llama.cpp does not use the GPU. It runs noticeably faster on Mac M-series chips.
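Since inference here is CPU-only, the thread count matters a great deal: llama.cpp is largely memory-bandwidth bound, and saturating all 80 hardware threads on a server CPU can be slower than a modest count. A minimal sketch using main's -t flag (the value 8 is an illustrative assumption, not a tuned recommendation):

    # Pin the thread count explicitly instead of letting it use all 80 threads.
    ./main -m zh-models/7B/ggml-model-f16.bin -t 8 --color -f prompts/alpaca.txt -ins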
My system is Linux debian 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64. I used -b to set batch_size=2048, but the log still reports 512. Also, main's default thread count is supposed to be 4, yet judging from the log the default is not 4; it saturates all cores (56/56).
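For reference, a sketch of passing both flags explicitly. llama.cpp builds from this period appear to clamp the batch size to 512 internally, which would explain why -b 2048 is reported back as 512; the clamp value and the thread count below are assumptions for illustration:

    # -t sets the thread count; -b sets the batch size used for prompt processing.
    # Values above the internal clamp (assumed 512 at the time) are reduced to it.
    ./main -m zh-models/7B/ggml-model-f16.bin -t 8 -b 512 -c 2048 --color -f prompts/alpaca.txt -ins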
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.
main: seed = 1681116321
llama_model_load: loading model from 'zh-models/7B/ggml-model-f16.bin' - please wait ...
llama_model_load: n_vocab = 49954
llama_model_load: n_ctx = 2048
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 1
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 13134.21 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 14926.29 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'zh-models/7B/ggml-model-f16.bin'
llama_model_load: model size = 13133.55 MB / num tensors = 291
llama_init_from_file: kv self size = 1024.00 MB
system_info: n_threads = 80 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:
'
sampling: temp = 0.200000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
generate: n_ctx = 2048, n_batch = 8, n_predict = 256, n_keep = 21
== Running in interactive mode. ==
Below is an instruction that describes a
I used the command ./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3, and the output just gets stuck at that 'a', with no way to interact.