
Segmentation fault in converting my llama2c models to ggml. #2574

Closed
saltyduckegg opened this issue Aug 10, 2023 · 5 comments · Fixed by #2559

@saltyduckegg
Hello!
I am trying to convert my llama2.c models to GGML, but it looks like the converter needs a vocab file. How can I get one?

Or: how can I convert my tokenizer.model to a GGML file? I only have tokenizer.model and tokenizer.bin at the moment.

$ ./bin/convert-llama2c-to-ggml --vocab-model ../../llama2.c.xs/tokenizer.model   --llama2c-model  ../../llama2.c.xs/out/model.bin   --llama2c-output-model ./xs
[malloc_weights:AK] Allocating [8000] x [288] = [2304000] float space for w->token_embedding_table
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_att_weight
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_ffn_weight
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wq
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wk
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wv
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wo
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w1
[malloc_weights:AK] Allocating [6] x [288] x [768] = [1327104] float space for w->w2
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w3
[malloc_weights:AK] Allocating [288] float space for w->rms_final_weight
llama.cpp: loading model from ../../llama2.c.xs/tokenizer.model
error loading model: unknown (magic, version) combination: 050a0e0a, 6b6e753c; is this really a GGML file?
llama_load_model_from_file: failed to load model
Segmentation fault (core dumped)
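
The `unknown (magic, version) combination` line is the real diagnosis here: `--vocab-model` is being parsed as a GGML/GGJT file, but `tokenizer.model` is a SentencePiece protobuf, so the loader reads bogus magic bytes and then crashes. A minimal sketch, not part of llama.cpp, to check what kind of file is being passed (the magic constants are the GGML-family ones from that era):

```python
# Hypothetical helper, not part of llama.cpp: peek at the first four
# bytes of a file to tell a GGML-family model from a SentencePiece
# tokenizer.model before handing it to --vocab-model.
import struct

GGML_MAGICS = {0x67676d6c, 0x67676d66, 0x67676a74}  # 'ggml', 'ggmf', 'ggjt'

def peek_magic(path: str) -> None:
    with open(path, "rb") as f:
        raw = f.read(4)
    (magic,) = struct.unpack("<I", raw)  # llama.cpp reads a little-endian u32
    if magic in GGML_MAGICS:
        print(f"{path}: looks like a GGML-family file (magic {magic:#010x})")
    else:
        print(f"{path}: not a GGML file (magic {magic:#010x})")

peek_magic("tokenizer.model")  # prints the bogus 0x050a0e0a from the log
```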

@saltyduckegg
Author

Using the ./models/ggml-vocab.bin vocab file just makes the model stop speaking human language:

$ ./main -m ./xs -p "One day, Lily met a Shoggoth" -n 500 -c 256 -eps 1e-5
main: build = 0 (unknown)
main: seed  = 1691639972
llama.cpp: loading model from ./xs
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 8000
llama_model_load_internal: n_ctx      = 256
llama_model_load_internal: n_embd     = 288
llama_model_load_internal: n_mult     = 32
llama_model_load_internal: n_head     = 6
llama_model_load_internal: n_head_kv  = 6
llama_model_load_internal: n_layer    = 6
llama_model_load_internal: n_rot      = 48
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-05
llama_model_load_internal: n_ff       = 768
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 0 (all F32)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.02 MB
llama_model_load_internal: mem required  =   40.39 MB (+    1.69 MB per state)
llama_new_context_with_model: kv self size  =    1.69 MB
llama_new_context_with_model: compute buffer total size =    9.44 MB

system_info: n_threads = 28 / 56 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 256, n_batch = 512, n_predict = 500, n_keep = 0


 One day, Lily met a Shoggothmtmt – remformOIfighelerA must
C¤ډ00$earery2ց defined `laceush¸way both
ogetherim|am8°ap two det comp your¾lip/ lot deten2׈i K¤ight2׈ elZ=edeWree performanceblemZointparamcriptibilityҡnel care Queƥns
- el with chang knowcit auf2׬d premiür uport takportitemROportschDEschTities-com($'com andier responsefter2لranéediddleca wonscript¾8 bu= еsubca det]{achribW $\ave пре года Milave stationularH his-\-\42֠when pro townWfor eventomenular
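
That output is the expected failure mode for a vocab mismatch rather than a broken model: the network still emits the token IDs it was trained on, but ggml-vocab.bin maps those IDs to different strings. A toy illustration with two made-up vocabularies:

```python
# Toy illustration (both vocabularies are made up): the same token IDs
# decode to readable text under the training vocab and to gibberish
# under a mismatched one.
trained_vocab = {0: "One", 1: " day", 2: ",", 3: " Lily"}
wrong_vocab   = {0: "mt", 1: "rem", 2: "form", 3: "OI"}

ids = [0, 1, 2, 3]  # what the model actually generates
print("".join(trained_vocab[i] for i in ids))  # One day, Lily
print("".join(wrong_vocab[i] for i in ids))    # mtremformOI
```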

@saltyduckegg
Author

Maybe I have found it:

python convert.py /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/out/ --vocab-only --vocab-dir /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/  --outfile ./good  
vocabtype: spm
Loading vocab file /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/tokenizer.model
Traceback (most recent call last):
  File "/mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-master/convert.py", line 1326, in <module>
    main()
  File "/mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-master/convert.py", line 1303, in main
    OutputFile.write_vocab_only(outfile, vocab)
  File "/mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-master/convert.py", line 1096, in write_vocab_only
    params = Params(n_vocab=vocab.vocab_size, n_embd=0, n_mult=0, n_head=1, n_layer=0)
TypeError: Params.__init__() missing 1 required positional argument: 'n_kv_head'

But there is still a strange fault.

If I pass n_kv_head=None or n_kv_head=0 (presumably via the edit sketched below), the vocab file gets written, but the llama2.c converter then crashes with a floating point exception:
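
Presumably the workaround was a one-line change to the `Params(...)` call in `write_vocab_only` shown in the traceback above. A sketch of that edit; passing n_kv_head=None is an assumption here, not something convert.py documents:

```python
# Sketch of the presumed workaround in convert.py's write_vocab_only,
# per the traceback above; n_kv_head=None (or 0) is an assumption,
# since a vocab-only file carries no attention heads anyway.
params = Params(n_vocab=vocab.vocab_size, n_embd=0, n_mult=0,
                n_head=1, n_layer=0, n_kv_head=None)
```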

$ python convert.2.py /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/out/ --vocab-only --vocab-dir /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/xiaoshuo.model  --outfile ./good  
vocabtype: spm
Loading vocab file /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/xiaoshuo.model
Wrote good
$ ./bin/convert-llama2c-to-ggml --vocab-model ./good   --llama2c-model  ../../llama2.c.xs/out/model.bin   --llama2c-output-model ./xss
[malloc_weights:AK] Allocating [8000] x [288] = [2304000] float space for w->token_embedding_table
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_att_weight
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_ffn_weight
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wq
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wk
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wv
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wo
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w1
[malloc_weights:AK] Allocating [6] x [288] x [768] = [1327104] float space for w->w2
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w3
[malloc_weights:AK] Allocating [288] float space for w->rms_final_weight
llama.cpp: loading model from ./good
Floating point exception (core dumped)

Still broken for me.
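
A plausible cause of the floating point exception, as an assumption based on how llama.cpp derived hyperparameters at the time rather than anything stated in this thread: the vocab-only file stores n_embd = 0 and n_mult = 0, and the loader reconstructs n_ff with an integer division by n_mult, which traps as SIGFPE instead of failing cleanly. The same arithmetic in Python:

```python
# Paraphrase (an assumption) of llama.cpp's n_ff derivation from that
# era; in C++ the division by a zero n_mult traps with SIGFPE, while
# Python raises ZeroDivisionError.
def derive_n_ff(n_embd: int, n_mult: int) -> int:
    return ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

print(derive_n_ff(288, 32))  # 768, matching n_ff in the log above
print(derive_n_ff(0, 0))     # ZeroDivisionError: the vocab-only header
```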

@SlyEcho
Collaborator

SlyEcho commented Aug 10, 2023

You should bring this up in #2559 before it is merged.

@klosax klosax linked a pull request Aug 10, 2023 that will close this issue
@klosax
Contributor

klosax commented Aug 10, 2023

Try setting --vocab-model to a working llama2 ggml model, not a tokenizer file. I think the vocab will be copied from the model file.

@saltyduckegg
Author

Thank you for your help!
My little llama2.c model was trained with my own SentencePiece tokenizer ("tokenizer.model"), so I don't have a GGML model that carries the correct vocabulary. How can I build a GGML model whose vocab matches my tokenizer.model?
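
One sanity check worth doing before any conversion, assuming the `sentencepiece` Python package is available: confirm the tokenizer's vocab size actually matches the llama2.c checkpoint's n_vocab, which is 8000 in the logs above.

```python
# Sanity check (assumes the sentencepiece package): the tokenizer's
# vocab size must match the llama2.c checkpoint's n_vocab.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.vocab_size())  # expect 8000 for the model in this thread
```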
