
Segmentation fault in converting my llama2c models to ggml. #2574

Closed
saltyduckegg opened this issue Aug 10, 2023 · 5 comments · Fixed by #2559

@saltyduckegg
Hello!
I am trying to convert my llama2.c models to GGML, but it looks like the converter needs a vocab file. How can I get one?

Or: how can I convert my tokenizer.model to a GGML file? I only have tokenizer.model and tokenizer.bin at the moment.

$ ./bin/convert-llama2c-to-ggml --vocab-model ../../llama2.c.xs/tokenizer.model   --llama2c-model  ../../llama2.c.xs/out/model.bin   --llama2c-output-model ./xs
[malloc_weights:AK] Allocating [8000] x [288] = [2304000] float space for w->token_embedding_table
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_att_weight
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_ffn_weight
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wq
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wk
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wv
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wo
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w1
[malloc_weights:AK] Allocating [6] x [288] x [768] = [1327104] float space for w->w2
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w3
[malloc_weights:AK] Allocating [288] float space for w->rms_final_weight
llama.cpp: loading model from ../../llama2.c.xs/tokenizer.model
error loading model: unknown (magic, version) combination: 050a0e0a, 6b6e753c; is this really a GGML file?
llama_load_model_from_file: failed to load model
Segmentation fault (core dumped)
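
The `unknown (magic, version) combination` line is the real diagnosis here: `--vocab-model` is being parsed as a GGML/GGJT file, but `tokenizer.model` is a SentencePiece protobuf, so the loader reads bogus magic bytes and then crashes. A minimal sketch, not part of llama.cpp, to check what kind of file is being passed (the magic constants are the GGML-family ones from that era):

```python
# Hypothetical helper, not part of llama.cpp: peek at the first four
# bytes of a file to tell a GGML-family model from a SentencePiece
# tokenizer.model before handing it to --vocab-model.
import struct

GGML_MAGICS = {0x67676d6c, 0x67676d66, 0x67676a74}  # 'ggml', 'ggmf', 'ggjt'

def peek_magic(path: str) -> None:
    with open(path, "rb") as f:
        raw = f.read(4)
    (magic,) = struct.unpack("<I", raw)  # llama.cpp reads a little-endian u32
    if magic in GGML_MAGICS:
        print(f"{path}: looks like a GGML-family file (magic {magic:#010x})")
    else:
        print(f"{path}: not a GGML file (magic {magic:#010x})")

peek_magic("tokenizer.model")  # prints the bogus 0x050a0e0a from the log
```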

@saltyduckegg
Author

Using the ./models/ggml-vocab.bin vocab file just makes the model stop speaking human language:

$ ./main -m ./xs -p "One day, Lily met a Shoggoth" -n 500 -c 256 -eps 1e-5
main: build = 0 (unknown)
main: seed  = 1691639972
llama.cpp: loading model from ./xs
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 8000
llama_model_load_internal: n_ctx      = 256
llama_model_load_internal: n_embd     = 288
llama_model_load_internal: n_mult     = 32
llama_model_load_internal: n_head     = 6
llama_model_load_internal: n_head_kv  = 6
llama_model_load_internal: n_layer    = 6
llama_model_load_internal: n_rot      = 48
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-05
llama_model_load_internal: n_ff       = 768
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 0 (all F32)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.02 MB
llama_model_load_internal: mem required  =   40.39 MB (+    1.69 MB per state)
llama_new_context_with_model: kv self size  =    1.69 MB
llama_new_context_with_model: compute buffer total size =    9.44 MB

system_info: n_threads = 28 / 56 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 256, n_batch = 512, n_predict = 500, n_keep = 0


 One day, Lily met a Shoggothmtmt – remformOIfighelerA must
C¤ډ00$earery2ց defined `laceush¸way both
ogetherim|am8°ap two det comp your¾lip/ lot deten2׈i K¤ight2׈ elZ=edeWree performanceblemZointparamcriptibilityҡnel care Queƥns
- el with chang knowcit auf2׬d premiür uport takportitemROportschDEschTities-com($'com andier responsefter2لranéediddleca wonscript¾8 bu= еsubca det]{achribW $\ave пре года Milave stationularH his-\-\42֠when pro townWfor eventomenular
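
That output is the expected failure mode for a vocab mismatch rather than a broken model: the network still emits the token IDs it was trained on, but ggml-vocab.bin maps those IDs to different strings. A toy illustration with two made-up vocabularies:

```python
# Toy illustration (both vocabularies are made up): the same token IDs
# decode to readable text under the training vocab and to gibberish
# under a mismatched one.
trained_vocab = {0: "One", 1: " day", 2: ",", 3: " Lily"}
wrong_vocab   = {0: "mt", 1: "rem", 2: "form", 3: "OI"}

ids = [0, 1, 2, 3]  # what the model actually generates
print("".join(trained_vocab[i] for i in ids))  # One day, Lily
print("".join(wrong_vocab[i] for i in ids))    # mtremformOI
```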

@saltyduckegg
Author

Maybe I have found it:

python convert.py /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/out/ --vocab-only --vocab-dir /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/  --outfile ./good  
vocabtype: spm
Loading vocab file /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/tokenizer.model
Traceback (most recent call last):
  File "/mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-master/convert.py", line 1326, in <module>
    main()
  File "/mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-master/convert.py", line 1303, in main
    OutputFile.write_vocab_only(outfile, vocab)
  File "/mnt/sdb/lizz/project/003.lizz/16.llama/llama2cpp.other/llama.cpp-master/convert.py", line 1096, in write_vocab_only
    params = Params(n_vocab=vocab.vocab_size, n_embd=0, n_mult=0, n_head=1, n_layer=0)
TypeError: Params.__init__() missing 1 required positional argument: 'n_kv_head'

But there is still a strange fault.

If I pass n_kv_head=None or n_kv_head=0 (presumably via the edit sketched below), the vocab file gets written, but the llama2.c converter then crashes with a floating point exception:
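
Presumably the workaround was a one-line change to the `Params(...)` call in `write_vocab_only` shown in the traceback above. A sketch of that edit; passing n_kv_head=None is an assumption here, not something convert.py documents:

```python
# Sketch of the presumed workaround in convert.py's write_vocab_only,
# per the traceback above; n_kv_head=None (or 0) is an assumption,
# since a vocab-only file carries no attention heads anyway.
params = Params(n_vocab=vocab.vocab_size, n_embd=0, n_mult=0,
                n_head=1, n_layer=0, n_kv_head=None)
```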

$ python convert.2.py /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/out/ --vocab-only --vocab-dir /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/xiaoshuo.model  --outfile ./good  
vocabtype: spm
Loading vocab file /mnt/sdb/lizz/project/003.lizz/16.llama/llama2.c.xs/xiaoshuo.model
Wrote good
$ ./bin/convert-llama2c-to-ggml --vocab-model ./good   --llama2c-model  ../../llama2.c.xs/out/model.bin   --llama2c-output-model ./xss
[malloc_weights:AK] Allocating [8000] x [288] = [2304000] float space for w->token_embedding_table
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_att_weight
[malloc_weights:AK] Allocating [6] x [288] = [1728] float space for w->rms_ffn_weight
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wq
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wk
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wv
[malloc_weights:AK] Allocating [6] x [288] x [288] = [497664] float space for w->wo
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w1
[malloc_weights:AK] Allocating [6] x [288] x [768] = [1327104] float space for w->w2
[malloc_weights:AK] Allocating [6] x [768] x [288] = [1327104] float space for w->w3
[malloc_weights:AK] Allocating [288] float space for w->rms_final_weight
llama.cpp: loading model from ./good
Floating point exception (core dumped)

Still broken for me.
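
A plausible cause of the floating point exception, as an assumption based on how llama.cpp derived hyperparameters at the time rather than anything stated in this thread: the vocab-only file stores n_embd = 0 and n_mult = 0, and the loader reconstructs n_ff with an integer division by n_mult, which traps as SIGFPE instead of failing cleanly. The same arithmetic in Python:

```python
# Paraphrase (an assumption) of llama.cpp's n_ff derivation from that
# era; in C++ the division by a zero n_mult traps with SIGFPE, while
# Python raises ZeroDivisionError.
def derive_n_ff(n_embd: int, n_mult: int) -> int:
    return ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

print(derive_n_ff(288, 32))  # 768, matching n_ff in the log above
print(derive_n_ff(0, 0))     # ZeroDivisionError: the vocab-only header
```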

@SlyEcho
Collaborator

SlyEcho commented Aug 10, 2023

You should bring this up in #2559 before it is merged.

@klosax klosax linked a pull request Aug 10, 2023 that will close this issue
@klosax
Contributor

klosax commented Aug 10, 2023

Try setting --vocab-model to a working llama2 ggml model, not a tokenizer file. I think the vocab will be copied from the model file.

@saltyduckegg
Author

Thank you for your help!
My little llama2.c model was trained with my own SentencePiece tokenizer ("tokenizer.model"), so I don't have a GGML model that carries the correct vocabulary. How can I build a GGML model whose vocab matches my tokenizer.model?
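
One sanity check worth doing before any conversion, assuming the `sentencepiece` Python package is available: confirm the tokenizer's vocab size actually matches the llama2.c checkpoint's n_vocab, which is 8000 in the logs above.

```python
# Sanity check (assumes the sentencepiece package): the tokenizer's
# vocab size must match the llama2.c checkpoint's n_vocab.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.vocab_size())  # expect 8000 for the model in this thread
```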
