
Error: Invalid model file when using converted GPT4ALL model after following provided instructions #655

Closed
gaceladri opened this issue Mar 31, 2023 · 11 comments

Comments

@gaceladri

Hello,

I have followed the instructions provided for using the GPT-4ALL model. I used the convert-gpt4all-to-ggml.py script to convert the gpt4all-lora-quantized.bin model, as instructed. However, I encountered an error related to an invalid model file when running the example.

Here are the steps I followed, as described in the instructions:

  1. Convert the model using the convert-gpt4all-to-ggml.py script:
python3 convert-gpt4all-to-ggml.py models/gpt4all/gpt4all-lora-quantized.bin ./models/tokenizer.model
  2. Run the interactive mode example with the newly generated gpt4all-lora-quantized.bin model:
./main -m ./models/gpt4all/gpt4all-lora-quantized.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

However, I encountered the following error:

./models/gpt4all/gpt4all-lora-quantized.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])
you most likely need to regenerate your ggml files
the benefit is you'll get 10-100x faster load times
see https://github.com/ggerganov/llama.cpp/issues/91
use convert-pth-to-ggml.py to regenerate from original pth
use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
main: error: failed to load model './models/gpt4all/gpt4all-lora-quantized.bin'
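
For reference, the two hex values in the error are the file's format magic: 0x67676d66 is what convert-gpt4all-to-ggml.py produced here, while 0x67676a74 is what the current ./main expects. Below is a minimal sketch (my own, not part of llama.cpp) for checking which format a given file is in, assuming the 4-byte magic at the start of the file is stored little-endian; the labels are just my reading of the error text:

import struct
import sys

# Magic values quoted from the error message above.
KNOWN = {
    0x67676d66: "older ggml format - needs migrate-ggml-2023-03-30-pr613.py",
    0x67676a74: "format expected by the current ./main",
}

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))  # first 4 bytes, read as a little-endian uint32

print(f"{sys.argv[1]}: magic {magic:#010x} -> {KNOWN.get(magic, 'unknown')}")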

Please let me know how to resolve this issue and correctly convert and use the GPT-4ALL model with the interactive mode example.

Thank you.

@gaceladri gaceladri changed the title Error: Invalid model file when using converted GPT-4ALL model after following provided instructions Error: Invalid model file when using converted GPT4ALL model after following provided instructions Mar 31, 2023
@gaceladri
Author

I could run it with the previous version https://github.com/ggerganov/llama.cpp/tree/master-ed3c680

@DonIsaac

I could run it with the previous version https://github.com/ggerganov/llama.cpp/tree/master-ed3c680

After building from this tag, I'm getting a segfault. What OS are you using?

  • Using macOS 13.2 on an M1 chip
  • commit: ed3c680bcd0e8ce6e574573ba95880b694449878
  • output after running ./main -m g4a/gpt4all-lora-quantized.bin -p "hi there" -n 512:
main: seed = 1680284326
llama_model_load: loading model from 'g4a/gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 4273.35 MB
llama_model_load: mem required  = 6065.35 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from 'g4a/gpt4all-lora-quantized.bin'
llama_model_load: [1]    28303 segmentation fault  ./main -m g4a/gpt4all-lora-quantized.bin -p "hi there" -n 512

@rabidcopy
Contributor

use migrate-ggml-2023-03-30-pr613.py

@gaceladri
Author

I solved the issue by running the command:

python migrate-ggml-2023-03-30-pr613.py models/gpt4all/gpt4all-lora-quantized.bin models/gpt4all/gpt4all-lora-converted.bin

after first executing:

python3 convert-gpt4all-to-ggml.py models/gpt4all-lora-quantized.bin ./models/tokenizer.model

and now I'm interacting with gpt4all via:

./main -m ./models/gpt4all/gpt4all-lora-converted.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
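
In case it helps anyone scripting this, here is a rough Python wrapper around the same two steps (script names and paths exactly as in the commands above; the convert step appears to rewrite the file in place, judging by those commands, and the final magic check is only a sanity check I added, not something the scripts require):

import struct
import subprocess

SRC = "models/gpt4all/gpt4all-lora-quantized.bin"
TOK = "./models/tokenizer.model"
OUT = "models/gpt4all/gpt4all-lora-converted.bin"

# Step 1: convert the gpt4all download to ggml format (rewrites SRC in place).
subprocess.run(["python3", "convert-gpt4all-to-ggml.py", SRC, TOK], check=True)

# Step 2: migrate the ggml file to the newer format that ./main expects.
subprocess.run(["python3", "migrate-ggml-2023-03-30-pr613.py", SRC, OUT], check=True)

# Sanity check: the migrated file should start with the magic 0x67676a74 from the error above.
with open(OUT, "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
assert magic == 0x67676a74, f"unexpected magic {magic:#x}"
print("ready to run: ./main -m", OUT)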

@scottjmaddox

Would it be worth updating the README section with this information?

@ROBOKiTTY

ROBOKiTTY commented Apr 1, 2023

After running convert-gpt4all-to-ggml.py and migrate-ggml-2023-03-30-pr613.py, main segfaults with a failed ggml assertion.

GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

Full logs:

H:\llama.cpp\bin>main -m models/gpt4all-lora-quantized-v2.bin -n 248
main: seed = 1680331950
llama_model_load: loading model from 'models/gpt4all-lora-quantized-v2.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/gpt4all-lora-quantized-v2.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 248, n_keep = 0


 5GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

@BoQsc

BoQsc commented Apr 1, 2023

These are all steps that I did:

  1. Download gpt4all-lora-quantized.bin via the torrent linked from https://github.com/nomic-ai/gpt4all#try-it-yourself

  2. python -m pip install torch numpy sentencepiece

  3. Download tokenizer.model from https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

python convert-gpt4all-to-ggml.py ./models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 

python migrate-ggml-2023-03-30-pr613.py models/gpt4all/gpt4all-lora-quantized.bin models/gpt4all/gpt4all-lora-converted.bin.orig

main -m ./llama.cpp/models/gpt4all/gpt4all-lora-converted.bin.orig -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt


However, it is writing nonsense and does not let me interact in interactive mode. Maybe something is wrong.

@ROBOKiTTY

After running convert-gpt4all-to-ggml.py and migrate-ggml-2023-03-30-pr613.py, main segfaults with a failed ggml assertion.

GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

I commented out this line in ggml.c and recompiled to see what would happen, and it just worked. That was unexpected, but I won't complain.

@clxyder

clxyder commented Apr 2, 2023

These are all steps that I did:

  1. Download gpt4all-lora-quantized.bin via the torrent linked from https://github.com/nomic-ai/gpt4all#try-it-yourself
  2. python -m pip install torch numpy sentencepiece
  3. Download tokenizer.model from https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model
python convert-gpt4all-to-ggml.py ./models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 

python migrate-ggml-2023-03-30-pr613.py models/gpt4all/gpt4all-lora-quantized.bin models/gpt4all/gpt4all-lora-converted.bin.orig

main -m ./llama.cpp/models/gpt4all/gpt4all-lora-converted.bin.orig -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

However, it is writing nonsense and does not let me interact in interactive mode. Maybe something is wrong.

Can anyone confirm if decapoda-research/llama-7b-hf's tokenizer.model is adequate to use in this case?
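
For what it's worth, a quick sanity check on whichever tokenizer.model is in use is to load it with sentencepiece (already installed in step 2 above) and look at the vocab. The logs earlier show n_vocab = 32001, which looks like the 32000-piece LLaMA vocabulary plus one extra token added during conversion, though that last part is my assumption:

import sentencepiece as spm

# Point this at whichever tokenizer.model was downloaded.
sp = spm.SentencePieceProcessor(model_file="models/tokenizer.model")

print("pieces:", sp.get_piece_size())            # the LLaMA tokenizer should report 32000
print(sp.encode("Hello, world!", out_type=str))  # should split into sensible subword pieces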

@ggerganov
Owner

After running convert-gpt4all-to-ggml.py and migrate-ggml-2023-03-30-pr613.py, main segfaults with a failed ggml assertion.
GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

I commented out this line in ggml.c and recompiled to see what would happen, and it just worked. That was unexpected, but I won't complain.

This is strange. It's expected that it works after commenting this line since we don't really need the buffer to be aligned, but I wonder why it is not the case anymore. Seems to be related to the mmap change.
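
To illustrate where the misalignment can come from under mmap (a toy sketch, not llama.cpp code, with GGML_MEM_ALIGN assumed to be 16 purely for the example): the tensor pointer is the mapping base plus the tensor's byte offset in the file, and while mmap returns a page-aligned base, the in-file offset is whatever the conversion scripts happened to write.

GGML_MEM_ALIGN = 16  # illustrative value only; see ggml.c for the real constant

def tensor_data_aligned(mapping_base: int, offset_in_file: int) -> bool:
    # With mmap the base address is page-aligned, so whether result->data passes
    # the assert depends entirely on the tensor's byte offset inside the file.
    return (mapping_base + offset_in_file) % GGML_MEM_ALIGN == 0

print(tensor_data_aligned(0x7F0000000000, 4096))  # True  - offset is a multiple of the alignment
print(tensor_data_aligned(0x7F0000000000, 4097))  # False - the situation the assert catches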

@d0rc

d0rc commented Jun 2, 2023

It happened to me when trying to use --prompt-cache on a custom model.
