
Convert pytorch-based models to work with llama.cpp? #707

Closed
IngwiePhoenix opened this issue Apr 2, 2023 · 11 comments
@IngwiePhoenix

Out of curiosity, I want to see if I can launch a very mini AI on my little network server. It usually has around 3GB of free memory, and it'd be nice to chat with it sometimes. For that, I'd like to try a smaller model like Pythia.

So I would like to know:

  • Can I convert pytorch_model*.bin to ggml?
  • Can I quantize those models to use even less memory as a sort of post-processing step?

I looked at the existing convert_*.py scripts, but none of those seemed to be for this type of model.

Thanks in advance!
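For anyone landing here, the usual recipe in llama.cpp at the time was a two-step flow: convert the checkpoint to an f16 ggml file, then quantize it as a post-processing step. A minimal sketch, assuming the stock LLaMA layout; the script name and the quantize type argument varied between versions:

  # Step 1: pth -> ggml f16 (the trailing "1" selects f16 output)
  python3 convert-pth-to-ggml.py models/7B/ 1

  # Step 2: quantize to 4-bit as a post-processing step
  ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

Note that this stock path only accepted the original consolidated.00.pth layout for LLaMA-family checkpoints, which is why the thread below reaches for the old convert.py for HF-formatted models.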

@KASR
Contributor

KASR commented Apr 2, 2023

I think you can use the same approach as what I posted in #708

@MillionthOdin16

There's an old convert.py script that does this. Combining it with the more recent conversion scripts to the new GGML formats makes everything work fine. I don't know why the older convert.py script was nixed.

@MillionthOdin16

@IngwiePhoenix this should work for you. Ping me if you have trouble. After using this you'll need to migrate to the new ggml format. For some reason, the existing pth->ggml converter only accepts the base consolidated.00.pth format; this one accepts .pt, .pth, and HF PyTorch-formatted models. I'm not sure what normally generates the params.json file, but I included one as an example (for LLaMA 13B) in the gist. I think the only things that change are the dim, heads, and layers.

Convert.py, Params.json

I can't find where this file originally came from, but ty to the person that made it :)
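For reference, a stock params.json is tiny; the LLaMA 13B one looks roughly like this (values as shipped with the original LLaMA release, so treat them as assumptions for other checkpoints):

  {
      "dim": 5120,
      "multiple_of": 256,
      "n_heads": 40,
      "n_layers": 40,
      "norm_eps": 1e-06,
      "vocab_size": -1
  }

As noted above, only dim, n_heads, and n_layers change between model sizes (7B uses 4096/32/32).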

@BenjiKCF

BenjiKCF commented Apr 3, 2023

> @IngwiePhoenix this should work for you. Ping me if you have trouble. […]

Hi, thanks for your code. I hit the following problem when I try to use it; I'm not sure whether I'm doing this correctly.

I first used this code to convert my pth to bin files: link

Then I ran python3 convert.py --vocab-dir models/tokenizer.model --outfile models/lora-alpaca --model models/hf_ckpt

(torch2-py39) root@957acb7603b5:/RP1/mydocker/Ben/llama.cpp# python3 convert.py --vocab-dir models/tokenizer.model --outfile models/lora-alpaca --model models/hf_ckpt
Traceback (most recent call last):
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 673, in <module>
    main()
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 664, in main
    model, model_path = load_some_model(args.model)
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 620, in load_some_model
    model = lazy_load_torch(path)
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 496, in lazy_load_torch
    model = unpickler.load()
TypeError: 'staticmethod' object is not callable
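That TypeError is characteristic of Python versions before 3.10, where a staticmethod object fetched out of a class dictionary is not directly callable, and convert.py's lazy unpickler keeps its handlers in exactly such a dictionary; the torch2-py39 prompt above suggests Python 3.9. A minimal sketch of the failure mode and the usual unwrap workaround (class and handler names are hypothetical stand-ins):

  class LazyUnpickler:
      @staticmethod
      def rebuild_tensor(*args):
          return args  # placeholder handler

      # At class-creation time this stores the raw staticmethod object:
      CLASSES = {('torch._utils', '_rebuild_tensor_v2'): rebuild_tensor}

      def find_class(self, module, name):
          obj = self.CLASSES[(module, name)]
          # On Python < 3.10 a staticmethod object is not callable;
          # unwrap the underlying function via __func__.
          return obj.__func__ if isinstance(obj, staticmethod) else obj

  fn = LazyUnpickler().find_class('torch._utils', '_rebuild_tensor_v2')
  fn(1, 2)  # works on 3.9 too, thanks to the unwrap

Upgrading to Python 3.10+, where staticmethod objects became callable, sidesteps the issue entirely.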

@IngwiePhoenix
Author

> @IngwiePhoenix this should work for you. Ping me if you have trouble. […]

Still getting fiber internet installed and set up. Will run this code later, but it looks promising!

The parameters you were talking about are - as far as my novice understanding goes - part of the model itself. Like... part of the struct that makes up the header for them. I don't know much more than that myself, but if I figure something out about that, I'll let you know ASAP!

Thanks for the links, too!
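For what it's worth, with HF-format checkpoints those values also live in the repo's config.json under different names, so a params.json can be derived mechanically. A hedged sketch (the field names are the standard HF LLaMA ones; the models/hf_ckpt path is borrowed from the command above, and multiple_of is an assumption since HF configs don't store it):

  import json

  # Map Hugging Face config.json fields onto the params.json layout
  # that the old convert.py expects.
  with open('models/hf_ckpt/config.json') as f:
      hf = json.load(f)

  params = {
      'dim': hf['hidden_size'],
      'n_heads': hf['num_attention_heads'],
      'n_layers': hf['num_hidden_layers'],
      'norm_eps': hf.get('rms_norm_eps', 1e-06),
      'vocab_size': hf.get('vocab_size', -1),
      'multiple_of': 256,  # assumption: not present in HF configs
  }

  with open('models/hf_ckpt/params.json', 'w') as f:
      json.dump(params, f, indent=2)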

@KASR
Contributor

KASR commented Apr 4, 2023

@MillionthOdin16 fyi, I think the convert.py comes from here: #545 (I think the plan is still to merge it into main)

@MillionthOdin16

Okay great! Yeah, I still use that convert script a ton. Then I just convert the result to the new ggml format.
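The migration step itself was a one-liner; a sketch assuming the migration script that shipped in the llama.cpp tree around that time (the exact script name tracks the format-change date, so check your checkout):

  python3 migrate-ggml-2023-03-30-pth.py ggml-model-f16.bin ggml-model-f16-new.bin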

@clulece

clulece commented Apr 5, 2023

I would like to request that all the various conversion scripts, formats, and files needed to convert models be documented in a single location so that we can refer people to it. Ideally the document would also include a flow chart people can follow to get from whatever variant they're starting with to a format that can be loaded by llama.cpp.

If this document does not exist, I would be happy to help write it. I don't know the technical specifics of llama.cpp, PyTorch, or even ML, but I am a programmer by profession who has worked with a lot of low-level binary formats and protocols and has read a lot of RFCs and proprietary specifications in order to do so. I just need guidance to understand how all the various formats and scripts fit together.

Is writing such a document doable? Is this something the developers of this project would find useful and be interested in supporting? Would it make more sense to open this as a new issue?

At the very least, such a document should significantly cut down on the number of issues posted about this topic.

@ghost

ghost commented Apr 6, 2023

> I would like to request that all the various conversion scripts, formats, and files needed to convert models be documented in a single location

Even better: combine all the scripts into one. The PR is in its final stages: #545

@hackdefendr

Well, I have tried every suggestion in this issue and nothing is working for the Salesforce-CodeGen-16B-multi model on Huggingface. It is downloaded as a 32GB pytorch_model.bin file and none of the convert scripts work.

COMMAND
python3 convert.py ./Salesforce_codegen-16B-multi/pytorch_model.bin

ERROR

Traceback (most recent call last):
  File "/home/jpop/devel/models/convert.py", line 673, in <module>
    main()
  File "/home/jpop/devel/models/convert.py", line 664, in main
    model, model_path = load_some_model(args.model)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/devel/models/convert.py", line 620, in load_some_model
    model = lazy_load_torch(path)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/devel/models/convert.py", line 496, in lazy_load_torch
    model = unpickler.load()
            ^^^^^^^^^^^^^^^^
  File "/home/jpop/devel/models/convert.py", line 486, in find_class
    return self.CLASSES[(module, name)]
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: ('torch', 'ByteStorage')

Anyone have ideas on how to convert this one to llama.cpp?
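For context on that KeyError: convert.py's lazy unpickler maps a whitelist of (module, name) pairs to handlers and rejects anything else, and ('torch', 'ByteStorage') simply isn't on the list. A schematic, runnable sketch of the lookup (the handlers are hypothetical stand-ins):

  # convert.py's find_class boils down to a whitelist lookup like this:
  def rebuild_tensor(*args):      # stand-in handler
      return args

  def load_float_storage(*args):  # stand-in handler
      return args

  CLASSES = {
      ('torch._utils', '_rebuild_tensor_v2'): rebuild_tensor,
      ('torch', 'FloatStorage'): load_float_storage,
  }

  def find_class(module, name):
      return CLASSES[(module, name)]

  find_class('torch', 'ByteStorage')  # raises KeyError: ('torch', 'ByteStorage')

Even if ByteStorage were whitelisted, this convert.py only understood LLaMA-family architectures, and CodeGen is a different architecture, so it would likely need its own conversion path (e.g. one of the example converters in the separate ggml repo).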

@github-actions github-actions bot added the stale label Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
