
Convert pytorch-based models to work with llama.cpp? #707

Closed
IngwiePhoenix opened this issue Apr 2, 2023 · 11 comments
@IngwiePhoenix

Out of curiosity, I want to see if I can launch a very mini AI on my little network server. It usually has around 3GB of free memory, and it'd be nice to chat with it sometimes. For that, I'd like to try a smaller model like Pythia.

So I would like to know:

  • Can I convert pytorch_model*.bin to ggml?
  • Can I quantize those models to use even less memory as a sort of post-processing step?

I looked at the existing convert_*.py scripts, but none of those seemed to be for this type of model.

Thanks in advance!
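For anyone landing here, the usual recipe in llama.cpp at the time was a two-step flow: convert the checkpoint to an f16 ggml file, then quantize it as a post-processing step. A minimal sketch, assuming the stock LLaMA layout; the script name and the quantize type argument varied between versions:

  # Step 1: pth -> ggml f16 (the trailing "1" selects f16 output)
  python3 convert-pth-to-ggml.py models/7B/ 1

  # Step 2: quantize to 4-bit as a post-processing step
  ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

Note that this stock path only accepted the original consolidated.00.pth layout for LLaMA-family checkpoints, which is why the thread below reaches for the old convert.py for HF-formatted models.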

@KASR
Contributor

KASR commented Apr 2, 2023

I think you can use the same approach as what I posted in #708

@MillionthOdin16

There's an old convert.py script that does this. Combining it with the more recent conversion scripts to the new GGML formats makes everything work fine. I don't know why the older convert.py script was nixed.

@MillionthOdin16

@IngwiePhoenix this should work for you. Ping me if you have trouble. After using this you'll need to migrate to the new ggml format. For some reason, the existing pth->ggml converter only accepts the base consolidated.00.pth format; this one accepts .pt, .pth, and HF PyTorch-formatted models. I'm not sure what normally generates the params.json file, but I included one as an example (for LLaMA 13B) in the gist. I think the only things that change are the dim, heads, and layers.

Convert.py, Params.json

I can't find where this file originally came from, but ty to the person that made it :)
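For reference, a stock params.json is tiny; the LLaMA 13B one looks roughly like this (values as shipped with the original LLaMA release, so treat them as assumptions for other checkpoints):

  {
      "dim": 5120,
      "multiple_of": 256,
      "n_heads": 40,
      "n_layers": 40,
      "norm_eps": 1e-06,
      "vocab_size": -1
  }

As noted above, only dim, n_heads, and n_layers change between model sizes (7B uses 4096/32/32).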

@BenjiKCF

BenjiKCF commented Apr 3, 2023

> @IngwiePhoenix this should work for you. Ping me if you have trouble. […]

Hi, thanks for your code. I hit the following problem when I try to use it; I'm not sure whether I'm doing this correctly.

I first used this code to convert my pth to bin files: link

Then I ran python3 convert.py --vocab-dir models/tokenizer.model --outfile models/lora-alpaca --model models/hf_ckpt

(torch2-py39) root@957acb7603b5:/RP1/mydocker/Ben/llama.cpp# python3 convert.py --vocab-dir models/tokenizer.model --outfile models/lora-alpaca --model models/hf_ckpt
Traceback (most recent call last):
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 673, in <module>
    main()
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 664, in main
    model, model_path = load_some_model(args.model)
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 620, in load_some_model
    model = lazy_load_torch(path)
  File "/RP1/mydocker/Ben/llama.cpp/convert.py", line 496, in lazy_load_torch
    model = unpickler.load()
TypeError: 'staticmethod' object is not callable
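That TypeError is characteristic of Python versions before 3.10, where a staticmethod object fetched out of a class dictionary is not directly callable, and convert.py's lazy unpickler keeps its handlers in exactly such a dictionary; the torch2-py39 prompt above suggests Python 3.9. A minimal sketch of the failure mode and the usual unwrap workaround (class and handler names are hypothetical stand-ins):

  class LazyUnpickler:
      @staticmethod
      def rebuild_tensor(*args):
          return args  # placeholder handler

      # At class-creation time this stores the raw staticmethod object:
      CLASSES = {('torch._utils', '_rebuild_tensor_v2'): rebuild_tensor}

      def find_class(self, module, name):
          obj = self.CLASSES[(module, name)]
          # On Python < 3.10 a staticmethod object is not callable;
          # unwrap the underlying function via __func__.
          return obj.__func__ if isinstance(obj, staticmethod) else obj

  fn = LazyUnpickler().find_class('torch._utils', '_rebuild_tensor_v2')
  fn(1, 2)  # works on 3.9 too, thanks to the unwrap

Upgrading to Python 3.10+, where staticmethod objects became callable, sidesteps the issue entirely.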

@IngwiePhoenix
Author

> @IngwiePhoenix this should work for you. Ping me if you have trouble. […]

Still getting fiber internet installed and set up. Will run this code later, but it looks promising!

The parameters you were talking about are - as far as my novice understanding goes - part of the model itself. Like... part of the struct that makes up the header for them. I don't know much more than that myself, but if I figure something out about that, I'll let you know ASAP!

Thanks for the links, too!
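For what it's worth, with HF-format checkpoints those values also live in the repo's config.json under different names, so a params.json can be derived mechanically. A hedged sketch (the field names are the standard HF LLaMA ones; the models/hf_ckpt path is borrowed from the command above, and multiple_of is an assumption since HF configs don't store it):

  import json

  # Map Hugging Face config.json fields onto the params.json layout
  # that the old convert.py expects.
  with open('models/hf_ckpt/config.json') as f:
      hf = json.load(f)

  params = {
      'dim': hf['hidden_size'],
      'n_heads': hf['num_attention_heads'],
      'n_layers': hf['num_hidden_layers'],
      'norm_eps': hf.get('rms_norm_eps', 1e-06),
      'vocab_size': hf.get('vocab_size', -1),
      'multiple_of': 256,  # assumption: not present in HF configs
  }

  with open('models/hf_ckpt/params.json', 'w') as f:
      json.dump(params, f, indent=2)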

@KASR
Contributor

KASR commented Apr 4, 2023

@MillionthOdin16 fyi, I think the convert.py comes from here: #545 (I think the plan is still to merge it into main)

@MillionthOdin16

Okay great! Yeah, I still use that convert script a ton. Then I just convert the result to the new ggml format.
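The migration step itself was a one-liner; a sketch assuming the migration script that shipped in the llama.cpp tree around that time (the exact script name tracks the format-change date, so check your checkout):

  python3 migrate-ggml-2023-03-30-pth.py ggml-model-f16.bin ggml-model-f16-new.bin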

@clulece

clulece commented Apr 5, 2023

I would like to request that all the various conversion scripts, formats, and files needed to convert models be documented in a single location so that we can refer people to it. Ideally the document would also include a flow chart people can follow to get from whatever variant they're starting with to a format that can be loaded by llama.cpp.

If this document does not exist, I would be happy to help write it. I don't know the technical specifics of llama.cpp, PyTorch, or even ML, but I am a programmer by profession who has worked with a lot of low-level binary formats and protocols and has read a lot of RFCs and proprietary specifications in order to do so. I just need guidance to understand how all the various formats and scripts fit together.

Is writing such a document doable? Is this something the developers of this project would find useful and be interested in supporting? Would it make more sense to open this as a new issue?

At the very least, such a document should significantly cut down on the number of issues posted about this topic.

@ghost

ghost commented Apr 6, 2023

> I would like to request that all the various conversion scripts, formats, and files needed to convert models be documented in a single location

Even better: combine all the scripts into one. The PR is in its final stages: #545

@hackdefendr

Well, I have tried every suggestion in this issue and nothing is working for the Salesforce-CodeGen-16B-multi model on Huggingface. It is downloaded as a 32GB pytorch_model.bin file and none of the convert scripts work.

COMMAND
python3 convert.py ./Salesforce_codegen-16B-multi/pytorch_model.bin

ERROR

Traceback (most recent call last):
  File "/home/jpop/devel/models/convert.py", line 673, in <module>
    main()
  File "/home/jpop/devel/models/convert.py", line 664, in main
    model, model_path = load_some_model(args.model)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/devel/models/convert.py", line 620, in load_some_model
    model = lazy_load_torch(path)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpop/devel/models/convert.py", line 496, in lazy_load_torch
    model = unpickler.load()
            ^^^^^^^^^^^^^^^^
  File "/home/jpop/devel/models/convert.py", line 486, in find_class
    return self.CLASSES[(module, name)]
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: ('torch', 'ByteStorage')

Anyone have ideas on how to convert this one to llama.cpp?
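For context on that KeyError: convert.py's lazy unpickler maps a whitelist of (module, name) pairs to handlers and rejects anything else, and ('torch', 'ByteStorage') simply isn't on the list. A schematic, runnable sketch of the lookup (the handlers are hypothetical stand-ins):

  # convert.py's find_class boils down to a whitelist lookup like this:
  def rebuild_tensor(*args):      # stand-in handler
      return args

  def load_float_storage(*args):  # stand-in handler
      return args

  CLASSES = {
      ('torch._utils', '_rebuild_tensor_v2'): rebuild_tensor,
      ('torch', 'FloatStorage'): load_float_storage,
  }

  def find_class(module, name):
      return CLASSES[(module, name)]

  find_class('torch', 'ByteStorage')  # raises KeyError: ('torch', 'ByteStorage')

Even if ByteStorage were whitelisted, this convert.py only understood LLaMA-family architectures, and CodeGen is a different architecture, so it would likely need its own conversion path (e.g. one of the example converters in the separate ggml repo).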

@github-actions github-actions bot added the stale label Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
