Upgrade v1/v2 format to v3 by leveraging quantize #1504

Open
wants to merge 6 commits into
base: master

@howard0su
Collaborator

howard0su commented May 17, 2023

Leverage the quantize executable to upgrade models from v1 (previous) to v2 (latest).

Usage:
quantize <old_quantized_model> <new_model_name> type

type must match the type of the existing file. The tool does not support re-quantizing into another type.

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

Suggestions on ggml.c (2, outdated) and llama.cpp (1, outdated).

@Green-Sky
Collaborator

I would not add it to ggml.c. It's legacy code, which we don't want to carry around.

@howard0su
Collaborator Author

I don't mean to carry it forever; we could remove it after a couple of weeks. The data format (struct block_q4_0) is defined only in ggml.c, so I don't see another way to do this unless we copy the definition.
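
For readers unfamiliar with why the block definition matters here, the following is a minimal sketch in C of what upgrading a single Q4_0 block in place could look like. Everything in it is an assumption for illustration: `demo_block_q4_0` is a hypothetical stand-in for the real struct in ggml.c, `demo_repack_q4_0` is a made-up helper, and both the old nibble layout (byte j holds elements 2j and 2j+1) and the new one (byte j holds elements j and j + 16) are assumed rather than taken from this PR.

```c
#include <stdint.h>

#define DEMO_QK 32  /* values per block (assumed) */

/* Hypothetical stand-in for block_q4_0; the real definition lives in ggml.c. */
typedef struct {
    float   d;                /* per-block scale */
    uint8_t qs[DEMO_QK / 2];  /* packed 4-bit quants, two per byte */
} demo_block_q4_0;

/* Repack one block from the assumed old layout (byte j = elements 2j, 2j+1)
 * to the assumed new layout (byte j = elements j, j + DEMO_QK/2).
 * The scale is left untouched; only the nibble order changes. */
static void demo_repack_q4_0(demo_block_q4_0 *blk) {
    uint8_t vals[DEMO_QK];

    /* Unpack every 4-bit value in element order. */
    for (int j = 0; j < DEMO_QK / 2; j++) {
        vals[2 * j]     = blk->qs[j] & 0x0F;
        vals[2 * j + 1] = blk->qs[j] >> 4;
    }

    /* Pack again: low nibble from the first half, high nibble from the second. */
    for (int j = 0; j < DEMO_QK / 2; j++) {
        blk->qs[j] = (uint8_t)(vals[j] | (vals[j + DEMO_QK / 2] << 4));
    }
}
```

A converter along these lines has to know the exact struct layout of every block type it touches, which is why the options discussed here come down to either placing the code next to the definition or copying the definition.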

@howard0su howard0su marked this pull request as ready for review May 18, 2023 01:55
@rankaiyx
Contributor

Maybe it could be made into a small standalone tool so that it does not become a burden. The tips in README.md could then be updated accordingly.

@howard0su
Collaborator Author

The intention is to provide a more seamless experience when upgrading the model version. It is not a goal to have a separate tool or to maintain this long term.

@rankaiyx
Contributor

The intention is to provide a more seamless experience when upgrading the model version. It is not a goal to have a separate tool or to maintain this long term.

Thank you very much for making a lot of my old models useful again.

Unfortunately, there is now a new merge that seems to break backward compatibility again.

In case the same thing happens again, it would be reasonable to provide a dedicated tool.
Logically, upgrading the format is not a quantization operation.

@howard0su
Collaborator Author

Yes, it is fine to keep this as an open PR and not merge it. I will make some code changes after the F16 change is merged.

@daniandtheweb
Contributor

Isn't it possible to integrate this as a separate tool? That way the legacy code could be kept away from the main program and the conversion would still be possible.

@howard0su
Collaborator Author

You may notice that the changes are in llama.cpp and ggml.c. If we wanted a separate application, we would pretty much have to copy that code.

@SlyEcho
Collaborator

SlyEcho commented May 20, 2023

Actually, the quantization code is already copied several times: once in ggml.c, then in ggml-cuda.cu, and again in ggml-opencl.c.

@howard0su howard0su changed the title from "Upgrade v1 format to v2 by leveraging quantize" to "Upgrade v1/v2 format to v3 by leveraging quantize" May 21, 2023
@howard0su
Collaborator Author

Tested with v1 and v2 Q4_0 files only; I don't have files in other formats. Please report any bugs here.

@ggerganov this is an ugly patch, but it works. It is very painful for users if we don't provide a conversion tool for the old models. However, I don't have much time to build another tool (and I don't think it is worth the effort for an interim tool).

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

Suggestions on ggml.c (5, all outdated) and llama.cpp (5, 4 of them outdated).

@rankaiyx
Contributor

One possible compromise is to create a frozen branch that contains the format conversion feature and does not need to track the latest code.
Documentation on how to compile and use it could then be provided in a sensible place for those who need it.

5 participants