examples : Fix llama-export-lora example #8607

Conversation
I haven't run tests, but I think it is OK.
I would recommend adding log messages with detailed information about:
- tensors being merged
- tensor sizes
- lora scaling
- ranks
- etc.
We can improve this as needed; a rough sketch of what such a log line could look like is below.
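For illustration only, a hypothetical version of such a log line (the function, its signature, and the assumption that the lora rank is `a->ne[1]` are mine, not code from this PR):

```cpp
#include <cstdint>
#include <cstdio>
#include "ggml.h"

// Hypothetical per-tensor logging along the lines suggested above; `base` is
// the base weight tensor, `a` the lora A matrix of one adapter, `scale` the
// user-provided lora scale. Not the code that was actually added in the PR.
static void log_merge_info(const struct ggml_tensor * base,
                           const struct ggml_tensor * a,
                           float scale) {
    // the lora rank is assumed to be the inner dimension shared by lora A and B
    const int64_t rank = a->ne[1];
    printf("merging %s: shape [%lld, %lld], lora rank %lld, scale %.3f\n",
           base->name,
           (long long) base->ne[0], (long long) base->ne[1],
           (long long) rank, scale);
}
```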
bool t_a = true;
bool t_b = true;
for (auto & adapter : adapters) {
    t_a &= nullptr != adapter->get_tensor(it.first + ".lora_a");
    t_b &= nullptr != adapter->get_tensor(it.first + ".lora_b");
}
Here we require that all adapters provide data for the base tensor. If any adapter does not have data for that tensor, then we keep the original data.
Maybe in the future we can improve this to support the case where only a subset of the adapters have data for a tensor.
Yeah right, thanks for spotting that. It's quite complicated to fix for now, so I'd prefer to merge this PR as-is.
To prevent producing a half-working model, I added a check that rejects that case: 0ec2b58. As a fallback, the user can always run llama-export-lora multiple times.
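A rough sketch of the kind of check described here (the names and structure are assumptions for illustration, not the actual export-lora code):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration of the "reject merging subset" idea: if only some
// of the adapters provide lora data for a given base tensor, refuse to merge
// instead of silently producing a half-working model.
static void check_adapter_coverage(const std::string & tensor_name,
                                   size_t n_adapters_with_lora,
                                   size_t n_adapters_total) {
    if (n_adapters_with_lora > 0 && n_adapters_with_lora < n_adapters_total) {
        // the user can still merge the adapters one at a time with separate
        // llama-export-lora runs as a fallback
        throw std::runtime_error("tensor '" + tensor_name +
            "' is present in only a subset of the adapters");
    }
}
```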
examples : Fix `llama-export-lora` example (ggerganov#8607)
@@ -128,7 +128,6 @@ struct gpt_params {

     // TODO: avoid tuple, use struct
     std::vector<std::tuple<std::string, float>> lora_adapter; // lora adapter path with user defined scale
-    std::string lora_base = ""; // base model path for the lora adapter
Just for downstream consumers (like me): this PR removed lora_base, so I want to highlight this. Also, a quick question: is this no longer needed since #8332?
Yes, since #8332 lora_base is no longer needed, because we now merge lora at runtime (instead of merging it with the base model at start up). You can still pass your base model as -m in llama-export-lora, but IMO it won't change the end result very much.
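For context, a minimal sketch of the runtime lora path from #8332 that makes lora_base unnecessary. The function names below (llama_lora_adapter_init / llama_lora_adapter_set) reflect my understanding of the adapter API added there; check llama.h for the exact signatures.

```cpp
#include "llama.h"

// Sketch only: load a lora adapter against the base model and attach it to a
// context, so the lora delta is applied at inference time without rewriting
// the base weights on disk.
static bool apply_lora_at_runtime(struct llama_model * model,
                                  struct llama_context * ctx,
                                  const char * lora_path,
                                  float scale) {
    struct llama_lora_adapter * adapter = llama_lora_adapter_init(model, lora_path);
    if (adapter == nullptr) {
        return false; // failed to load the adapter
    }
    llama_lora_adapter_set(ctx, adapter, scale);
    return true;
}
```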
gotcha, thanks for the clarification!
* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo
Resolves #8581.
This example can now accept the new lora format introduced in #8332.
The output merged tensors will be forced to f16, since ggml_cast does not support Q-type quants. It would be nice to move the threaded quantization functions from llama.cpp to ggml.
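As a small illustration of that last point (a sketch only; `ctx` and `merged` are placeholders, not variables from the PR): the merged result can be cast to a non-quantized type such as F16 with ggml_cast, but not to a Q-type.

```cpp
#include "ggml.h"

// Sketch: add a cast-to-f16 node for the merged tensor (base + scale * B*A).
// The actual conversion happens when the graph containing this node is
// computed. Casting to a Q-type such as GGML_TYPE_Q4_K is not supported by
// ggml_cast, which is why the output tensors are forced to f16 for now.
static struct ggml_tensor * merged_to_f16(struct ggml_context * ctx,
                                          struct ggml_tensor * merged) {
    return ggml_cast(ctx, merged, GGML_TYPE_F16);
}
```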