Finetune produces unusable LoRA for Mistral model #3852
Comments
I played with finetuning parameters for the Zephyr model and got the following result.
Maybe it is just not enough training for a random LoRA to make any sense? I tried rank 24 with 133 iterations (loss 11 -> 5.8), and even scales of 0.5 and 0.3 now produce garbage, even with a prompt from shakespeare.txt (no way to know whether it was in the LoRA's training sample). So it doesn't look like more training makes things better.
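(The scale here presumably maps to main's --lora-scaled option when applying the adapter; a rough sketch, with placeholder model, adapter, and prompt:)

```sh
# Apply the finetuned adapter at a reduced scale (here 0.5) instead of the default 1.0.
# Model, adapter file, and prompt are placeholders.
./main -m zephyr-7b-beta.Q8_0.gguf \
       --lora-scaled lora-zephyr-shakespeare-LATEST.bin 0.5 \
       -p "To be, or not to be" -n 64
```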
I experienced the same, but tried it with rank 4 and more iterations, so it took some time until I hit it. Edit: I can confirm that a higher rank breaks it after one iteration. AMD CPU here.
I hoped that #3974 would fix this issue, but rank 32 on b1497 still gives garbage output after one iteration.
Are you sure this is Mistral/Zephyr specific? Is this "single finetune iteration" test something you've done on Llama models without issue? I have a general fix, #4082, and I'm wondering if it fixes your issue, especially since you're looking at a single iteration. My testing failed to show much improvement, but I was doing 30 iterations. I did notice that 'loss' was lower for the first few finetune iterations after my fix.
Update: I've experimented with a single finetune iteration. Details:
@AndrewGodfrey |
It's strange, but I can reproduce the issue only on Mistral, both on master and on the #4082 branch. So maybe the Mistral architecture is the problem (it is a bit different from Llama, IIRC). I tried TinyLlama before submitting the issue, and openllama-3b-v2 (on current master and on the PR branch); for both of them a single-iteration-trained LoRA resulted in almost fine output. @AndrewGodfrey actually, in your example the output without the fix is not that bad either. Also, the finetune program refused to work on f16 for me, so for my test I quantized f16 to q8_0. For Mistral/Zephyr, on current master and on the PR branch, the result with a single-step LoRA is completely incoherent. This is an example output from a LoRA trained with finetune from the #4082 branch on the Zephyr model:
Yes, I suppose the effect could be worse on some models than others. You are also using an unusually high value for lora-r (64), at least compared to the default of 4 and the values explored in the LoRA paper.
Oops, I misread your earlier report. So my fix didn’t help with Mistral. Something I just realized may be true (but I haven’t tried it yet) is that the “train” example can be used for fine-tuning; the example named “finetune” is specifically for LoRA finetuning. So I wonder if this repros with “train” or is specific to the LoRA case. Again, this is my understanding from reading train.cpp recently, but I haven’t tried it myself yet.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Expected Behavior
I expected finetune to produce a usable LoRA adapter for all supported models.
Current Behavior
For Mistral models (I tried both Mistral and Zephyr in Q8_0, Q5_K_M, and Q5_0), the model outputs gibberish with the LoRA applied after a single finetune iteration.
On the same PC, finetuning produces a usable LoRA adapter for TinyLlama (I tried Q8_0, Q5_K_M, and Q5_0).
First few tokens for "Building a website can be done in 10 simple steps:" prompt:
Base Mistral model:
Mistral with LoRA (single finetune iteration on shakespeare.txt from example):
Environment and Context
Core i7 4770 CPU
$ lscpu
$ uname -a
Failure Information (for bugs)
For Mistral models (I tried both Mistral and Zephyr in Q8_0, Q5_K_M, and Q5_0), the model outputs gibberish with the LoRA applied after a single finetune iteration.
Steps to Reproduce
I used pre-converted models from TheBloke:
This issue can be reproduced using shakespeare.txt from the finetune example, but I got the same results for a different dataset.
Finetuning command:
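Along the lines of the finetune example's README (file names and hyperparameters below are illustrative rather than the exact values used):

```sh
# Illustrative finetune invocation following the finetune example's README;
# file names and hyperparameters are placeholders, not the exact values used.
./finetune \
    --model-base mistral-7b-v0.1.Q8_0.gguf \
    --checkpoint-in  chk-lora-mistral-shakespeare-LATEST.gguf \
    --checkpoint-out chk-lora-mistral-shakespeare-ITERATION.gguf \
    --lora-out lora-mistral-shakespeare-ITERATION.bin \
    --train-data shakespeare.txt \
    --save-every 1 \
    --threads 8 --adam-iter 1 --batch 4 --ctx 64 \
    --use-checkpointing
```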
For Zephyr (which also produces an invalid LoRA) and TinyLlama (which produces a valid LoRA) I changed only the model-base parameter. Between experiments I removed all finetune checkpoints and LoRAs.
Testing without LoRA:
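Roughly as follows (the model file name is a placeholder for whichever quantization is under test):

```sh
# Baseline generation with the plain base model, no LoRA applied.
./main -m mistral-7b-v0.1.Q8_0.gguf \
       -p "Building a website can be done in 10 simple steps:" -n 64
```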
Testing with LoRA:
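The same invocation with the freshly trained adapter applied (file names again placeholders):

```sh
# Same prompt, but with the single-iteration LoRA adapter applied.
./main -m mistral-7b-v0.1.Q8_0.gguf \
       --lora lora-mistral-shakespeare-LATEST.bin \
       -p "Building a website can be done in 10 simple steps:" -n 64
```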
P.S. As a final part of this bug report I would like to thank all contributors for this amazing piece of software. It is a pleasure to use, and it gives those of us without top GPUs the ability to experiment with LLMs.