Unsloth optims for Llama #1609
Conversation
The cross_entropy_loss optimization is applicable even in a full fine tune, right?
Correct!
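The reason it applies to both is that this optimization only changes how the loss is computed from the logits; it never depends on which parameters are trainable. As a rough illustration of the general idea (not Unsloth's actual fused Triton kernel), a chunked cross entropy along the lines below avoids materializing the full float32 log-softmax for every token at once; the helper name and chunk size are placeholders.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


def chunked_cross_entropy(logits, labels, chunk_size=8192, ignore_index=-100):
    """Token-level cross entropy computed in chunks over the flattened sequence.

    The float32 upcast and log-softmax are only materialized one chunk at a
    time (and recomputed during backward via checkpointing) instead of for the
    full (tokens, vocab) logits tensor at once.
    """
    logits = logits.reshape(-1, logits.size(-1))
    labels = labels.reshape(-1)
    n_valid = (labels != ignore_index).sum().clamp(min=1)

    def chunk_loss(chunk_logits, chunk_labels):
        # Sum reduction so chunks that are entirely ignore_index contribute 0.
        return F.cross_entropy(
            chunk_logits.float(), chunk_labels,
            ignore_index=ignore_index, reduction="sum",
        )

    total = logits.new_zeros((), dtype=torch.float32)
    for start in range(0, logits.size(0), chunk_size):
        total = total + checkpoint(
            chunk_loss,
            logits[start:start + chunk_size],
            labels[start:start + chunk_size],
            use_reentrant=False,
        )
    return total / n_valid
```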
Are these optimizations compatible with flash attention? (Complete noob here)
Is it possible to alter this to also patch Qwen? It was added to Unsloth and all optimizations work for it:
Let's tackle that in a follow-up PR
Does this only work for a single GPU? I'm getting a "Runtime Error: Model must be 2-D" error when enabling unsloth cross entropy loss on 2x 3090 Ti while fine-tuning Llama 3 8B with LoRA.
* WIP for unsloth integrations
* import the unsloth code in the right context
* add unsloth mlp, qkv, o lora optimizations
* apply unsloth mlp and qkv kernels
WIP to integrate Unsloth's optimizations into axolotl.
The manual autograd for the MLP, QKV, and O projections only seems to reduce VRAM by about 1%, as opposed to the reported 8%.
The cross entropy loss optimization does help significantly, but only reduced VRAM by 13%, as opposed to the reported 17%.
Edit: to clarify, the cross entropy loss optimization works for both full fine-tunes and LoRA. The MLP, QKV, and O optimizations only apply to 4-bit QLoRA with flash attention.
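For reference, the "manual autograd" for the LoRA projections boils down to writing the forward and backward passes by hand so the low-rank intermediate is recomputed in backward rather than saved as an activation. The sketch below is a simplified dense illustration of that idea only, not the actual Unsloth kernels (which operate on 4-bit quantized base weights with fused Triton matmuls); LoRA scaling and dropout are omitted and the class name is a placeholder.

```python
import torch


class LoRALinearFn(torch.autograd.Function):
    """Manual autograd for a LoRA linear: y = x @ W.T + (x @ A.T) @ B.T.

    Only x, W, A, B are saved; the low-rank projection x @ A.T is recomputed
    in backward instead of being stored, trading a little compute for
    activation memory. W is treated as frozen (as in QLoRA), so it gets no
    gradient. Shapes: x (..., in), W (out, in), A (r, in), B (out, r).
    """

    @staticmethod
    def forward(ctx, x, W, A, B):
        ctx.save_for_backward(x, W, A, B)
        return x @ W.t() + (x @ A.t()) @ B.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W, A, B = ctx.saved_tensors
        g = grad_out.reshape(-1, grad_out.shape[-1])   # (N, out)
        x2 = x.reshape(-1, x.shape[-1])                # (N, in)
        gB = g @ B                                     # (N, r)
        grad_x = (g @ W + gB @ A).reshape(x.shape)
        grad_A = gB.t() @ x2                           # (r, in)
        grad_B = g.t() @ (x2 @ A.t())                  # (out, r); recomputes x @ A.T
        return grad_x, None, grad_A, grad_B            # no grad for frozen W


# Hypothetical usage inside a patched attention/MLP projection:
# y = LoRALinearFn.apply(hidden_states, base_weight, lora_A.weight, lora_B.weight)
```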