GPU Memory-Usage, full finetuning vs LoRA vs LoRA + CPU Offloading #1804
Replies: 1 comment 1 reply
-
It's great work! Hi, how do you calculate the GPU memory usage for the different methods?
In addition, with the LoRA method the base parameters go through the forward pass; do the base parameters also go through the backward pass?
-
Hi!
Since the backward pass is based on the chain rule, I would say so; however, gradients are only computed for the LoRA parameters.
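A minimal plain-PyTorch sketch of this point (the `LoRALinear` module below is hypothetical, not this project's implementation): the frozen base weight sits in the autograd graph, so the backward pass flows through it to earlier layers, but no gradient tensor is stored for it; only the LoRA factors get a `.grad`.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA layer, for illustration only."""
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)               # frozen base weight
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        # y = x W^T + (x A^T) B^T: the frozen W is part of the graph, so the
        # chain rule carries gradients *through* it, but none are stored *for* it.
        return self.base(x) + (x @ self.lora_a.T) @ self.lora_b.T

layer = LoRALinear(16, 16)
x = torch.randn(4, 16, requires_grad=True)
layer(x).sum().backward()

print(layer.base.weight.grad)    # None: no gradient stored for the frozen base weight
print(layer.lora_a.grad.shape)   # torch.Size([8, 16]): LoRA factor does get a gradient
print(x.grad.shape)              # torch.Size([4, 16]): backward still flowed through W
```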
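On the memory-calculation part of the question, here is a rough back-of-envelope sketch, under assumed conditions (bf16 weights and gradients, an Adam-style optimizer with two fp32 states per trainable parameter, illustrative model and adapter sizes; activations and framework overhead are ignored), of how the trainable-state footprint differs between full finetuning and LoRA.

```python
GB = 1024 ** 3

def trainable_state_gb(total_params, trainable_params,
                       weight_bytes=2,   # bf16 weights resident on GPU
                       grad_bytes=2,     # bf16 gradients, only for trainable params
                       optim_bytes=8):   # Adam-style: two fp32 states per trainable param
    weights = total_params * weight_bytes
    grads = trainable_params * grad_bytes
    optim_states = trainable_params * optim_bytes
    return (weights + grads + optim_states) / GB

total_params = 7e9   # assumed ~7B-parameter base model
lora_params = 20e6   # assumed ~20M trainable LoRA parameters

print(f"full finetuning: {trainable_state_gb(total_params, total_params):.1f} GB")
print(f"LoRA:            {trainable_state_gb(total_params, lora_params):.1f} GB")
# With CPU offloading, the optimizer states (and possibly the gradients) of the
# trainable parameters live in host RAM, shrinking the GPU-resident figure further.
```

Activation memory, which depends on batch size, sequence length, and whether activation checkpointing is enabled, comes on top of these figures in all three setups.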