try to fix Zero3 Memory Leak following @tohtana idea #363

Closed
wants to merge 9 commits
2 changes: 0 additions & 2 deletions README.md
@@ -358,8 +358,6 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume
`P_TUNING`/`PROMPT_TUNING` appends soft prompt embeddings to `input_embeds` to create
new `input_embeds` to be given to the model. Therefore, `generate` doesn't support this yet.

4. When using ZeRO3 with zero3_init_flag=True, if you find that GPU memory increases with training steps, you might need to set zero3_init_flag=false in the accelerate config.yaml. The related issue is [[BUG] memory leak under zero.Init](https://github.com/microsoft/DeepSpeed/issues/2637)

## Backlog:
- [x] Add tests
- [x] Multi Adapter training and inference support
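For context, the note removed above described a configuration-level workaround: keep ZeRO-3 but disable `zero.Init` during model construction. Below is a minimal sketch of applying that workaround programmatically via accelerate's `DeepSpeedPlugin`; the `zero_stage` and `zero3_init_flag` arguments are assumed from the accelerate API and may differ across versions.

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Workaround from the removed README note: keep ZeRO stage 3 but skip
# deepspeed.zero.Init when building the model, which was reported to
# avoid the step-by-step GPU memory growth in DeepSpeed issue #2637.
ds_plugin = DeepSpeedPlugin(zero_stage=3, zero3_init_flag=False)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)
```

The equivalent setting in an accelerate config.yaml is the `zero3_init_flag: false` entry mentioned in the removed note.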
6 changes: 5 additions & 1 deletion src/peft/tuners/adalora.py
@@ -431,7 +431,11 @@ def forward(self, x: torch.Tensor):
self.unmerge()
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
elif self.r[self.active_adapter] > 0 and not self.merged:
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
result = torch.matmul(x, transpose(self.weight, not self.fan_in_fan_out))

if self.bias is not None:
    result += self.bias

result += (
(
self.lora_dropout[self.active_adapter](x)
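The core change here (and the matching one in `lora.py` below) swaps `F.linear` for an explicit `torch.matmul` plus a bias add; because `matmul` multiplies by the weight directly rather than by its transpose, the flag passed to `transpose` is inverted. A minimal sketch of that equivalence, using a stand-in for peft's `transpose` helper:

```python
import torch
import torch.nn.functional as F

def transpose(weight, fan_in_fan_out):
    # Stand-in for peft's helper: transpose weights stored in (in, out) layout.
    return weight.T if fan_in_fan_out else weight

x = torch.randn(4, 16)         # batch of activations
weight = torch.randn(32, 16)   # standard nn.Linear layout (out_features, in_features)
bias = torch.randn(32)
fan_in_fan_out = False

# Original formulation: F.linear computes x @ W.T + b.
out_linear = F.linear(x, transpose(weight, fan_in_fan_out), bias=bias)

# Reformulation from this PR: the transpose flag is flipped so the same
# orientation is reached without going through F.linear.
out_matmul = torch.matmul(x, transpose(weight, not fan_in_fan_out))
if bias is not None:
    out_matmul += bias

print(torch.allclose(out_linear, out_matmul, atol=1e-6))  # True
```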
5 changes: 4 additions & 1 deletion src/peft/tuners/lora.py
@@ -490,7 +490,10 @@ def forward(self, x: torch.Tensor):
self.unmerge()
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
elif self.r[self.active_adapter] > 0 and not self.merged:
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
result = torch.matmul(x, transpose(self.weight, not self.fan_in_fan_out))

if self.bias is not None:
    result += self.bias

x = x.to(self.lora_A[self.active_adapter].weight.dtype)

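The `lora.py` hunk mirrors the `adalora.py` change above. To check whether the leak described in the removed README note is actually gone, one simple approach is to log allocated CUDA memory every few training steps and confirm that it plateaus; a small hypothetical helper (the step interval and the training loop it would be called from are assumptions):

```python
import torch

def log_cuda_memory(step, every=50):
    # Hypothetical check for the leak this PR targets: under ZeRO-3 with
    # zero3_init_flag=True, allocated memory was reported to grow with steps.
    if step % every == 0 and torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 2**20
        peak = torch.cuda.max_memory_allocated() / 2**20
        print(f"step {step}: allocated {allocated:.1f} MiB, peak {peak:.1f} MiB")
```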