Add support for layer replication in LoRA (huggingface#1368)
* Add support for layer replication in LoRA

* Add test and update docs

* Address review comments

* Code cleanup and additional model support

* Add docs, address comments

* Add link to example model

* Improve test and fix typos

* Update src/peft/tuners/tuners_utils.py

Fix typo in doc string.

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
siddartha-RE and BenjaminBossan committed Mar 14, 2024
1 parent f8a45f0 commit 62e90f1
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions src/peft/tuners/lora/config.py
@@ -105,6 +105,9 @@ class LoraConfig(PeftConfig):
Enable 'Weight-Decomposed Low-Rank Adaptation' (DoRA). This technique decomposes the updates of the weights
into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas the magnitude is
handled by a separate learnable parameter. This can improve the performance of LoRA, especially at low
ranks. Right now, DoRA only supports non-quantized linear layers. DoRA introduces a bigger overhead than
pure LoRA, so it is recommended to merge weights for inference. For more information, see
https://arxiv.org/abs/2402.09353.
layer_replication(`List[Tuple[int, int]]`):
Build a new stack of layers by stacking the original model layers according to the ranges specified. This
allows expanding (or shrinking) the model without duplicating the base model weights. The new layers will
@@ -265,6 +268,26 @@ class LoraConfig(PeftConfig):
)
},
)
# Enables replicating layers in a model to expand it to a larger model.
layer_replication: Optional[list[tuple[int, int]]] = field(
default=None,
metadata={
"help": (
"This enables using LoRA to effectively expand a transformer model to a larger size by repeating some layers. "
"The transformation handles models (currently Llama, Bert or Falcon compatible architectures) with "
"a module list in the model which it modifies to expand the number of modules. "
"Base weights are shared so the memory usage is close to the original model. The intended use is these base weights "
"remain fixed during finetuning but each layer has a separate LoRA adapter so the layers can be specialed via "
"the adapter layers fit during fine tuning."
"The format is a list of [start, end) pairs which specify the layer ranges to stack. For example:\n"
" Original model has 5 layers labelled by their position in the model: `[0, 1, 2, 3, 4]`\n"
" layer_replication: `[[0, 4], [2, 5]]`\n"
" Final model will have this arrangement of original layers: `[0, 1, 2, 3, 2, 3, 4]`\n"
"This format is based on what is used for pass-through merges in mergekit. It makes it simple to select sequential "
"ranges of a model and stack them while reusing layers at either end of each sequence."
)
},
)

def __post_init__(self):
self.peft_type = PeftType.LORA
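As a usage illustration (not part of the diff above), here is a minimal sketch of how the new `layer_replication` field might be set, following the `[[0, 4], [2, 5]]` example from the help text. The values of `r` and `target_modules` are illustrative assumptions, and `replicated_layer_indices` is a hypothetical helper that only demonstrates how the ranges are concatenated.

```python
from peft import LoraConfig


def replicated_layer_indices(pairs):
    # Each (start, end) pair selects original layers [start, end); the selected
    # ranges are concatenated in order to build the new, deeper layer stack.
    return [i for start, end in pairs for i in range(start, end)]


# The docstring example: a 5-layer model [0, 1, 2, 3, 4] with ranges
# [0, 4) and [2, 5) becomes [0, 1, 2, 3, 2, 3, 4].
assert replicated_layer_indices([(0, 4), (2, 5)]) == [0, 1, 2, 3, 2, 3, 4]

config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    # Base weights of replicated layers are shared; each resulting layer gets
    # its own LoRA adapter, so the copies can specialize during fine-tuning.
    layer_replication=[(0, 4), (2, 5)],
)
# The config would then be applied with peft.get_peft_model(base_model, config),
# where base_model is a Llama/Bert/Falcon-style model with (here) 5 layers.
```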
