
Add support for layer replication in LoRA #1368

Merged: 8 commits into huggingface:main on Mar 12, 2024

Conversation

@siddartha-RE (Contributor):

This PR adds the ability to duplicate layers in a model according to a layer map
and then fine-tune separate LoRA adapters for the duplicated layers. This makes it
possible to expand a model into a larger one and fine-tune it with very little
extra memory compared to the original, smaller model.
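
For a quick sense of the intended usage, here is a minimal sketch. It assumes the option is exposed on LoraConfig as layer_replication (the name later comments in this thread use); the base model and layer map are arbitrary examples:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Example base model (any decoder-style model should work).
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    # Each [start, end) range reuses base layers start..end-1. Concatenating the
    # ranges below expands the 32 original layers to 48, with base weights shared.
    layer_replication=[[0, 16], [8, 24], [24, 32]],
)
model = get_peft_model(base, config)   # each replicated layer gets its own LoRA weights
model.print_trainable_parameters()
```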

@younesbelkada (Contributor) left a comment:

Thanks @siddartha-RE for this feature! Can you elaborate more on its usage by providing some possible snippets of the API? Would you be happy to extend the current documentation and test suite to add that feature as well?

@siddartha-RE (Contributor, Author) commented on Jan 22, 2024 via email.

@siddartha-RE (Contributor, Author):

I have created an example model using this PR:
https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
The model description explains the motivation for this PR.

I have also added a test to verify the behavior and added documentation in the code. I am not quite sure where to add further docs: should I modify one of the existing task docs that discuss LoRA (I did not see an obvious candidate), or should I check in a new example under examples/causal_language_modelling?

@pacman100 (Contributor) left a comment:

Thank you @siddartha-RE. I like the idea, the implementation is clear, and I went over the POC experiment you shared. I think this is a good feature to have. I would like to hear the thoughts of @BenjaminBossan and @younesbelkada on this.

@siddartha-RE (Contributor, Author):

Following up here with an ablation study:

| Model | Trainable Params | Train Loss | Eval Loss | GSM8K | TruthfulQA |
|---|---|---|---|---|---|
| Mistral 7B | 0 | - | - | 0.374 | 0.426 |
| Mistral 10B | 0 | - | - | 0.290 | 0.407 |
| Mistral 7B + LoRA r=12 | 31M | 0.412 | 0.366 | 0.514 | 0.499 |
| Mistral 10B + LoRA r=8 | 31M | 0.401 | 0.363 | 0.663 | 0.540 |

The 10B model was built by layer-stacking the 7B model with the config [[0, 16], [8, 24], [24, 32]], for a total of 48 layers initialized from the 32 layers of the original model. We ran LoRA fine-tuning on both the original 32-layer model and the 48-layer model (using this PR), adjusting the r parameter so that the number of trainable parameters was exactly the same in both cases. As the table shows, the 48-layer model clearly achieved better performance.
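
To make the layer arithmetic concrete (plain Python, purely illustrative):

```python
layer_map = [[0, 16], [8, 24], [24, 32]]
stacked = [i for start, end in layer_map for i in range(start, end)]
print(len(stacked))    # 48 layers in the expanded model
print(stacked[14:20])  # [14, 15, 8, 9, 10, 11] -- base layers 8..15 appear twice (weights shared)
```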

I think this is a pretty compelling approach for training large models without incurring the full memory cost.

@BenjaminBossan (Member) left a comment:

Thanks for providing this interesting new technique.

From my understanding of the code, this effectively modifies the base model by copying a couple of existing layers (with shared weights). Applying different LoRA weights to these copies then ensures that the copied layers can learn different things.

From my understanding, the copy-functionality is not tied to LoRA at all. The same principle could be applied to IA³, LoHa, etc. Therefore, I think it would make sense to remove this functionality from LoraModel and make it a generic utility function. In practice, the usage would look something like this:

base_model = AutoModelForCausalLM.from_pretrained(...)
layer_replication_map = [(0, 16), (8, 24), (16, 32)]
extended_model = replicate_layers(base_model, layer_replication_map)
config = LoraConfig(...)  # or IA3Config or ...
peft_model = get_peft_model(extended_model, config)

The advantage of this approach would be to make it generic and not overburden the existing LoRA implementation with something that is not fundamentally related to LoRA.

The disadvantages would be:

  • as a user, I have to run the same extension code again when loading the trained adapter
  • as it's not part of the LoraConfig, we cannot easily check if merging is allowed

IMO, this is still a better approach, but please tell me what you think about my proposal.

@siddartha-RE (Contributor, Author):

> IMO, this is still a better approach, but please tell me what you think about my proposal.

I did consider how to keep it independent of LoRA. The thought I had was to make it a separate adapter that could then be used with the multi-adapter support that recently landed, allowing it to be mixed with other adapters. The reasons I did not do that are: first, multi-adapter support still seems to be evolving and I didn't know how many adapter types this would realistically work with; second, I thought it was better to see whether this method is generally interesting to the community before attempting a more complex implementation.

One definite goal is to allow this to be loaded without any custom code, so I would like the solution to work with any code that currently just works with PeftConfig.from_pretrained(...).
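
For illustration, the kind of zero-custom-code loading flow this is aiming at (a sketch; it assumes the replication map ends up serialized in the saved adapter config, and the adapter repo name is hypothetical):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
# If the replication map is stored in the adapter's saved LoraConfig, this single
# call should rebuild the expanded model and attach the adapters, with no custom
# code on the user's side. The adapter repo name below is hypothetical.
model = PeftModel.from_pretrained(base, "my-org/my-layer-replicated-lora")
```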

@BenjaminBossan (Member):

> I did consider how to keep it independent of LoRA. The thought I had was to make it just a separate adapter that can then be used with the multi-adapter support that has recently landed, to allow mixing with other adapters.

I wonder if we even need a full new adapter class for this feature. What if we just make it a function that can be called on the base model before applying LoRA or another method?

@siddartha-RE (Contributor, Author) commented on Feb 9, 2024 via email.

@BenjaminBossan (Member):

> But say I want to submit a model to HF. How would I push a config that would allow people to load the model without custom code to invoke the hook?

I definitely see your point of wanting to avoid requiring custom code. Loading from HF Hub would still work for the full model state dict, right? But of course, that would be a huge waste of space, so it'd be nice to have another solution. Still, I don't think the solution is to bundle this with LoRA, as the two are independent.

I could see the argument for having a separate adapter type with its own config, which could be combined with LoRA in a separate step, but I'm not sure if that's really better than having custom code, which comes down to a single function call 🤔

At the end of the day, if the other maintainers would rather have this bundled with LoRA, I can certainly be convinced :) My personal preference is still a separate function call though.

@siddartha-RE (Contributor, Author) commented on Feb 9, 2024 via email.

@BenjaminBossan (Member):

> I can certainly factor this into a separately usable function, but perhaps also leave it as a functionality in LoRA, which I see as one of the primary ways to leverage this efficiently.

This may be a good compromise. As a separate function, it should also be easier to add to the other methods. I'm a little wary of overburdening the LoRA options, which are already numerous, with features that are not very modular or easy to compose, but that's an issue for another day.

> Keep in mind it's not just the issue of downloading the weights. This approach even keeps GPU memory usage almost the same as the base model. Effectively we have trained 120B models using the memory of a 70B model. Because the LoRA adapters are distinct for all the layers in the effectively-120B model, it is really like having double the number of independent layers.

Yes, this is great, but it's not directly related to integrating this into LoraModel, right?

@siddartha-RE (Contributor, Author) commented on Feb 9, 2024 via email.

@BenjaminBossan (Member):

@siddartha-RE Let us know once this is ready for review.

@siddartha-RE (Contributor, Author):

> @siddartha-RE Let us know once this is ready for review.

I think this is ready. I refactored the function so that it lives in utils; LoRA just hooks into it.
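
For readers following along, here is a rough sketch of what such a shared-weight replication utility could look like; the module paths and helper names are assumptions for a Llama/Mistral-style decoder, not the exact code in this PR:

```python
import copy

import torch.nn as nn


def clone_with_shared_weights(layer: nn.Module) -> nn.Module:
    """Copy a layer's module structure while re-pointing every parameter at the
    original tensor, so the clone adds (almost) no extra memory. Buffers (e.g.
    rotary caches) are still copied, but those are small."""
    clone = copy.deepcopy(layer)
    for (_, src), (_, dst) in zip(layer.named_modules(), clone.named_modules()):
        for name, param in src.named_parameters(recurse=False):
            setattr(dst, name, param)  # nn.Module registers the shared Parameter
    return clone


def replicate_layers(model: nn.Module, layer_map: list) -> nn.Module:
    """Expand model.model.layers according to the [start, end) ranges in layer_map."""
    layers = model.model.layers  # assumes a Llama/Mistral-style layout
    new_layers = nn.ModuleList(
        clone_with_shared_weights(layers[i])
        for start, end in layer_map
        for i in range(start, end)
    )
    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)
    return model
```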

@BenjaminBossan (Member) left a comment:

Thanks for the update. I have some comments, please take a look.

(Inline review comments on src/peft/tuners/lora/config.py, src/peft/tuners/lora/model.py, src/peft/tuners/tuners_utils.py, and tests/test_decoder_models.py; resolved.)

@BenjaminBossan (Member) left a comment:

I think we're getting close to finishing this PR. I have a few comments left, please check them out.

Also, please always run make style once you're finished. We now also have a pre-commit config in case you want to add a pre-commit hook.

(Inline review comments on src/peft/tuners/tuners_utils.py, tests/test_decoder_models.py, and src/peft/tuners/lora/config.py; resolved.)

@siddartha-RE (Contributor, Author):

Sorry about not running the code style checks. I ran the tests but forgot to rerun make quality/style; it is included in the latest commit.

@BenjaminBossan (Member) left a comment:

Thanks for the updates, I only have a few smaller comments left.

Also, I think it would be nice to add a section to the docs, perhaps here. This way, you'll have a better chance that users will discover this feature. Ideally, a nice example could be documented there, or a link to a working example added.

(Inline review comments on src/peft/tuners/lora/model.py, src/peft/tuners/tuners_utils.py, and src/peft/tuners/lora/config.py; resolved.)

@BenjaminBossan (Member) left a comment:

Thanks a lot for the recent changes. The documentation is much clearer now and users should be able to understand what this does and find further information.

I found a couple of typos etc., otherwise this LGTM.

One question: Would we be able to check in the unit test that the weights are shared instead of creating a new copy?
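
For reference, one way such a check could look; the attribute paths and the choice of up_proj are assumptions, not the test actually added in this PR:

```python
def assert_weights_shared(peft_model):
    # Assumes layer_replication=[[0, 16], [8, 24], [24, 32]] on a Llama/Mistral-style
    # model, so base layer 8 reappears at expanded index 16. up_proj is assumed not
    # to be a LoRA target module, i.e. it stays a plain nn.Linear.
    layers = peft_model.base_model.model.model.layers
    shared = layers[8].mlp.up_proj.weight.data_ptr() == layers[16].mlp.up_proj.weight.data_ptr()
    distinct = layers[0].mlp.up_proj.weight.data_ptr() != layers[8].mlp.up_proj.weight.data_ptr()
    assert shared and distinct
```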

(Inline review comments on docs/source/developer_guides/lora.md, src/peft/tuners/lora/config.py, and src/peft/tuners/lora/model.py; resolved.)

@siddartha-RE force-pushed the user/siddartha/duplicate_layers branch from 1da5e1d to 35d8bc7 on March 5, 2024.
@siddartha-RE (Contributor, Author):

@BenjaminBossan addressed comments and updated the unit test to check that some weights are shared and others are distinct.

@BenjaminBossan (Member) left a comment:

Thanks a lot, this LGTM.

I found a minor error in a docstring, no big deal.

Let's have a final review by @younesbelkada or @pacman100 before merging.

(Inline review comment on src/peft/tuners/tuners_utils.py; resolved.)

Commit: Fix typo in doc string.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

@siddartha-RE (Contributor, Author):

@BenjaminBossan do you have a sense for when this PR is likely to be merged?

@pacman100 (Contributor) left a comment:

Wow! Thank you @siddartha-RE for all the work on adding support for a memory-efficient way of replicating layers and inserting LoRAs on them for efficient fine-tuning! 🔥🚀✨

@pacman100 merged commit 5471c9a into huggingface:main on Mar 12, 2024. 14 checks passed.
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 14, 2024
* Add support for layer replication in LoRA

* Add test and update docs

* Address review comments

* Code cleanup and additional model support

* Add docs, address comments

* Add link to example model

* Improve test and fix typos

* Update src/peft/tuners/tuners_utils.py

Fix typo in doc string.

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@hohoCode commented on Apr 4, 2024:

Is it possible to support Flan-T5? T5 still has a huge commercial base and is widely used. @siddartha-RE @BenjaminBossan

It might look like this in the replicate_layers() function (plus possibly other changes):

layers = model.encoder.block  # instead of .layers

Thanks!

@BenjaminBossan (Member):

I think there is no technical reason why this shouldn't work, but with the given way of specifying the layers, I'm not sure how to indicate if layers in the encoder or decoder part should be replicated. Maybe Siddartha has an idea.

@siddartha-RE (Contributor, Author):

> I think there is no technical reason why this shouldn't work, but with the given way of specifying the layers, I'm not sure how to indicate if layers in the encoder or decoder part should be replicated. Maybe Siddartha has an idea.

I think the easiest option here may be to promote the replication config to Union[Dict[str, list], list]; if it is a dict, there would be separate entries for encoder and decoder, and architectures that support them can do the right thing.
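
For example, such a config could look like this (purely hypothetical, not implemented in this PR):

```python
# Hypothetical dict form of the replication config for encoder-decoder models:
layer_replication = {
    "encoder": [[0, 12], [6, 24]],  # expand the 24 encoder blocks to 30
    "decoder": [[0, 24]],           # leave the 24 decoder blocks unchanged
}
```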

@BenjaminBossan (Member):

> I think the easiest option here may be to promote the replication config to Union[Dict[str, list], list]; if it is a dict, there would be separate entries for encoder and decoder, and architectures that support them can do the right thing.

Sounds good, are you interested in adding this? I checked mergekit and AFAICT there is no specification for how to define this for encoder-decoder models. It would be unfortunate if they come up with a different standard, but I guess we could add that later.

@hohoCode commented on Apr 9, 2024:

Another quick thing I tried on Flan-T5 revealed what is probably a bug: the total number of blocks after duplication cannot exceed the original number of blocks.

So for Flan-T5 with 24 original blocks, layer_replication=[[0, 24], [22, 24]] leads to an 'out of index' error, since the new total number of blocks is 26, more than 24.

However, [[0, 7], [9, 24], [22, 24]] does not raise the error, since the result is exactly 24 blocks. Just reporting this for T5. Thanks!

@xzuyn commented on Apr 24, 2024:

Could this possibly be updated to optionally set the replicated layers' o_proj and down_proj to zero, so that performance remains identical to the base model before training?

I think this idea originally comes from LLaMA Pro, and it was later tested on chargoddard's mistral-11b-slimorca.

Quoting Charles here:

> The base model for this came from a variation on Undi's Mistral 11B recipe. The o_proj and down_proj tensors were set to zero in the added layers, making the output exactly identical to Mistral 7B before training.
>
> Benchmarks look good locally but still evaluating actual usefulness. Update: this turned out great! 10/10 would recommend as a training approach.
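
For concreteness, a sketch of that LLaMA-Pro-style initialization as a hypothetical helper; note that it only makes sense if the added layers hold independent copies of o_proj/down_proj rather than tensors shared with the originals:

```python
import torch


def zero_out_new_layer_outputs(model, new_layer_indices):
    """Zero the output projections of freshly inserted decoder layers so the
    expanded model initially behaves exactly like the base model (LLaMA-Pro style).
    Only valid if the new layers own independent copies of o_proj/down_proj;
    with shared weights, zeroing would also zero the source layers."""
    for idx in new_layer_indices:
        layer = model.model.layers[idx]  # assumes a Llama/Mistral-style layout
        with torch.no_grad():
            layer.self_attn.o_proj.weight.zero_()
            layer.mlp.down_proj.weight.zero_()
```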

@BenjaminBossan (Member):

> Could this possibly be updated to optionally set the replicated layers' o_proj and down_proj to zero so that the performance remains the same as the base before training?

Also described in this answer.ai blogpost.

@KimJaehee0725:
@siddartha-RE Hello. First of all, thanks for your brilliant idea. I want to dive deeper into the mechanism. For that, could you share the specific experiment configurations for the model you trained, https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B?

It would be really helpful for my work.

Thanks!
