Adapter support for GPTNeoX #521
I have implemented adapters for GPTNeoX following the instructions in the documentation. It passed all tests, but during training of the language adapter, it trained the prediction head too. Do you by chance have an idea why this is happening? Should I open a PR?
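One way to see the reported behavior is to list which parameters remain trainable after switching on adapter training. The following is a minimal sketch, assuming a GPT-NeoX integration that exposes the library's usual add_adapter()/train_adapter() API; the checkpoint and adapter name are only examples:

```python
# Sketch: inspect which parameters remain trainable after train_adapter().
# Assumes a GPT-NeoX adapter integration; checkpoint and adapter name are examples.
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")
model.add_adapter("lang_adapter")    # hypothetical adapter name
model.train_adapter("lang_adapter")  # should leave only the adapter weights trainable

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)  # adapter modules are expected, but the CLM head shows up as well
```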
Hey @ajesujoba, this sounds great; it would be awesome to have GPTNeoX support integrated into the library, so feel free to do a PR! Regarding your question on language adapter training, could you add some more context on what you observed and which behavior you expected (ideally with a code snippet)? Thank you!
Thanks for your response @calpt. I have made a PR.
The script ran successfully, but instead of training just the adapters, it was training both the adapter modules and the CLM head, so the total number of trainable parameters was higher than just the adapter parameters.
I was able to manually freeze the CLM head.
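The manual workaround could look roughly like the following minimal sketch. It assumes the GPT-NeoX integration from the PR, uses `EleutherAI/pythia-70m` only as an example checkpoint, and relies on `embed_out` being the CLM head attribute of `GPTNeoXForCausalLM`:

```python
# Minimal sketch of the manual freeze described above (assumptions: GPT-NeoX
# adapter integration from the PR; example checkpoint and adapter name).
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")
model.add_adapter("lang_adapter")     # hypothetical adapter name
model.train_adapter("lang_adapter")   # freezes the base model, but not the untied CLM head

# Freeze the CLM head explicitly so that only the adapter weights stay trainable.
for param in model.embed_out.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```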
Thanks for providing the additional context (and of course thanks for opening the PR!). After looking into it a bit deeper, the cause of this behavior seems to be that GPT-NeoX does not tie the weights of the input and output projection layers. By default, adapter-transformers will only freeze the weights of the base model, excluding the weights of any prediction head (as you usually want to fine-tune that along with the adapter). Thus, for LM heads, freezing the output projection layer relies on the fact that most models supported so far share these weights with the input projection (which is part of the base model and therefore frozen). To ensure the expected behavior also for GPT-NeoX, we'd probably need to freeze the output projection manually somewhere in the code. Maybe adding it to freeze_model() could be an option.
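For illustration, a small sketch of the untied layout described above, using plain transformers naming for GPT-NeoX and `EleutherAI/pythia-70m` only as an example checkpoint:

```python
# Sketch illustrating why freezing the base model is not enough for GPT-NeoX:
# the input embedding and the output projection are separate (untied) tensors.
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")

embed_in = model.gpt_neox.embed_in.weight   # input projection, part of the frozen base model
embed_out = model.embed_out.weight          # output projection (CLM head), left trainable

print(model.config.tie_word_embeddings)              # False for GPT-NeoX
print(embed_in.data_ptr() == embed_out.data_ptr())   # False: the weights are not shared
```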
Hi @calpt, thanks for your feedback. I thought as much; I also noticed that they did not tie the weights of the input and output projection layers. Yes, I agree that freezing the prediction head somewhere else, such as within freeze_model(), would make sense.
You can integrate a fix for this directly into your PR along with the model integration if you like. Otherwise, I could also add it independently.
Checking again, it appears it is not feasible to fix this within freeze_model().
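A hypothetical alternative, purely as a sketch and not the library's actual fix: freeze the untied head right after adapter training is activated, for example by extending train_adapter() in the GPT-NeoX model class of the integration. The class name and placement below are assumptions:

```python
# Hypothetical sketch only; the class name and the placement of the fix are
# assumptions, not the actual adapter-transformers implementation.
from transformers import GPTNeoXForCausalLM

class GPTNeoXCausalLMWithFrozenHead(GPTNeoXForCausalLM):
    def train_adapter(self, *args, **kwargs):
        # Let the library freeze the base model and activate the adapter first.
        super().train_adapter(*args, **kwargs)
        # GPT-NeoX does not tie embed_in/embed_out, so freeze the head explicitly.
        for param in self.embed_out.parameters():
            param.requires_grad = False
```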