Config does not support model_dim that does not match attention model_dim #244
Comments
Hi @stephantul, thanks for the report! I agree with the error flagging: it's raised down the line, once the dimensions no longer match, but we could either catch that at config building time (a) or add an error message on top of this RuntimeError (b) to make it a little easier to debug. I think we should add a dimensionality check; option (a) feels like a relatively easy take. Note that, if I remember correctly, you can just pass dim_model once and the other fields are auto-filled with this value. On the other hand, an architecture with an extra projection layer is not a "standard" Transformer, so I would label that as a feature; it could be interesting indeed. Out of curiosity, do you know of existing papers testing out this approach?
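For illustration, a config-time check along the lines of option (a) could compare the position encoding's dimension against the block-level `dim_model` before any layers are built. This is only a sketch; the `block_config` dictionary and field names below are assumptions, not the actual xformers builder code.

```python
def check_dim_consistency(block_config: dict) -> None:
    """Hypothetical config-time validation: fail early if the position
    encoding declares a dim_model that differs from the block's dim_model."""
    block_dim = block_config.get("dim_model")
    pos_cfg = block_config.get("position_encoding_config", {})
    # If dim_model is omitted in the sub-config, assume it gets auto-filled.
    pos_dim = pos_cfg.get("dim_model", block_dim)

    if pos_dim != block_dim:
        raise ValueError(
            f"position_encoding_config.dim_model ({pos_dim}) does not match "
            f"the block dim_model ({block_dim}); either align the two values "
            "or insert a projection layer in between."
        )
```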
Hi @blefaudeux, thanks for the quick response! I got the idea from this repo. In this ELECTRA implementation, the embeddings of the discriminator and the generator are tied, but the hidden dimension of the discriminator is 256, while the hidden size of the generator is 64. This size discrepancy is necessary for ELECTRA pretraining to work (i.e., the generator needs to be about 1/4 the size of the discriminator). The repo I linked works with the huggingface implementation, which solves it by using an embeddings projector, see here. I'm sure we can work around this somehow, but it would be a nice feature to add "projector layers" in between parts of the model that have a dimensionality mismatch when loading a config (maybe behind an optional boolean flag, so users don't accidentally add these projector layers?).
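The pattern described above boils down to a linear projection sitting between a smaller embedding dimension and the model's hidden dimension. A minimal sketch of that idea follows; the class, vocab size, and dimensions are illustrative, not the actual huggingface ELECTRA code.

```python
import torch
import torch.nn as nn


class ProjectedEmbedding(nn.Module):
    """Illustrative only: share a small embedding table with a wider model by
    projecting the embedding output up to the model's hidden size."""

    def __init__(self, vocab_size: int, embedding_dim: int, hidden_dim: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        # The projection is only needed when the two dimensions differ.
        self.project = (
            nn.Linear(embedding_dim, hidden_dim)
            if embedding_dim != hidden_dim
            else nn.Identity()
        )

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.embeddings(input_ids))


# e.g. a shared 64-dim embedding table projected up to a 256-dim hidden size
shared_embedding = ProjectedEmbedding(vocab_size=30522, embedding_dim=64, hidden_dim=256)
```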
Sounds like a plan, and I actually like the idea; it seems like a good fit with the existing codebase and could be extended to do just that. I believe I can have a look later (time permitting), or feel free to submit a PR if you'd like.
@blefaudeux I submitted a PR that automatically inserts projectors. Initially, I wrote it so it would add projectors in between everything (i.e., mha, ff, ln), but that got a bit messy and needs a nicer solution, so I rolled back to this. I also think there is far less demand for mismatching dims between mha and ff.
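As a very rough sketch of what automatic insertion could look like (not the actual PR, just the general idea), a builder might walk the sequence of sub-modules and drop a linear projector wherever consecutive dimensions disagree:

```python
import torch.nn as nn


def insert_projectors(stages: list[tuple[nn.Module, int, int]]) -> nn.Sequential:
    """Hypothetical helper: given (module, in_dim, out_dim) stages, insert an
    nn.Linear projector wherever one stage's output dim does not match the
    next stage's input dim."""
    layers: list[nn.Module] = []
    for i, (module, in_dim, _out_dim) in enumerate(stages):
        if i > 0:
            _, _, prev_out = stages[i - 1]
            if prev_out != in_dim:
                layers.append(nn.Linear(prev_out, in_dim))
        layers.append(module)
    return nn.Sequential(*layers)
```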
🐛 Bug
Adding a `dim_model` within a `position_encoding_config` that is not equal to the `dim_model` of the rest of the model results in an unusable model, as the output of the embedding layer is no longer compatible with the feedforward layers within the attention blocks. This looks like a bug to me, since it never allows you to specify a `dim_model` in the vocabulary block that is not equal to the `dim_model` of the global model. In my use-case, in which I would like to tie the input embeddings of two different models with different hidden sizes, it would be nice to have this work by inserting a linear layer in between the embedding layer and the first attention layer.

Command
Using the following config results in a crash:
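A config along the following lines triggers the mismatch. The field names are a best-effort reconstruction of the xformers factory format and should be read as illustrative, not as the exact config from this report; the relevant part is the `dim_model` inside `position_encoding_config` disagreeing with the block-level `dim_model`.

```python
from xformers.factory.model_factory import xFormer, xFormerConfig

# Illustrative config only, not the exact one from this report; the field
# names follow the xformers factory format as an assumption. The problem is
# the 256 inside position_encoding_config vs. the block-level 768.
xformer_config = [
    {
        "block_type": "encoder",
        "num_layers": 2,
        "dim_model": 768,
        "position_encoding_config": {
            "name": "vocab",
            "seq_len": 512,
            "vocab_size": 30522,
            "dim_model": 256,  # does not match the block's dim_model above
        },
        "multi_head_config": {
            "num_heads": 8,
            "residual_dropout": 0.0,
            "attention": {
                "name": "scaled_dot_product",
                "dropout": 0.0,
                "causal": False,
                "seq_len": 512,
            },
        },
        "feedforward_config": {
            "name": "MLP",
            "dropout": 0.0,
            "activation": "gelu",
            "hidden_layer_multiplier": 4,
        },
    }
]

# The mismatch surfaces down the line as a RuntimeError (see discussion above).
model = xFormer.from_config(xFormerConfig(xformer_config))
```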
To Reproduce
See above
Expected behavior
I'd expect the model builder to complain that adding a `dim_model` within a vocab block does not make sense, or I expect it to work.