
Config does not support model_dim that does not match attention model_dim #244

Closed
stephantul opened this issue Mar 22, 2022 · 4 comments · Fixed by #247

Comments

@stephantul
Contributor

🐛 Bug

Adding a dim_model inside a position_encoding_config that is not equal to the dim_model of the rest of the model results in an unusable model, because the output of the embedding layer is no longer compatible with the feedforward layers inside the attention blocks.

This looks like a bug to me, since it never allows you to specify a dim_model in the vocabulary embedding that differs from the dim_model of the global model. In my use case, where I would like to tie the input embeddings of two models with different hidden sizes, it would be nice if this worked by inserting a linear layer between the embedding layer and the first attention layer.
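
To make the mismatch concrete, here is a rough sketch in plain PyTorch (illustrative only, not the xformers internals):

import torch
import torch.nn as nn

emb = nn.Embedding(30522, 256)   # vocab embedding built with dim_model=256
norm = nn.LayerNorm(64)          # the attention blocks are built around the global dim_model=64

x = emb(torch.randint(0, 30522, (10, 512)))   # -> (10, 512, 256)
# norm(x) fails with the RuntimeError shown below; the requested feature amounts to
bridge = nn.Linear(256, 64)
y = norm(bridge(x))              # -> (10, 512, 64)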

Command

Using the following config results in a crash:

config = [{'reversible': False,
  'block_type': 'encoder',
  'num_layers': 4,
  'dim_model': 64,
  'layer_norm_style': 'pre',
  'position_encoding_config': {'name': 'vocab',
   'seq_len': 512,
   'vocab_size': 30522,
   'dim_model': 256},
  'multi_head_config': {'num_heads': 1,
   'residual_dropout': 0,
   'use_rotary_embeddings': True,
   'attention': {'name': 'scaled_dot_product',
    'dropout': 0.1,
    'causal': False,
    'seq_len': 512,
    'num_rules': 4}},
  'feedforward_config': {'name': 'MLP',
   'dropout': 0.1,
   'activation': 'gelu',
   'hidden_layer_multiplier': 4}}]

import torch
from xformers.factory.model_factory import xFormer, xFormerConfig

cfg = xFormerConfig(config)
model = xFormer.from_config(cfg)

X = torch.randint(0, 1000, (10, 256))

model(X)
RuntimeError: Given normalized_shape=[64], expected input with shape [*, 64], but got input of size[10, 256, 256]
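
For comparison, the repro runs if the embedding dim_model is forced to match the block's dimension (a sketch, assuming matching dims are the intended configuration for now):

config[0]['position_encoding_config']['dim_model'] = 64   # match the global dim_model
model = xFormer.from_config(xFormerConfig(config))
out = model(X)   # should run and return a tensor of shape (10, 256, 64)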

To Reproduce

See above

Expected behavior

I'd expect the model builder to complain that adding a dim_model within a vocab block that differs from the model's dim_model does not make sense, or I'd expect it to just work.

@blefaudeux
Contributor

Hi @stephantul, thanks for the report! I agree about the error flagging: right now it surfaces down the line when the dimensions fail to match, but we could either catch that at config-building time (a) or add an error message on top of this 'RuntimeError' (b) to make it a little easier to debug. I think we should add a dimensionality check; option (a) feels like a relatively easy take.

Note that, if I remember correctly, you can just pass dim_model once and the other fields are auto-filled with this value.
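
If so, the repro config above would shrink to something like this (a sketch only, assuming that auto-fill behaviour; the nested dim_model entries are simply dropped):

config = [{'reversible': False,
  'block_type': 'encoder',
  'num_layers': 4,
  'dim_model': 64,  # passed once, propagated to the sub-configs
  'layer_norm_style': 'pre',
  'position_encoding_config': {'name': 'vocab',
   'seq_len': 512,
   'vocab_size': 30522},
  'multi_head_config': {'num_heads': 1,
   'residual_dropout': 0,
   'use_rotary_embeddings': True,
   'attention': {'name': 'scaled_dot_product',
    'dropout': 0.1,
    'causal': False,
    'seq_len': 512,
    'num_rules': 4}},
  'feedforward_config': {'name': 'MLP',
   'dropout': 0.1,
   'activation': 'gelu',
   'hidden_layer_multiplier': 4}}]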

On the other hand, an architecture with an extra projection layer is not a "standard" Transformer, so I would label that as a feature; it could be interesting indeed. Out of curiosity, do you know of existing papers testing out this approach?

@stephantul
Contributor Author

Hi @blefaudeux,

Thanks for the quick response! I got the idea from this repo. In that ELECTRA implementation, the embeddings of the discriminator and the generator are tied, but the hidden dimension of the discriminator is 256, while the hidden size of the generator is 64. This size discrepancy is necessary for ELECTRA pretraining to work (i.e., the generator needs to be about 1/4 of the discriminator's size).

The repo I linked works with the huggingface implementation, which solves this by using an embeddings projector; see here.

I'm sure we can work around this somehow, but it would be a nice feature to add "projector layers" between parts of the model that have a dimensionality mismatch when loading a config (perhaps behind an optional boolean flag so users don't add these projector layers by accident).
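
Roughly, the tied-embedding setup I have in mind looks like this (a hypothetical sketch in plain PyTorch, not the linked repo's code):

import torch
import torch.nn as nn

shared_embedding = nn.Embedding(30522, 256)   # tied between discriminator and generator

# The discriminator works directly at 256; the generator projects the shared
# embeddings down to its 64-dim hidden size, in the spirit of the huggingface
# ELECTRA embeddings projector.
generator_projection = nn.Linear(256, 64)

tokens = torch.randint(0, 30522, (10, 512))
discriminator_input = shared_embedding(tokens)                    # -> (10, 512, 256)
generator_input = generator_projection(shared_embedding(tokens))  # -> (10, 512, 64)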

@blefaudeux
Contributor

Sounds like a plan, and I like the idea actually; it seems like a good fit with the existing codebase, which can be extended to do just that, I believe.

I can have a look later (time permitting) or feel free to submit a PR if you'd like

@stephantul
Contributor Author

@blefaudeux I submitted a PR that automatically inserts projectors. Initially, I wrote it so it would add projectors in between everything (i.e., mha, ff, ln), but that got a bit messy and needs a nicer solution, so I rolled back to this. I also think there is far less demand for mismatching dims between mha and ff.
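
The general idea, in sketch form (illustrative only, not the actual code of the PR):

import torch.nn as nn

def maybe_project(in_dim: int, out_dim: int) -> nn.Module:
    # Only insert a projector when the dimensions disagree; otherwise pass through unchanged.
    return nn.Linear(in_dim, out_dim) if in_dim != out_dim else nn.Identity()

# e.g. bridging a 256-dim embedding output into a 64-dim encoder stack
projector = maybe_project(256, 64)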
