
Config does not support model_dim that does not match attention model_dim #244

Closed
stephantul opened this issue Mar 22, 2022 · 4 comments · Fixed by #247

Comments

@stephantul
Contributor

🐛 Bug

Adding a dim_model inside a position_encoding_config that is not equal to the dim_model of the rest of the model results in an unusable model, because the output of the embedding layer is no longer compatible with the feedforward layers inside the attention blocks.

This looks like a bug to me, since it never allows you to specify a dim_model in the vocabulary embedding that differs from the dim_model of the global model. In my use case, where I would like to tie the input embeddings of two models with different hidden sizes, it would be nice if this worked by inserting a linear layer between the embedding layer and the first attention layer.
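
To make the mismatch concrete, here is a rough sketch in plain PyTorch (illustrative only, not the xformers internals):

import torch
import torch.nn as nn

emb = nn.Embedding(30522, 256)   # vocab embedding built with dim_model=256
norm = nn.LayerNorm(64)          # the attention blocks are built around the global dim_model=64

x = emb(torch.randint(0, 30522, (10, 512)))   # -> (10, 512, 256)
# norm(x) fails with the RuntimeError shown below; the requested feature amounts to
bridge = nn.Linear(256, 64)
y = norm(bridge(x))              # -> (10, 512, 64)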

Command

Using the following config results in a crash:

config = [{'reversible': False,
  'block_type': 'encoder',
  'num_layers': 4,
  'dim_model': 64,
  'layer_norm_style': 'pre',
  'position_encoding_config': {'name': 'vocab',
   'seq_len': 512,
   'vocab_size': 30522,
   'dim_model': 256},
  'multi_head_config': {'num_heads': 1,
   'residual_dropout': 0,
   'use_rotary_embeddings': True,
   'attention': {'name': 'scaled_dot_product',
    'dropout': 0.1,
    'causal': False,
    'seq_len': 512,
    'num_rules': 4}},
  'feedforward_config': {'name': 'MLP',
   'dropout': 0.1,
   'activation': 'gelu',
   'hidden_layer_multiplier': 4}}]

import torch
from xformers.factory.model_factory import xFormer, xFormerConfig

cfg = xFormerConfig(config)
model = xFormer.from_config(cfg)

X = torch.randint(0, 1000, (10, 256))

model(X)
RuntimeError: Given normalized_shape=[64], expected input with shape [*, 64], but got input of size[10, 256, 256]
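
For comparison, the repro runs if the embedding dim_model is forced to match the block's dimension (a sketch, assuming matching dims are the intended configuration for now):

config[0]['position_encoding_config']['dim_model'] = 64   # match the global dim_model
model = xFormer.from_config(xFormerConfig(config))
out = model(X)   # should run and return a tensor of shape (10, 256, 64)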

To Reproduce

See above

Expected behavior

I'd expect the model builder to complain that adding a dim_model within a vocab block that differs from the model's dim_model does not make sense, or I'd expect it to just work.

@blefaudeux
Contributor

Hi @stephantul, thanks for the report! I agree about the error flagging: right now it surfaces down the line when the dimensions fail to match, but we could either catch that at config-building time (a) or add an error message on top of this 'RuntimeError' (b) to make it a little easier to debug. I think we should add a dimensionality check; option (a) feels like a relatively easy take.

Note that, if I remember correctly, you can just pass dim_model once and the other fields are auto-filled with this value.
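
If so, the repro config above would shrink to something like this (a sketch only, assuming that auto-fill behaviour; the nested dim_model entries are simply dropped):

config = [{'reversible': False,
  'block_type': 'encoder',
  'num_layers': 4,
  'dim_model': 64,  # passed once, propagated to the sub-configs
  'layer_norm_style': 'pre',
  'position_encoding_config': {'name': 'vocab',
   'seq_len': 512,
   'vocab_size': 30522},
  'multi_head_config': {'num_heads': 1,
   'residual_dropout': 0,
   'use_rotary_embeddings': True,
   'attention': {'name': 'scaled_dot_product',
    'dropout': 0.1,
    'causal': False,
    'seq_len': 512,
    'num_rules': 4}},
  'feedforward_config': {'name': 'MLP',
   'dropout': 0.1,
   'activation': 'gelu',
   'hidden_layer_multiplier': 4}}]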

On the other hand, an architecture with an extra projection layer is not a "standard" Transformer, so I would label that as a feature; it could be interesting indeed. Out of curiosity, do you know of existing papers testing out this approach?

@stephantul
Contributor Author

Hi @blefaudeux,

Thanks for the quick response! I got the idea from this repo. In that ELECTRA implementation, the embeddings of the discriminator and the generator are tied, but the hidden dimension of the discriminator is 256, while the hidden size of the generator is 64. This size discrepancy is necessary for ELECTRA pretraining to work (i.e., the generator needs to be about 1/4 of the discriminator's size).

The repo I linked works with the huggingface implementation, which solves this by using an embeddings projector; see here.

I'm sure we can work around this somehow, but it would be a nice feature to add "projector layers" between parts of the model that have a dimensionality mismatch when loading a config (perhaps behind an optional boolean flag so users don't add these projector layers by accident).
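
Roughly, the tied-embedding setup I have in mind looks like this (a hypothetical sketch in plain PyTorch, not the linked repo's code):

import torch
import torch.nn as nn

shared_embedding = nn.Embedding(30522, 256)   # tied between discriminator and generator

# The discriminator works directly at 256; the generator projects the shared
# embeddings down to its 64-dim hidden size, in the spirit of the huggingface
# ELECTRA embeddings projector.
generator_projection = nn.Linear(256, 64)

tokens = torch.randint(0, 30522, (10, 512))
discriminator_input = shared_embedding(tokens)                    # -> (10, 512, 256)
generator_input = generator_projection(shared_embedding(tokens))  # -> (10, 512, 64)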

@blefaudeux
Contributor

Sounds like a plan, and I like the idea actually; it seems like a good fit with the existing codebase, which can be extended to do just that, I believe.

I can have a look later (time permitting) or feel free to submit a PR if you'd like

@stephantul
Contributor Author

@blefaudeux I submitted a PR that automatically inserts projectors. Initially, I wrote it so it would add projectors in between everything (i.e., mha, ff, ln), but that got a bit messy and needs a nicer solution, so I rolled back to this. I also think there is far less demand for mismatching dims between mha and ff.
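
The general idea, in sketch form (illustrative only, not the actual code of the PR):

import torch.nn as nn

def maybe_project(in_dim: int, out_dim: int) -> nn.Module:
    # Only insert a projector when the dimensions disagree; otherwise pass through unchanged.
    return nn.Linear(in_dim, out_dim) if in_dim != out_dim else nn.Identity()

# e.g. bridging a 256-dim embedding output into a 64-dim encoder stack
projector = maybe_project(256, 64)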
