Reading the original paper, I took it that RETRO was a standard transformer (i.e., a 12-layer encoder and 12-layer decoder) augmented with a DB retrieval system that included a second, smaller (2-layer) encoder for the frozen-BERT-encoded neighbors, where the 2-layer encoder acted as a sort of translator between the BERT model and the main transformer.
Looking at the model in this repo, it appears there is only the 2-layer retrieval encoder and not a full-size main encoder. Is that correct?
Going back and re-reading the paper, it doesn't seem to say explicitly one way or the other. It seems odd to me that the model would have only the 2-layer retrieval encoder. Not only would the encoder be just 2 layers, but most decoder layers would have no standard cross-attention to the encoder at all; only layers 6, 9, and 12 get the new CCA setup (sketched below).
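To make the layout I'm describing concrete, here is a minimal sketch of that decoder stack. This is not this repo's actual code: all class names, dimensions, and the layer indices are placeholders, and plain cross-attention stands in for the paper's chunked cross-attention (CCA).

```python
# Hypothetical sketch of a decoder with CCA at only a few layers.
# Names and shapes are illustrative, not the repo's real API.
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Standard decoder block: self-attention + feed-forward (simplified)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, neighbors=None):
        x = x + self.self_attn(x, x, x)[0]
        return x + self.ff(x)

class CCALayer(DecoderLayer):
    """Decoder block that additionally cross-attends to the retrieval
    encoder's output (plain cross-attention standing in for CCA)."""
    def __init__(self, dim, heads=8):
        super().__init__(dim, heads)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, neighbors=None):
        x = x + self.self_attn(x, x, x)[0]
        if neighbors is not None:
            x = x + self.cross_attn(x, neighbors, neighbors)[0]
        return x + self.ff(x)

class RetroDecoder(nn.Module):
    """12-layer decoder where only layers 6, 9, 12 (1-indexed) see the
    encoder output; every other layer is self-attention only."""
    def __init__(self, dim=512, depth=12, cca_layers=(6, 9, 12)):
        super().__init__()
        self.layers = nn.ModuleList(
            CCALayer(dim) if (i + 1) in cca_layers else DecoderLayer(dim)
            for i in range(depth)
        )

    def forward(self, x, neighbors):
        for layer in self.layers:
            x = layer(x, neighbors)
        return x

x = torch.randn(2, 64, 512)          # (batch, seq, dim) token embeddings
neighbors = torch.randn(2, 32, 512)  # 2-layer retrieval encoder's output
print(RetroDecoder()(x, neighbors).shape)  # torch.Size([2, 64, 512])
```

If this matches the repo, then 9 of the 12 decoder layers never attend to retrieved context at all, which is the part that surprises me.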
Has anyone trained the model from this repo and demonstrated that it reproduces the results from the original paper?