Transformer Encoder

  • BERT is powerful and was trained for NLP downstream tasks, but it is very large and not very fast at processing data.
  • The benefit of building a transformer encoder from scratch is that we can understand deeply how the mechanism works and customize BERT with fewer parameters.
  • The code for the transformer encoder is available here.
  • This notebook shows how to use the package. You can also modify the code for your own understanding.
  • Note: there are two classes; the code of EncoderLayers and EncoderModel is the same (see the skeleton sketch after this list), BUT
    • EncoderLayers inherits from tf.keras.layers.Layer
    • EncoderModel inherits from tf.keras.Model
  • I made the following graphs to show how to build your own encoding layer. You can follow along with the code using these graphs.
    (Good luck! :P)
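
The following is a minimal sketch of the Layer-vs-Model distinction mentioned above, with simplified pass-through bodies; the real EncoderLayers and EncoderModel contain the full encoder computation, and the shapes and usage shown here are illustrative assumptions, not the repo's exact code.

```python
import tensorflow as tf

class EncoderLayers(tf.keras.layers.Layer):
    # Usable as a sub-layer inside a larger tf.keras.Model.
    def call(self, x, training=False):
        return x  # the encoder computation goes here

class EncoderModel(tf.keras.Model):
    # Same body, but can be compiled, fit, and saved on its own.
    def call(self, x, training=False):
        return x  # identical encoder computation

# As a layer: plug the encoder into a bigger functional model.
inputs = tf.keras.Input(shape=(None, 128))
outputs = EncoderLayers()(inputs)
model = tf.keras.Model(inputs, outputs)

# As a model: call, compile, or fit it directly.
encoder = EncoderModel()
y = encoder(tf.random.uniform((2, 10, 128)))  # (2, 10, 128)
```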

MultiHeadAttention Mechanism (MHA)

[MHA diagram]
Note: in the diagram I forgot to put the attention weights in the tf.nn.softmax().
"z" is the context matrix; it needs to be transposed and reshaped before the final projection (see the sketch below).
[QKV diagram]


Encoding Layer

[Encoding layer diagram]
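
For reference alongside the diagram, here is a minimal sketch of a standard encoder layer, assuming the usual residual connections and layer normalization; the class name EncodingLayer and the default hyperparameters are illustrative assumptions, and it uses Keras's built-in MultiHeadAttention for brevity rather than a hand-rolled MHA.

```python
import tensorflow as tf

class EncodingLayer(tf.keras.layers.Layer):
    # Hypothetical sketch of a standard encoder layer; names and defaults are illustrative.
    def __init__(self, d_model=128, num_heads=4, dff=512, rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation='relu'),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(rate)
        self.drop2 = tf.keras.layers.Dropout(rate)

    def call(self, x, training=False):
        # Self-attention sub-layer with residual connection and layer norm.
        attn = self.mha(x, x)
        x = self.norm1(x + self.drop1(attn, training=training))
        # Position-wise feed-forward sub-layer, again residual + layer norm.
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))
```

Stacking several of these layers (plus token and positional embeddings) gives a BERT-style encoder whose parameter count you control.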