Transformer Encoder

  • BERT is powerful and pretrained for NLP downstream tasks, but it is very large and not very fast at processing data.
  • The benefit of building a transformer encoder from scratch is that we can understand deeply how the mechanism works and customize a BERT-like encoder with fewer parameters.
  • The code of the transformer encoder is available here.
  • This notebook shows how to use the package. You can also modify the code for your own understanding.
  • Note: there are 2 classes, and the code of EncoderLayers and EncoderModel is the same, BUT (see the sketch after this list):
    • EncoderLayers inherits from tf.keras.layers.Layer
    • EncoderModel inherits from tf.keras.Model
  • I made the following graphs to show how to build your own encoding layer. You can follow the code along with these graphs.
    (Good luck! :P)
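
To make the inheritance difference concrete, here is a minimal sketch. It is not the package's actual API: the constructor argument (d_model) and the trivial layer bodies are placeholders, and the real classes contain the full encoder code. Only the base class differs.

```python
import tensorflow as tf

# Hypothetical, simplified bodies; only the base class differs between the two.

class EncoderLayers(tf.keras.layers.Layer):
    """A Layer can be stacked as a building block inside a larger Keras model."""
    def __init__(self, d_model=128):
        super().__init__()
        self.proj = tf.keras.layers.Dense(d_model)

    def call(self, x):
        return self.proj(x)


class EncoderModel(tf.keras.Model):
    """Same body, but a Model also exposes compile(), fit(), evaluate(),
    and summary(), so it can be trained on its own."""
    def __init__(self, d_model=128):
        super().__init__()
        self.proj = tf.keras.layers.Dense(d_model)

    def call(self, x):
        return self.proj(x)
```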

MultiHeadAttention Mechanism (MHA)

(Figure: MHA diagram)
Note: in this graph I forgot to pass the attention weights through tf.nn.softmax().
"z" is the context matrix, and it needs to be transposed and reshaped.
(Figure: QKV diagram)


Encoding Layer

(Figure: encoding layer diagram)
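
Roughly, the encoding layer in the graph combines the MHA block above with a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. This is a minimal sketch under assumed hyperparameters (dff, dropout_rate); the repo's own layer may differ in details.

```python
class EncoderLayer(tf.keras.layers.Layer):
    """Minimal sketch of one encoding layer: MHA + feed-forward,
    each followed by a residual connection and layer normalization."""

    def __init__(self, d_model, num_heads, dff, dropout_rate=0.1):
        super().__init__()
        self.mha = MultiHeadAttention(d_model, num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(dropout_rate)
        self.drop2 = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False, mask=None):
        # Self-attention block with residual connection + layer norm
        attn_out, _ = self.mha(x, x, x, mask)
        x = self.norm1(x + self.drop1(attn_out, training=training))
        # Position-wise feed-forward block with residual connection + layer norm
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))


# Example usage (shapes are illustrative):
# layer = EncoderLayer(d_model=128, num_heads=4, dff=512)
# out = layer(tf.random.uniform((2, 10, 128)))   # -> (2, 10, 128)
```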

About

Build a transformer encoder from scratch and show how to use it.
