A PyTorch implementation of the paper "Attention is All You Need", with the following goals:
Achieved:
- An educational implementation (not a performant one)
Next steps:
- Reproduce the results from the original paper
Requirements:
- Python 3.10.12
- Dependencies listed in requirements.txt
These were the main resources I used to understand and implement the model.
Transformer Architecture:
- Original paper: Attention is All You Need (Vaswani et al., 2017)
- Step-by-step guide to the architecture: The Illustrated Transformer
- Implementation from scratch (without PE + forward expansion): Pytorch Transformers from Scratch (Attention is all you need); see the attention sketch after this list
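For orientation, below is a minimal sketch of the scaled dot-product attention these resources walk through, computing Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V as in the paper. The function name, tensor layout, and masking convention are illustrative assumptions, not this repository's actual API.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (batch, heads, seq_len, d_k).
    mask: optional boolean tensor broadcastable to the score shape;
          True marks positions that must be hidden from attention.
    """
    d_k = q.size(-1)
    # Query-key similarity scores, scaled to keep softmax gradients stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Hidden positions get -inf so softmax assigns them ~0 weight
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


# Example: batch of 2, 8 heads, 10 tokens, 64 dims per head
q = k = v = torch.randn(2, 8, 10, 64)
out = scaled_dot_product_attention(q, k, v)  # -> (2, 8, 10, 64)
```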
Positional Encoding mechanism:
- Understanding the mechanism: Master Positional Encoding Part 1
- Implementation: Machine Learning Mastery; see the positional encoding sketch after this list
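Likewise, a minimal sketch of the sinusoidal encoding these resources describe, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function name and the even-d_model assumption are illustrative, not this repository's actual code.

```python
import math

import torch


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) table of sinusoidal encodings.

    Assumes d_model is even, as in the paper's configuration.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    # 1 / 10000^(2i / d_model) for each even dimension index 2i
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


# The table is added to the token embeddings before the first encoder layer:
# x = token_embedding + sinusoidal_positional_encoding(seq_len, d_model)
```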