A PyTorch implementation of the paper "Attention is All You Need", with the following goals:
Achieved:
- An educational implementation (not a performant one)
Next steps:
- Reproduce the results from the original paper
Requirements:
- Python 3.10.12
- Dependencies listed in requirements.txt
These were the main resources I used to understand and implement the model.
Transformer Architecture:
- Original paper: Attention is All You Need (Vaswani et al., 2017)
- Step-by-step guide to the architecture: The Illustrated Transformer
- Implementation from scratch (without PE + forward expansion): Pytorch Transformers from Scratch (Attention is all you need); see the attention sketch after this list
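For orientation, below is a minimal sketch of the scaled dot-product attention these resources walk through, computing Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V as in the paper. The function name, tensor layout, and masking convention are illustrative assumptions, not this repository's actual API.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (batch, heads, seq_len, d_k).
    mask: optional boolean tensor broadcastable to the score shape;
          True marks positions that must be hidden from attention.
    """
    d_k = q.size(-1)
    # Query-key similarity scores, scaled to keep softmax gradients stable
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Hidden positions get -inf so softmax assigns them ~0 weight
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


# Example: batch of 2, 8 heads, 10 tokens, 64 dims per head
q = k = v = torch.randn(2, 8, 10, 64)
out = scaled_dot_product_attention(q, k, v)  # -> (2, 8, 10, 64)
```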
Positional Encoding mechanism:
- Understanding the mechanism: Master Positional Encoding Part 1
- Implementation: Machine Learning Mastery; see the positional encoding sketch after this list
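Likewise, a minimal sketch of the sinusoidal encoding these resources describe, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function name and the even-d_model assumption are illustrative, not this repository's actual code.

```python
import math

import torch


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) table of sinusoidal encodings.

    Assumes d_model is even, as in the paper's configuration.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    # 1 / 10000^(2i / d_model) for each even dimension index 2i
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


# The table is added to the token embeddings before the first encoder layer:
# x = token_embedding + sinusoidal_positional_encoding(seq_len, d_model)
```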