The aim of this project is to familiarize myself with a few things:
- Training a GPT-style decoder-only model that generates grammatically correct sentences.
- Learning how to use a CUDA-enabled environment to run PyTorch models.
- Understanding the ins and outs of the transformer architecture.
TODO
- Introduce wandb logging
- Fix the attention masking bug in scaled dot-product attention
- Add a space token to the tiny tokenizer
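
For reference while working on the masking item, here is a minimal, dependency-free sketch of what causal masking in scaled dot-product attention should produce. It is not the fix for the bug in this repo (the bug itself is not described here) and does not use the project's code: the function name `causal_attention_weights` and the sample scores are made up for illustration. In PyTorch the same idea is usually written by filling the positions above the diagonal with `-inf` before the softmax.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over attention scores with future positions masked.

    scores[i][j] is assumed to be the already-scaled dot product
    q_i . k_j / sqrt(d_k) for query position i and key position j.
    """
    weights = []
    for i, row in enumerate(scores):
        # Position i may attend only to positions j <= i.
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        # Subtract the row maximum for numerical stability; the diagonal
        # entry is never masked, so the maximum is always finite.
        peak = max(masked)
        exps = [math.exp(s - peak) for s in masked]  # exp(-inf) is 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical scores for a sequence of three tokens.
scores = [
    [0.0, 1.0, 2.0],
    [0.5, 0.5, 3.0],
    [1.0, 2.0, 3.0],
]
weights = causal_attention_weights(scores)
```

A quick sanity check for any masking implementation: every weight above the diagonal is exactly zero, and each row still sums to 1.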