tinystoriesGPT

Objective

The aim of this project is to familiarize myself with a few things:

Train a GPT-style decoder-only model that generates grammatically correct sentences.
Learn how to use a CUDA-enabled environment for running PyTorch models.
Understand the ins and outs of the transformer architecture.

TODO

Introduce wandb logging
Fix the attention masking bug in scaled dot-product attention
Add the space token to the tiny tokenizer
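
For reference while working on the masking item, here is a minimal, framework-free sketch of what causal masking in scaled dot-product attention is expected to do. It is not the code from this repository, and it does not describe the actual bug; the function name `causal_attention` and the list-of-lists inputs are assumptions made for illustration. The idea is that position i may only attend to positions j <= i, which is usually done by setting the scores of future positions to negative infinity before the softmax.

```python
import math


def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask (illustrative sketch).

    q, k, v are lists of T vectors (lists of floats). q and k share the
    same dimension d. Returns (outputs, weights), where weights is T x T.
    """
    T, d = len(q), len(q[0])
    outputs, all_weights = [], []
    for i in range(T):
        scores = []
        for j in range(T):
            if j > i:
                # Future position: mask it so softmax assigns zero weight.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q[i], k[j]))
                scores.append(dot / math.sqrt(d))
        # Numerically stable softmax; the diagonal is never masked,
        # so the maximum is always finite.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        # Weighted sum of value vectors.
        outputs.append(
            [sum(weights[j] * v[j][c] for j in range(T)) for c in range(len(v[0]))]
        )
    return outputs, all_weights
```

A quick sanity check for any implementation, including a PyTorch one, is that the attention weight matrix is lower triangular and that each row sums to 1. The first token in particular should attend only to itself.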