Objective

The aim of this project is to familiarize myself with a few things:

  1. Training a GPT-style decoder-only model that generates grammatically correct sentences.
  2. Learning how to use a CUDA-enabled environment to run PyTorch models.
  3. Understanding the ins and outs of the transformer architecture.
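
For the second point, a minimal sketch of choosing a device is shown below. The helper name `pick_device` is hypothetical and is not taken from this repository. It falls back to the CPU when PyTorch is not installed or no CUDA device is visible.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of this repository.
    """
    # Without PyTorch there is nothing to run on a GPU, so report the CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # torch.cuda.is_available() is True only when a CUDA build of PyTorch
    # finds a usable driver and device.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running on: {pick_device()}")
```

The returned string can be passed to `model.to(...)` and `tensor.to(...)` so that the model and its inputs end up on the same device.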

TODO

  • Introduce wandb logging
  • Fix the attention masking bug in scaled dot-product attention
  • Add the space token to the tiny tokenizer
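
This README does not say what the masking bug is, so the following is only a reference sketch of what causal masking should do in a decoder-only model: position i may attend to positions j <= i and never to later ones. It uses plain Python lists so the logic can be checked by hand. The function name `causal_attention` is hypothetical and is not the repository's implementation. In PyTorch the same effect is commonly achieved by filling the scores above the diagonal with negative infinity (for example with `torch.tril` and `masked_fill`) before the softmax.

```python
import math


def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask, on plain lists.

    q, k, v are lists of T vectors (q and k of size d, v of any size).
    Hypothetical reference helper; not part of this repository.
    """
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        # Score only the allowed positions j <= i. Leaving out the future
        # positions is equivalent to setting their scores to -inf.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the allowed positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors the position is allowed to see.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    # The first position can only see itself, so its output equals v[0].
    print(causal_attention(q, k, v)[0])
```

A useful check for the real implementation follows the same idea: changing the value vector at a later position must not change the output at any earlier position.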