GPT-2 Implementation from Scratch

A PyTorch implementation of GPT-2 with Flash Attention support, written with a focus on readability and efficiency without sacrificing performance.

Features

  • Flash Attention and traditional attention implementations (see the attention sketch after this list)
  • Configurable architecture (embedding size, number of heads, number of layers, etc.)
  • Checkpoint saving and loading
  • Training progress tracking
  • Memory-efficient training
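
The sketch below illustrates how a configurable attention block with both paths might be wired up. The names here (GPTConfig, CausalSelfAttention, use_flash, and the field names) are illustrative and may not match those used in gpt2.py; the flash path relies on PyTorch's torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused Flash Attention kernel on supported CUDA GPUs.

import torch
import torch.nn as nn
import torch.nn.functional as F
from dataclasses import dataclass

@dataclass
class GPTConfig:                 # illustrative config; field names are assumptions
    block_size: int = 1024       # maximum sequence length
    n_embd: int = 768            # embedding size
    n_head: int = 12             # number of attention heads
    use_flash: bool = True       # toggle Flash Attention vs. traditional attention

class CausalSelfAttention(nn.Module):
    def __init__(self, config: GPTConfig):
        super().__init__()
        self.n_head = config.n_head
        self.n_embd = config.n_embd
        self.use_flash = config.use_flash
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        # causal mask, only needed for the traditional (non-flash) path
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        if self.use_flash:
            # fused kernel; never materializes the (T, T) score matrix
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        else:
            # traditional attention: materialize the full (T, T) score matrix
            att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

With use_flash=False the full (T, T) attention matrix is materialized, which is exactly the memory cost the flash path avoids.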

Requirements

  • PyTorch
  • safetensors (for checkpoint serialization; see the sketch below)
  • CUDA-capable GPU (required for the Flash Attention path)
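
A minimal sketch of checkpoint saving and loading with safetensors follows; the model and file name are placeholders, assuming the repository stores the model's state dict in a .safetensors file.

import torch
from safetensors.torch import save_file, load_file

# placeholder model; the repository's GPT class and constructor arguments may differ
model = torch.nn.Linear(768, 768)

# save: safetensors stores a flat dict of tensors, so pass the state dict
save_file(model.state_dict(), "checkpoint.safetensors")

# load: returns a dict of tensors that can be fed back into load_state_dict
state = load_file("checkpoint.safetensors")
model.load_state_dict(state)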

Dataset

To download the training dataset (TinyShakespeare), run:

wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt                            
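
Once downloaded, the corpus is a single plain-text file. A minimal sketch of turning it into training and validation tensors is shown below; the character-level encoding is purely illustrative, and the repository's actual tokenization (e.g. GPT-2 BPE) may differ.

import torch

# read the downloaded corpus
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

# simple character-level vocabulary, for illustration only
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# 90/10 train/validation split
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
print(f"{len(chars)} vocab symbols, {len(train_data)} train tokens, {len(val_data)} val tokens")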

References

This project is inspired by Andrej Karpathy's nanoGPT, a minimal GPT-2 implementation in PyTorch.
