A PyTorch implementation of GPT-2 with Flash Attention support. The code aims to stay readable and easy to modify while remaining fast and memory efficient.
- Flash Attention and traditional (manual) attention implementations (see the sketch after this list)
- Configurable architecture (embedding size, heads, layers, etc.)
- Checkpoint saving and loading (see the safetensors sketch after this list)
- Training progress tracking
- Memory-efficient
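The sketch below shows one way the configurable architecture and the two attention paths might fit together. The names (`GPTConfig`, `CausalSelfAttention`, `use_flash`, and the field names) are illustrative assumptions, not the exact API of this repository; the Flash path is assumed to go through PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which dispatches to fused Flash Attention kernels on supported GPUs.

```python
import math
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class GPTConfig:
    # Illustrative architecture knobs (names and defaults are assumptions).
    block_size: int = 1024   # maximum sequence length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    use_flash: bool = True   # toggle Flash vs. traditional attention


class CausalSelfAttention(nn.Module):
    def __init__(self, config: GPTConfig):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        self.n_head = config.n_head
        self.n_embd = config.n_embd
        self.use_flash = config.use_flash and hasattr(F, "scaled_dot_product_attention")
        # Fused projection for queries, keys, and values.
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)
        if not self.use_flash:
            # Causal mask, only needed by the manual attention path.
            mask = torch.tril(torch.ones(config.block_size, config.block_size))
            self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # Reshape to (B, n_head, T, head_dim).
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        if self.use_flash:
            # Flash Attention via PyTorch's fused kernel (PyTorch >= 2.0).
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        else:
            # Traditional attention: explicit softmax(QK^T / sqrt(d)) @ V.
            att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```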
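Since safetensors is listed as a dependency, checkpoints are presumably stored in that format. Below is a minimal sketch of saving and loading a `state_dict` with `safetensors.torch`; the stand-in model and file name are assumptions, not this repository's actual checkpoint layout.

```python
import torch.nn as nn
from safetensors.torch import load_file, save_file

# Stand-in model; in practice this would be the GPT model defined above.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

# Save: safetensors stores a flat dict of tensors, so the state_dict can
# be written directly (tensors must be contiguous and not share memory).
state_dict = {k: v.contiguous() for k, v in model.state_dict().items()}
save_file(state_dict, "checkpoint.safetensors")

# Load: read the tensors back into a model with the same architecture.
model.load_state_dict(load_file("checkpoint.safetensors"))
```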
- PyTorch
- safetensors
- CUDA-capable GPU (for Flash Attention)
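A quick way to check that the environment can use the Flash Attention path, assuming it is exposed through PyTorch's `torch.nn.functional.scaled_dot_product_attention` (an assumption about this codebase):

```python
import torch
import torch.nn.functional as F

# scaled_dot_product_attention requires PyTorch >= 2.0; the fused
# Flash Attention kernels additionally need a CUDA-capable GPU.
print(f"scaled_dot_product_attention available: {hasattr(F, 'scaled_dot_product_attention')}")
print(f"CUDA available: {torch.cuda.is_available()}")
```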
To download the training dataset (TinyShakespeare), run:
```bash
wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
```
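A minimal sketch of turning the downloaded text into training data, assuming character-level tokenization (as in nanoGPT's Shakespeare example); the tokenizer and split actually used by this implementation may differ.

```python
import torch

with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Build a character-level vocabulary from the raw text.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}

# Encode the full text as token ids and split into train/validation sets.
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
print(f"vocab size: {len(chars)}, train tokens: {len(train_data)}, val tokens: {len(val_data)}")
```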
This project is inspired by Andrej Karpathy's nanoGPT, a minimal GPT-2 implementation in PyTorch.