A project where we train a replica of the GPT-2 model on the FineWeb-Edu 10B dataset, using the hyperparameters from GPT-3 and the tokenizer from GPT-4. The model was trained on 8 A100 SXM4 (40 GB) GPUs on Lambda Labs for one epoch, taking approximately 3 hours.
This project is inspired by the following videos by Andrej Karpathy on YouTube:
- Let's build GPT: from scratch, in code, spelled out.
- Let's build the GPT Tokenizer
- Let's reproduce GPT-2 (124M)
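To make the setup above concrete, here is a minimal sketch of how the pieces could fit together: a GPT-2 (124M)-sized architecture, GPT-3 (Small)-style optimizer hyperparameters, and the GPT-4 tokenizer loaded via `tiktoken`. This is illustrative only and not the exact code of this repository; names such as `GPTConfig`, the padded vocabulary size, and the hyperparameter constants are assumptions.

```python
# Illustrative sketch (not the actual training script of this repo).
from dataclasses import dataclass

import tiktoken


@dataclass
class GPTConfig:
    # GPT-2 (124M) architecture: 12 layers, 12 attention heads, 768-dim embeddings
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768
    block_size: int = 1024       # context length
    vocab_size: int = 100352     # cl100k_base has 100,277 tokens; padded up to a
                                 # multiple of 128 for efficiency (an assumption)


# GPT-3 (Small) training hyperparameters from the GPT-3 paper
LEARNING_RATE = 6e-4             # peak learning rate, decayed with a cosine schedule
WEIGHT_DECAY = 0.1
ADAM_BETAS = (0.9, 0.95)
GRAD_CLIP = 1.0
TOTAL_BATCH_SIZE = 524_288       # ~0.5M tokens per optimizer step

# GPT-4 tokenizer (cl100k_base) via tiktoken
enc = tiktoken.get_encoding("cl100k_base")
print(enc.encode("Hello, FineWeb-Edu!"))
```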