Large Language Models (LLM) from Scratch in PyTorch

A simplified, from-scratch PyTorch implementation of Large Language Models (LLMs) with detailed steps (refer to gpt.py and llama.py).

Overview:

  • Contains two models: GPT and LLAMA.
  • The GPT model serves as the base: a simple decoder-only transformer that is easier to learn from.
  • LLAMA adds the more advanced components: Rotary Positional Embedding (RoPE), SwiGLU, RMSNorm, Mixture of Experts, etc. (refer to the implementation status below; a minimal RMSNorm sketch also follows this list).
  • Both models are scaled-down versions of their original architectures.
  • Number of trainable parameters: 141k (GPT) and 423k (LLAMA). LLAMA has more parameters because of the Mixture of Experts, but since only a subset of experts is active per token, the inference cost of the two models is similar.
  • Downloads the Taylor Swift song lyrics dataset by default for training.
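
As an illustration of how small these components are, here is a minimal RMSNorm sketch in PyTorch. It is a hedged example, not the repository's code: the class name, argument names, and the eps default are assumptions; refer to llama.py for the actual module.

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

        def forward(self, x):
            # Normalize by the root mean square of the features; no mean subtraction, no bias.
            rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x / rms)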

Implementation Status (LLAMA):

✅ Byte-Pair Tokenization [Here]
✅ Temperature, Top-p, and Top-k Sampling [Here]
✅ RMSNorm [Here]
✅ SwiGLU [Here]
✅ Rotary Positional Embedding (RoPE) [Here] (see the sketch after this list)
✅ KV Cache [Here]
✅ Mixture of Experts [Here]
🔳 Grouped Query Attention
🔳 Infini Attention
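
For reference, here is a minimal sketch of applying RoPE to the query/key tensors. It is illustrative only: the function name, tensor layout, and base value are assumptions, and the version in llama.py may pair dimensions differently and precompute the angles.

    import torch

    def apply_rope(x, base=10000.0):
        # x: (batch, seq_len, n_heads, head_dim), head_dim must be even.
        b, t, h, d = x.shape
        # One frequency per pair of feature dimensions.
        inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
        angles = torch.arange(t, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (t, d/2)
        cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
        sin = angles.sin()[None, :, None, :]
        x1, x2 = x[..., 0::2], x[..., 1::2]   # interleaved even/odd feature pairs
        out = torch.empty_like(x)
        # Rotate each (x1, x2) pair by its position-dependent angle.
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out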

Feel free to comment if you want anything integrated here.

Run command:

python main.py --network_type llama
  • The --network_type argument selects between llama and gpt.
  • I have tested the models on Taylor Swift song lyrics.
  • By default, the Taylor Swift song lyrics dataset is downloaded to a text file (default name: data.txt).
  • To use a custom dataset, replace that file's contents or point to a different text file with the data_file argument (see the example command after this list).
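
For example, a run that trains the GPT variant on a custom text file might look like the following (the --data_file flag spelling is inferred from the data_file argument above, and my_lyrics.txt is a placeholder; check main.py for the exact argument names):

python main.py --network_type gpt --data_file my_lyrics.txt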

Sample Output

A sample output generated by my trained model:

You know you're not sure I can still see you speak, go And now I'm fallin' in love But I'm standin' in love [Pre-Chorus] So you got the rain one thing that I know What you were right here, right now But you're the one I want to say 'Cause you got a six back in your face I'm not happy and you say you've got a girl for me But you can tell me now that you're mine And all I'm just think I can solve them And I just wanna stay in that night

Results can be improved with more training data and a bigger model.
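
The generated text also depends on the decoding settings. Below is a minimal sketch of the temperature, top-k, and top-p sampling listed in the implementation status; it is illustrative only, and the function name and argument defaults are assumptions rather than the repository's generation code.

    import torch
    import torch.nn.functional as F

    def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
        # logits: (vocab_size,) for the last position.
        logits = logits / max(temperature, 1e-8)                      # temperature scaling
        if top_k is not None:
            kth = torch.topk(logits, top_k).values[-1]                # k-th largest logit
            logits = logits.masked_fill(logits < kth, float("-inf"))  # keep only the top-k
        probs = F.softmax(logits, dim=-1)
        if top_p is not None:
            sorted_probs, sorted_idx = torch.sort(probs, descending=True)
            cumulative = torch.cumsum(sorted_probs, dim=-1)
            # Zero out tokens outside the smallest set whose cumulative probability exceeds top_p.
            sorted_probs[cumulative - sorted_probs > top_p] = 0.0
            sorted_probs = sorted_probs / sorted_probs.sum()
            return sorted_idx[torch.multinomial(sorted_probs, 1)]
        return torch.multinomial(probs, 1)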

Input Arguments of main.py
