Large Language Models (LLM) from Scratch in PyTorch

A simplified, from-scratch PyTorch implementation of Large Language Models (LLMs) with detailed steps (refer to gpt.py and llama.py).

Overview:

  • Contains two models: GPT and LLAMA.
  • The GPT model serves as the base: a simple decoder-only transformer that is easier to learn from.
  • LLAMA adds the more advanced components: Rotary Positional Embedding (RoPE), SwiGLU, RMSNorm, Mixture of Experts, etc. (refer to the implementation status below; a minimal RMSNorm sketch also follows this list).
  • Both models are scaled-down versions of their original architectures.
  • Number of trainable parameters: 141k (GPT) and 423k (LLAMA). LLAMA has more parameters because of the Mixture of Experts, but since only a subset of experts is active per token, the inference cost of the two models is similar.
  • Downloads the Taylor Swift song lyrics dataset by default for training.
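
As an illustration of how small these components are, here is a minimal RMSNorm sketch in PyTorch. It is a hedged example, not the repository's code: the class name, argument names, and the eps default are assumptions; refer to llama.py for the actual module.

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

        def forward(self, x):
            # Normalize by the root mean square of the features; no mean subtraction, no bias.
            rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x / rms)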

Implementation Status (LLAMA):

✅ Byte-Pair Tokenization [Here]
✅ Temperature, Top-p, and Top-k Sampling [Here]
✅ RMSNorm [Here]
✅ SwiGLU [Here]
✅ Rotary Positional Embedding (RoPE) [Here] (see the sketch after this list)
✅ KV Cache [Here]
✅ Mixture of Experts [Here]
🔳 Grouped Query Attention
🔳 Infini Attention
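
For reference, here is a minimal sketch of applying RoPE to the query/key tensors. It is illustrative only: the function name, tensor layout, and base value are assumptions, and the version in llama.py may pair dimensions differently and precompute the angles.

    import torch

    def apply_rope(x, base=10000.0):
        # x: (batch, seq_len, n_heads, head_dim), head_dim must be even.
        b, t, h, d = x.shape
        # One frequency per pair of feature dimensions.
        inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
        angles = torch.arange(t, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (t, d/2)
        cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
        sin = angles.sin()[None, :, None, :]
        x1, x2 = x[..., 0::2], x[..., 1::2]   # interleaved even/odd feature pairs
        out = torch.empty_like(x)
        # Rotate each (x1, x2) pair by its position-dependent angle.
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out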

Feel free to comment if you want anything integrated here.

Run command:

python main.py --network_type llama
  • The --network_type argument selects between llama and gpt.
  • I have tested the models on Taylor Swift song lyrics.
  • By default, the Taylor Swift song lyrics dataset is downloaded to a text file (default name: data.txt).
  • To use a custom dataset, replace that file's contents or point to a different text file with the data_file argument (see the example command after this list).
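
For example, a run that trains the GPT variant on a custom text file might look like the following (the --data_file flag spelling is inferred from the data_file argument above, and my_lyrics.txt is a placeholder; check main.py for the exact argument names):

python main.py --network_type gpt --data_file my_lyrics.txt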

Sample Output

A sample output generated by my trained model:

You know you're not sure I can still see you speak, go And now I'm fallin' in love But I'm standin' in love [Pre-Chorus] So you got the rain one thing that I know What you were right here, right now But you're the one I want to say 'Cause you got a six back in your face I'm not happy and you say you've got a girl for me But you can tell me now that you're mine And all I'm just think I can solve them And I just wanna stay in that night

Results can be improved with more training data and a bigger model.
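
The generated text also depends on the decoding settings. Below is a minimal sketch of the temperature, top-k, and top-p sampling listed in the implementation status; it is illustrative only, and the function name and argument defaults are assumptions rather than the repository's generation code.

    import torch
    import torch.nn.functional as F

    def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
        # logits: (vocab_size,) for the last position.
        logits = logits / max(temperature, 1e-8)                      # temperature scaling
        if top_k is not None:
            kth = torch.topk(logits, top_k).values[-1]                # k-th largest logit
            logits = logits.masked_fill(logits < kth, float("-inf"))  # keep only the top-k
        probs = F.softmax(logits, dim=-1)
        if top_p is not None:
            sorted_probs, sorted_idx = torch.sort(probs, descending=True)
            cumulative = torch.cumsum(sorted_probs, dim=-1)
            # Zero out tokens outside the smallest set whose cumulative probability exceeds top_p.
            sorted_probs[cumulative - sorted_probs > top_p] = 0.0
            sorted_probs = sorted_probs / sorted_probs.sum()
            return sorted_idx[torch.multinomial(sorted_probs, 1)]
        return torch.multinomial(probs, 1)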

Input Arguments of main.py
