This repository, GPT Reimagined: KANs vs MLPs, contains an implementation of a Generative Pre-trained Transformer (GPT) model. The focus is to compare the performance and effectiveness of traditional multilayer perceptron (MLP) layers and Kolmogorov-Arnold Networks (KANs) within the GPT architecture.
KANs are mathematical constructs based on the Kolmogorov-Arnold representation theorem, which states that any multivariable continuous function can be represented as a finite composition of continuous univariate functions and addition. This opens the door to alternative network structures that may improve efficiency, expressiveness, or convergence rates for certain tasks.
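In its classical form, the theorem states that any continuous function $f : [0,1]^n \to \mathbb{R}$ can be written as

$$
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),
$$

where the $\Phi_q$ and $\varphi_{q,p}$ are continuous univariate functions. KANs take this decomposition as motivation and place learnable univariate functions (typically splines) on the edges of the network, in place of the fixed activations and scalar weights of an MLP.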
- Project Overview
- Installation
- Usage
- Experiment Details
- Results
- Directory Structure
- Contributors
- Acknowledgement & Resources
In this project, we explore the effectiveness of Kolmogorov-Arnold Networks (KANs) as an alternative to traditional multilayer perceptrons (MLPs) for implementing Generative Pre-trained Transformers (GPTs). GPTs are a class of machine learning models known for their ability to generate natural language text and perform a variety of natural language processing tasks. Traditionally, the feed-forward sublayers of GPTs have been implemented as MLPs. KANs, a relatively new development, have shown promise in outperforming MLPs on certain tasks.
This project contributes to the ongoing research in machine learning architectures by providing empirical evidence on the efficacy of Kolmogorov-Arnold Networks as an alternative to traditional MLPs for implementing state-of-the-art language models like GPTs. The findings of this study can inform future developments in neural network architectures and guide the design of more efficient and effective models for natural language processing tasks.
Category | Technologies |
---|---|
Programming Languages | Python |
Frameworks | |
Libraries | |
Datasets | Tiny Shakespeare, WikiText-2, MNIST |
Tools | |
Visualization & Analysis | TensorBoard |
- Implement GPT using the traditional MLP approach.
- Implement GPT using Kolmogorov-Arnold Networks (KANs).
- Compare the performance of GPT implemented with MLPs and KANs across various metrics, including but not limited to:
- Language generation quality
- Training speed
- Model size
- Resource utilization
- Provide a proof of principle comparing the performance of MLP-based GPTs with KAN-based GPTs (see the sketch of the two feed-forward variants after this list).
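To make the comparison concrete, here is a minimal sketch, assuming PyTorch, of the two feed-forward variants being compared: a standard MLP block and a simplified KAN-style layer. The class names and the Gaussian-basis parameterization below are illustrative; they are not the repo's actual model_mlp.py / model_kan.py code.

```python
# A minimal sketch, assuming PyTorch, of the two feed-forward variants compared
# in this project. Class names and the Gaussian-basis parameterization are
# illustrative; they are not the repo's actual model_mlp.py / model_kan.py code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPBlock(nn.Module):
    """Standard transformer feed-forward block: Linear -> GELU -> Linear."""

    def __init__(self, n_embd: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)


class RBFKANLayer(nn.Module):
    """Simplified KAN-style layer: each input coordinate is expanded with fixed
    Gaussian basis functions on a grid, and a linear map learns each edge's
    univariate function as a weighted sum of those bases."""

    def __init__(self, in_dim: int, out_dim: int, num_bases: int = 8):
        super().__init__()
        self.register_buffer("grid", torch.linspace(-2.0, 2.0, num_bases))
        self.spline = nn.Linear(in_dim * num_bases, out_dim, bias=False)
        self.base = nn.Linear(in_dim, out_dim)  # residual "base" path, as in the KAN paper

    def forward(self, x):
        # x: (..., in_dim) -> Gaussian basis expansion: (..., in_dim * num_bases)
        phi = torch.exp(-((x.unsqueeze(-1) - self.grid) ** 2))
        return self.spline(phi.flatten(start_dim=-2)) + self.base(F.silu(x))


class KANBlock(nn.Module):
    """Feed-forward sublayer with KAN-style layers in place of the MLP."""

    def __init__(self, n_embd: int):
        super().__init__()
        self.net = nn.Sequential(
            RBFKANLayer(n_embd, 4 * n_embd),
            RBFKANLayer(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)
```

In the full GPT, either block sits inside each transformer layer after the self-attention sublayer, so swapping one for the other changes only the feed-forward computation.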
- Neural network based on the MNIST dataset
  - Using MLP
  - Using KAN
- Fashion classifier using CNN
- NameGPT
- Masked Language Model using encoder
- Language translation model using transformers
- Clone the Repository:
git clone https://github.com/your-username/GPT-Reimagined.git
cd GPT-Reimagined
- Download the Tiny Shakespeare or WikiText-2 Dataset: This is handled in dataset_shakespeare.py, which automatically downloads and tokenizes the dataset if it's not present.
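An illustrative sketch of what this preparation step involves, assuming a character-level tokenizer and NumPy binary files; the actual dataset_shakespeare.py may tokenize differently.

```python
# Illustrative sketch of dataset preparation, assuming a character-level
# tokenizer and NumPy binary files; the actual dataset_shakespeare.py may differ.
import os
import urllib.request

import numpy as np

URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
DATA_DIR = os.path.join("data", "tinyshakespeare")

os.makedirs(DATA_DIR, exist_ok=True)
input_path = os.path.join(DATA_DIR, "input.txt")
if not os.path.exists(input_path):
    urllib.request.urlretrieve(URL, input_path)  # download the raw corpus once

with open(input_path, "r", encoding="utf-8") as f:
    text = f.read()

# Build a character-level vocabulary and encode the whole corpus as token ids.
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)

# 90/10 train/validation split, stored as flat binary token files.
split = int(0.9 * len(ids))
ids[:split].tofile(os.path.join(DATA_DIR, "train.bin"))
ids[split:].tofile(os.path.join(DATA_DIR, "val.bin"))
```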
- Training the Model: Run the main script to train the models.
python main.py
- Training details, including training and validation loss, are logged and saved for analysis in TensorBoard.
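For reference, here is a schematic of what such a training loop looks like, assuming PyTorch, a cross-entropy next-token objective, and a hypothetical get_batch helper; the real train.py may differ in detail.

```python
# Schematic training loop, assuming PyTorch, cross-entropy next-token loss, and
# a hypothetical get_batch(split) helper returning (inputs, targets) of shape
# (batch_size, block_size); the real train.py may differ in detail.
import torch
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter


def train(model, get_batch, num_steps, lr=2e-5, log_dir="logs/"):
    writer = SummaryWriter(log_dir)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step in range(num_steps):
        xb, yb = get_batch("train")                       # token ids and shifted targets
        logits = model(xb)                                # (B, T, vocab_size)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), yb.reshape(-1))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        writer.add_scalar("loss/train", loss.item(), step)  # visible in TensorBoard
    writer.close()
```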
- Generating Text: After training, you can generate text using the trained model:
python generate.py
- This will generate text based on a provided input prompt.
- Customize generate.py with desired configurations such as max_new_tokens to control the length of generated text.
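A minimal sketch of the kind of autoregressive sampling loop generate.py performs, assuming a PyTorch model whose forward pass returns next-token logits; the function below is illustrative, not the repo's exact code.

```python
# Minimal sketch of an autoregressive sampling loop of the kind generate.py
# runs; assumes a PyTorch model whose forward pass returns next-token logits.
import torch


@torch.no_grad()
def generate(model, idx, max_new_tokens=200, block_size=64, temperature=1.0):
    """idx: (B, T) tensor of prompt token ids; returns the prompt plus new tokens."""
    model.eval()
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop context to the block size
        logits = model(idx_cond)                   # (B, T, vocab_size)
        logits = logits[:, -1, :] / temperature    # keep only the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)     # append the sampled token
    return idx
```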
The goal is to evaluate the comparative performance of KANs and MLPs when used in transformer models. Key experimental configurations (see the configuration sketch after this list):
- Block Size: 64 (number of tokens processed in a single pass)
- Batch Size: 64
- Learning Rate: 2e-5
- Training Epochs: 6 (≈ 9,435 × 6 ≈ 56,610 steps)
- Loss Function: Cross-entropy for next-token prediction
- Evaluation Metrics: validation loss and perplexity (perplexity = exp(mean cross-entropy loss))
- Training progress is logged to TensorBoard.
- Model checkpoints are saved in the models/ directory.
- Text Generation Quality: generated text samples from both models reveal qualitative differences in coherence and fluency under the same number of epochs and the same hyperparameters.
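For convenience, the settings above gathered into a config.py-style sketch; the variable names are illustrative and may not match the repo's actual config.py.

```python
# The experimental settings above, as a config.py-style sketch; variable names
# are illustrative and may not match the repo's actual config.py.
block_size = 64          # tokens processed in a single forward pass
batch_size = 64
learning_rate = 2e-5
num_epochs = 6           # roughly 9,435 x 6 training steps in total
model_variant = "kan"    # or "mlp", selecting the feed-forward implementation
models_dir = "models/"   # where checkpoints are saved
log_dir = "logs/"        # TensorBoard event files
```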
GPT-Reimagined/
├── data/ # Dataset (tiny Shakespeare data used here)
│ ├── tinyshakespeare/
│ │ ├── input.txt # Raw input text
│ │ ├── train.bin # Encoded training data
│ │ ├── val.bin # Encoded validation data
├── models/ # Directory for saving trained models
├── logs/ # Training logs for TensorBoard
├── archive_logs/ # Archive of zipped logs
├── main.py # Main script to initiate training
├── dataset_shakespeare.py # Data processing and loading script
├── model_kan.py # Kolmogorov-Arnold Network (KAN) model
├── model_mlp.py # MLP-based GPT model
├── train.py # Training loop for the models
├── config.py # Configuration for hyperparameters and paths
├── generate.py # Script for generating text with the trained model
├── utils.py # Utility functions
├── requirements.txt # Required dependencies
└── README.md # This README file
- Andrej Karpathy: https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ
- CMU Deep Learning lectures: https://www.youtube.com/playlist?list=PLp-0K3kfddPzMmSaoGy5SqQsqhUeMC1ay
- Research Paper: Attention Is All You Need (https://arxiv.org/abs/1706.03762)
- Research Paper: Kolmogorov-Arnold Networks (https://arxiv.org/pdf/2404.19756)
- Special thanks to our mentors Param Thakkar and Mayank Palan