GPT-Reimagined: KANs vs MLPs

This repository, GPT Reimagined: KANs vs MLPs, contains an implementation of a Generative Pre-trained Transformer (GPT) model. The focus is to compare the performance and effectiveness of traditional multilayer perceptron (MLP) layers and Kolmogorov-Arnold Networks (KANs) in the architecture.

KANs are mathematical constructs based on the Kolmogorov-Arnold representation theorem, which suggests that any multivariable continuous function can be represented by a composition of univariate functions and summation. This unique approach can enable alternative network structures, potentially improving efficiency, expressiveness, or convergence rates for certain tasks.
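
In symbols, the theorem states that any continuous function of n variables on a bounded domain can be written using only univariate functions and addition:

$$
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
$$

where the Φ_q and φ_{q,p} are continuous univariate functions. KAN layers place learnable univariate functions (typically splines) on the edges of the network, whereas MLPs place fixed activations on the nodes and learnable scalar weights on the edges.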

Table of Contents

Project Overview

Description

In this project, we explore the effectiveness of Kolmogorov-Arnold Networks (KANs) as an alternative to traditional Multi-Layer Perceptrons (MLPs) for implementing Generative Pre-trained Transformers (GPTs). GPTs are a class of machine learning models known for their ability to generate natural language text and perform a variety of natural language processing tasks. Traditionally, GPTs have been built with MLP feed-forward layers; KANs, a relatively recent development, have shown promise in outperforming MLPs on certain tasks.

This project contributes to the ongoing research in machine learning architectures by providing empirical evidence on the efficacy of Kolmogorov-Arnold Networks as an alternative to traditional MLPs for implementing state-of-the-art language models like GPTs. The findings of this study can inform future developments in neural network architectures and guide the design of more efficient and effective models for natural language processing tasks.

Model: KAN-GPT architecture

(Figure: KAN-GPT architecture diagram)
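
The core architectural idea is that the attention sub-layer of each transformer block stays the same, while the feed-forward sub-layer becomes a pluggable module, so either an MLP (as in model_mlp.py) or a KAN layer (as in model_kan.py) can be dropped in. A minimal sketch of that structure, not the exact code in this repository:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm transformer block with a swappable feed-forward module."""
    def __init__(self, d_model: int, n_head: int, feedforward: nn.Module):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = feedforward  # either an MLP or a KAN layer

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + a                     # residual connection around attention
        x = x + self.ff(self.ln2(x))  # residual connection around the feed-forward
        return x

def mlp_ff(d_model: int) -> nn.Module:
    """The traditional GPT feed-forward: two linear layers with a 4x expansion."""
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))

# A KAN feed-forward (e.g. a spline-based KAN layer of matching input/output width)
# would be passed in place of mlp_ff(d_model) to build the KAN-GPT variant.
```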

Tech Stack

| Category | Technologies |
| --- | --- |
| Programming Languages | Python |
| Frameworks | PyTorch |
| Libraries | scipy, pandas, numpy, tqdm, tiktoken |
| Datasets | TinyShakespeare, WikiText-2 |
| Tools | Git, Google Colab, Kaggle |
| Visualization & Analysis | Matplotlib, TensorBoard |

Objectives

  • Implement GPT using the traditional MLP approach.
  • Implement GPT using Kolmogorov-Arnold Networks (KANs).
  • Compare the performance of GPT implemented with MLPs and KANs across various metrics, including but not limited to:
    • Language generation quality
    • Training speed
    • Model size
    • Resource utilization
  • Provide a proof-of-principle comparison of the performance of MLP-based GPTs versus KAN-based GPTs.
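
For the model-size objective above, a simple parameter-count helper is enough to put the two models side by side (a generic PyTorch utility, not code from this repository):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Usage (model names are illustrative):
#   count_parameters(mlp_gpt) vs. count_parameters(kan_gpt)
```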

Other mini-projects

  • Neural Network based on MNIST dataset
    • Using MLP
    • Using KAN
  • Fashion classifier using CNN
  • NameGPT
  • Masked Language Model using encoder
  • Language translation model using transformers

Installation

  1. Clone the Repository:
    git clone https://github.com/your-username/GPT-Reimagined.git
    cd GPT-Reimagined
  2. Download the Tiny Shakespeare or WikiText-2 Dataset: This is handled in dataset_shakespeare.py, which automatically downloads and tokenizes the dataset if it's not present.
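
For reference, a minimal sketch of that preprocessing step (the download URL, tokenizer choice, and 90/10 split here are assumptions; the actual dataset_shakespeare.py may differ):

```python
import os
import urllib.request
import numpy as np
import tiktoken

DATA_DIR = "data/tinyshakespeare"
# Assumed source for the Tiny Shakespeare corpus:
DATA_URL = ("https://raw.githubusercontent.com/karpathy/char-rnn/"
            "master/data/tinyshakespeare/input.txt")

os.makedirs(DATA_DIR, exist_ok=True)
input_path = os.path.join(DATA_DIR, "input.txt")
if not os.path.exists(input_path):
    urllib.request.urlretrieve(DATA_URL, input_path)   # download only if missing

with open(input_path, encoding="utf-8") as f:
    text = f.read()

enc = tiktoken.get_encoding("gpt2")                    # GPT-2 BPE tokenizer
ids = np.array(enc.encode(text), dtype=np.uint16)      # GPT-2 vocab fits in uint16

split = int(0.9 * len(ids))                            # assumed 90/10 train/val split
ids[:split].tofile(os.path.join(DATA_DIR, "train.bin"))
ids[split:].tofile(os.path.join(DATA_DIR, "val.bin"))
```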

Usage

  1. Training the Model: Run the main script to train the models.

    python main.py 
    • Training details, including training and validation loss, are logged and saved for analysis in TensorBoard.
  2. Generating Text: After training, you can generate text using the trained model:

    python generate.py
    • This will generate text based on a provided input prompt.
    • Customize generate.py with desired configurations such as max_new_tokens to control the length of generated text.
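
Under the hood, generation is standard autoregressive sampling from the trained model's next-token distribution. A minimal sketch (the function name, block size, and assumed model output shape are illustrative, not necessarily what generate.py uses):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=64, temperature=1.0):
    """Autoregressively extend the token sequence idx (shape [1, T])."""
    model.eval()
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the context window
        logits = model(idx_cond)                 # assumed shape [1, T, vocab_size]
        logits = logits[:, -1, :] / temperature  # logits for the next token only
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```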

Experiment Details

The goal is to evaluate the comparative performance between KANs and MLPs when used in transformer models. Key experimental configurations:

  • Block Size: 64 (context length, i.e. the number of tokens the model attends to in a single forward pass)
  • Batch Size: 64
  • Learning Rate: 2e-5
  • Training Epochs: 6 (roughly 9,435 steps per epoch, ~56,610 steps in total)
  • Loss Function: Cross-entropy for next-token prediction
  • Evaluation Metric: Validation loss and perplexity
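
Perplexity follows directly from the cross-entropy loss as its exponential, so the two metrics are reported together; for example:

```python
import math

val_loss = 4.2                    # illustrative cross-entropy value (nats per token)
perplexity = math.exp(val_loss)   # ≈ 66.7
print(f"perplexity = {perplexity:.1f}")
```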

Logging and Model Saving

  • Training progress is logged to TensorBoard.
  • Model checkpoints are saved in the models/ directory.
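
A typical pattern for both steps (a sketch using torch.utils.tensorboard and torch.save with the logs/ and models/ directories from the layout below; details may differ from train.py):

```python
import os
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

model = nn.Linear(8, 8)                       # stand-in for the GPT model
writer = SummaryWriter(log_dir="logs/run1")   # view with: tensorboard --logdir logs

for step in range(3):                         # stand-in for the real training loop
    train_loss, val_loss = 4.5 - 0.1 * step, 4.7 - 0.1 * step
    writer.add_scalar("loss/train", train_loss, step)
    writer.add_scalar("loss/val", val_loss, step)

os.makedirs("models", exist_ok=True)
torch.save({"step": step, "model_state_dict": model.state_dict()},
           "models/checkpoint.pt")
writer.close()
```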

Results

  • Text Generation Quality: Generated text samples from both models reveal qualitative differences in coherence and fluency when the models are trained for a comparable number of epochs with matching hyperparameters.

Generated Results (KANs)

(Figure: sample text generated by the KAN-based GPT)

File Directory


GPT-Reimagined/
├── data/                          # Dataset (tiny Shakespeare data used here)
│   ├── tinyshakespeare/
│   │   ├── input.txt
│   │   ├── train.bin              # Encoded training data
│   │   ├── val.bin                # Encoded validation data
├── models/                        # Directory for saving trained models
├── logs/                          # Training logs for TensorBoard
├── archive_logs/                  # Archive of zipped logs
├── main.py                        # Main script to initiate training
├── dataset_shakespeare.py         # Data processing and loading script
├── model_kan.py                   # Kolmogorov-Arnold Network (KAN) model
├── model_mlp.py                   # MLP-based GPT model
├── train.py                       # Training loop for the models
├── config.py                      # Configuration for hyperparameters and paths
├── generate.py                    # Script for generating text with the trained model
├── utils.py                       # Utility functions
├── requirements.txt               # Required dependencies
└── README.md                      # This README file

Contributors

Acknowledgement and Resources