LRA Image Classification on Sequences of Pixels

Overview

This repository implements and benchmarks several sequence models for image classification on the CIFAR-10 dataset. The project is inspired by the Long Range Arena (LRA) benchmark, in which images are processed as sequences of pixels rather than as 2D grids.

The primary goal is to evaluate how well sequence-processing models can handle image-based tasks, and to explore their potential advantages and limitations compared to traditional image-processing architectures.

Motivation

Traditional image classification models typically rely on convolutional neural networks (CNNs), which exploit the 2D spatial structure of images. In this project, we instead explore sequence-based approaches:

Treating each pixel as part of a sequence.
Evaluating the capability of sequence models to handle image data.

The CIFAR-10 dataset:

60,000 images, each of size 32x32 pixels with 3 color channels (RGB).
Divided into 10 classes, with 6,000 images per class.

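This task aligns with the LRA benchmark's objective of testing models on long-range dependencies: if a 32x32 image is flattened row by row into a sequence of 1,024 pixels, vertically adjacent pixels end up 32 positions apart.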
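To make the pixel-sequence view concrete, here is a minimal sketch in plain Python. It assumes a row-major flattening in which each pixel becomes one token holding its RGB values; the function name `image_to_pixel_sequence` and the synthetic image are hypothetical and are not taken from this repository's code.

```python
def image_to_pixel_sequence(image):
    """Flatten an H x W x C image (nested lists) into a row-major list of H*W pixel tokens."""
    return [list(pixel) for row in image for pixel in row]


# Hypothetical stand-in for one CIFAR-10 image: 32 x 32 pixels, 3 channels (RGB).
height, width, channels = 32, 32, 3
image = [
    [[(r * width + c) % 256] * channels for c in range(width)]
    for r in range(height)
]

sequence = image_to_pixel_sequence(image)
print(len(sequence), len(sequence[0]))  # 1024 3

# The pixel directly below (0, 0) sits `width` positions later in the sequence.
print(sequence[width] == image[1][0])  # True
```

Under this assumption a model must relate tokens that are 32 or more positions apart to recover vertical structure, which is the kind of long-range dependency the task is meant to probe.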

Included Models

The following sequence-based models were benchmarked; some of them were implemented from scratch:

TrellisNet:
- Sequence modeling architecture combining features from recurrent and convolutional networks.
- Key Features:
  - Weight tying across time and layers.
  - Uses temporal convolutions for hierarchical feature learning.
- Performance:
  - Paper Accuracy: 73.42%
  - Our Work: 37%
LipschitzRNN:
- Enforces Lipschitz constraints for stability in training.
- Performance:
  - Paper Accuracy: 64.2%
  - Our Work: 40.5%
Pixel Transformer (PiT):
- Treats each pixel as a token, avoiding locality inductive biases.
- Architecture:
  - Linear projection of RGB values.
  - Transformer Encoder with Multihead Self-Attention.
- Performance:
  - Paper Accuracy: 85.1% (on CIFAR-100)
  - Our Work: 42.16%
Nyströmformer:
- Transformer variant that approximates self-attention using the Nyström method.
- Performance:
  - Paper Accuracy: 41.58%
  - Our Work: 23.09% (15.05% in individual implementation)
FNet:
- Replaces self-attention with a Fourier transform for faster token mixing.
- Performance:
  - Paper Accuracy: Not Available
  - Our Work: 46.02%
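As an illustration of the Fourier token mixing mentioned for FNet, the sketch below applies a discrete Fourier transform along the hidden dimension, then along the sequence dimension, and keeps the real part. It uses a naive O(n^2) DFT in plain Python for readability; a practical implementation would use an FFT routine. The helper names `dft` and `fourier_mixing` are hypothetical, and this is not the code used in this repository.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform of a list of real or complex numbers."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mixing(x):
    """Mix a seq_len x hidden matrix with a 2D DFT and keep only the real part."""
    mixed_hidden = [dft(row) for row in x]                    # DFT over the hidden dimension
    columns = [dft(list(col)) for col in zip(*mixed_hidden)]  # DFT over the sequence dimension
    return [[value.real for value in row] for row in zip(*columns)]


# Toy input: 3 tokens with 2 features each; only the first entry is non-zero.
tokens = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
mixed = fourier_mixing(tokens)
print(len(mixed), len(mixed[0]))  # 3 2
```

In this toy input the single non-zero entry spreads to every position of the output, which shows how the transform mixes information across all tokens without any learned attention weights.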

Repository Structure

Each model directory contains:

Implementation scripts: For training and evaluation.
Preprocessing details: Steps for preparing the CIFAR-10 dataset.
Guidelines: Specific instructions to reproduce results.

How to Use

Installation

Clone the repository:

git clone https://github.com/iboud/LRA-Image-Classification-on-Sequences-of-Pixels.git
cd LRA-Image-Classification-on-Sequences-of-Pixels

Note: No required dependencies.

Running Models

Each model folder contains details explaining:

Dataset preparation.
Training and evaluation steps.
Hyperparameter configurations.

Results and Benchmarking

The following table summarizes the experimental results for each model:

Model	Paper Accuracy	Our Accuracy
TrellisNet	73.42%	37%
LipschitzRNN	64.2%	40.5%
PiT	85.1% (on CIFAR-100)	42.16%
Nyströmformer	41.58%	23.09%
FNet	Not available	46.02%

Challenges and Limitations

Resource Constraints:
- Limited GPU memory restricted model size and batch processing.
- High computational demands, especially for models like PiT and Nyströmformer.
Accuracy Gaps:
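- Several models fell well short of the accuracies reported in their papers (for example, TrellisNet at 37% versus 73.42%), likely because of hardware and training-time limitations. The PiT paper figure was reported on CIFAR-100, so it is not directly comparable with our CIFAR-10 result.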
Dataset Representation:
- Processing CIFAR-10 as pixel sequences increases sequence lengths, adding complexity for transformer-based models.
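The effect of sequence length on full self-attention can be estimated with simple arithmetic. The sketch below counts pairwise attention scores per head and per layer for a 32x32 image treated as one token per pixel, and compares it with a patch-based tokenization. The 4x4 patch size is an assumption chosen only for comparison, and `attention_score_count` is a hypothetical helper.

```python
def attention_score_count(seq_len):
    """Number of pairwise scores in one full self-attention matrix (per head, per layer)."""
    return seq_len * seq_len


image_side = 32
patch_side = 4  # assumed patch size, for comparison only

pixel_tokens = image_side * image_side          # one token per pixel
patch_tokens = (image_side // patch_side) ** 2  # one token per 4x4 patch

pixel_scores = attention_score_count(pixel_tokens)
patch_scores = attention_score_count(patch_tokens)

print(pixel_tokens, pixel_scores)    # 1024 1048576
print(patch_tokens, patch_scores)    # 64 4096
print(pixel_scores // patch_scores)  # 256
```

Under these assumptions, pixel-level tokens need 256 times more attention scores than 4x4 patches, which is consistent with the memory pressure described above.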

Learnings and Takeaways

Sequence models can handle image classification but face challenges with long sequence lengths (e.g., PiT, Nyströmformer).
Trade-offs exist between computational efficiency and accuracy for different architectures.
Tuning hyperparameters and training on larger datasets could further improve results.

Contributors

This project was carried out by:

Imane Rahali
Amine Idrissi
Ilyas Boudhaine

Supervised by:

Hamza Alami
Issam Ait Yahia
