
Application of Pre-training Strategies on Landslides Detection

Winter 2024 CS291K (Machine Learning and Data Mining) course project


In this project, we investigated how pre-training strategies can improve landslide detection with deep learning. We used the Faster R-CNN framework with transfer learning on satellite imagery datasets and explored the following pre-training strategies:

  • Image Classification
  • Knowledge Distillation
  • Masked Autoencoder (MAE)

Using these pre-trained models and techniques, we improved landslide bounding-box detection on satellite images.

💡 For environment setup, please see the Implementation Details section.

🔎 For further reading, please see: our final report

Introduction: Landslides Detection

Landslides have affected about 5 million people worldwide.

Existing deep-learning approaches to landslide detection either train models from scratch or initialize them with weights pre-trained on natural images.

Although pre-training strategies (e.g., MAE) have been studied extensively for satellite imagery, their effectiveness for landslide detection remains unexplored.

Methods

Stage 1: Pre-training

  • Objective: Pre-train image encoders using different strategies
  • Datasets:
  • Architecture (image encoder):
    • CNN-based: ResNet-18 and EfficientNet-B0
    • Transformer-based: ViT and Swin Transformer
  • Image Encoder Pre-training Strategies
    • Image Classification:
      • train an image encoder on the ImageNet dataset to predict the image category
    • Knowledge Distillation: distill a more complex teacher model into our smaller, task-specific image encoder (the student):
      1. load the ImageNet-pre-trained weights onto a predetermined teacher model;
      2. fine-tune the teacher model on the Landslide4Sense dataset for binary landslide image classification;
      3. freeze the teacher model's weights;
      4. train a smaller student model to predict the teacher model's soft target probabilities along with the ground-truth hard labels
    • Masked Autoencoder:
      • mask input image patches and train an encoder-decoder framework to reconstruct the original image
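The knowledge-distillation steps above can be sketched as a per-example loss. This is a minimal pure-Python sketch assuming the standard Hinton-style formulation: a temperature-softened KL divergence from the teacher's soft targets plus cross-entropy on the ground-truth hard label. The function names, `temperature=4.0`, `alpha=0.5`, and the class order (0 = no landslide, 1 = landslide) are hypothetical and not taken from this project's code. The teacher logits are assumed to come from the frozen, fine-tuned teacher, so in a real training loop only the student would be updated.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Hinton-style distillation loss for a single example (hypothetical defaults).

    soft term: KL(teacher_T || student_T), scaled by T^2 so its gradient
               magnitude stays comparable to the hard term
    hard term: cross-entropy of the student's T=1 probabilities vs. the label
    """
    # Soft targets from the frozen teacher and softened student predictions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student_soft)
        if pt > 0
    )
    # Ordinary cross-entropy against the ground-truth hard label.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard
```

When the student's logits match the teacher's, the soft term is zero and the loss reduces to the weighted cross-entropy on the ground-truth label; a student that disagrees with the teacher is penalized by both terms.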

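The masked-autoencoder objective can be sketched without a deep-learning framework. The code below is a hypothetical illustration that assumes square, non-overlapping patches, a 75% mask ratio (the default from the original MAE paper, not necessarily this project's setting), and a reconstruction loss computed only on masked patches, as in MAE. The encoder and decoder are omitted, and none of the function names come from this project.

```python
import random


def patchify(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order; H and W are assumed to be
    divisible by patch_size (e.g. 224 / 16 = 14, giving 196 patches).
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Randomly split patch indices into visible (fed to the encoder) and masked sets."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(indices[:num_keep])
    masked = sorted(indices[num_keep:])
    return visible, masked


def masked_reconstruction_loss(target_patches, predicted_patches, masked):
    """Mean squared error averaged over the masked patches only."""
    errors = []
    for i in masked:
        t, p = target_patches[i], predicted_patches[i]
        errors.append(sum((a - b) ** 2 for a, b in zip(t, p)) / len(t))
    return sum(errors) / len(errors)
```

Because the loss ignores visible patches, the model cannot lower it by copying its input; it has to infer the hidden content from the roughly 25% of patches it sees.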
Stage 2: Fine-tuning

  • Objective: Landslide object bounding box detection
  • Datasets: Landslide4Sense
  • Architecture:
    • Backbone: image encoder pre-trained in Stage 1
    • Head: Faster R-CNN

Evaluation

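A predicted bounding box is counted as correct if its Intersection over Union (IoU) with a ground-truth landslide box is at least a predefined threshold, where IoU is the area of overlap between the two boxes divided by the area of their union.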
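The IoU criterion can be made concrete with a short sketch. It assumes boxes given as `(x_min, y_min, x_max, y_max)` in pixel coordinates, a hypothetical default threshold of 0.5 (a common choice; the threshold used in this project is not stated here), and greedy one-to-one matching of predictions, pre-sorted by descending confidence, to ground-truth boxes. Under this matching, a duplicate detection of an already-matched landslide counts as a false positive. The function names are illustrative only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp to zero so non-overlapping boxes have no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(pred_boxes, gt_boxes, threshold=0.5):
    """Greedily match predictions (assumed sorted by confidence) to unmatched ground truth."""
    matched = set()
    tp = 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue  # each ground-truth box can be matched at most once
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, j
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            tp += 1
    return tp
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has a 1×1 overlap out of a union of 7, giving 1/7 ≈ 0.14, which would not count as correct at a 0.5 threshold.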

Results

We first investigated the performance of various backbone architectures on landslide object detection, fixing the pre-training strategy to ImageNet-1K image classification for all architectures. Swin-Base outperformed the other architectures.

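[Figure: landslide detection performance of each backbone architecture, all pre-trained with ImageNet-1K image classification]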

We then selected the three best-performing architectures (ViT-Large, Swin-Tiny, and Swin-Base) and investigated whether different pre-training strategies could further improve their performance on this landslide detection task. Across all dataset-strategy-model combinations, Swin-Base pre-trained using MAE yielded the best performance.

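[Figure: detection performance of ViT-Large, Swin-Tiny, and Swin-Base under each pre-training strategy and dataset]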

Sample prediction results from our best-performing model (Swin-Base pre-trained using MAE):

[Figure: sample landslide predictions from Swin-Base pre-trained using MAE]

Conclusion

We presented a comprehensive approach to landslide detection using deep learning, focusing on pre-trained image encoders used as backbones within the Faster R-CNN framework.

We found that the Swin-Base architecture, pre-trained using a Masked Autoencoder (MAE), yielded the best performance in detecting landslides in satellite imagery.

Our findings highlight the importance of selecting appropriate pre-training strategies and backbone architectures for landslide detection.

About the Authors

This project was part of the CS291K: Machine Learning and Data Mining course. Study design and code implementation were done by me (Yuchen Hou) and Vihaan Akshaay Rajendiran.

Implementation Details

Environment Setup

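cd 291k                              # enter the project directory
conda env create -f environment.yml  # create the conda environment defined in environment.yml
conda activate 291k                  # activate the environment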

Model weights:

Datasets:

Code adapted from:

Let's Connect!

📧 Yuchen Hou | GitHub | LinkedIn | Webpage

☕ I'm always happy to chat about research ideas, potential collaborations, or anything you're passionate about!