Road Segmentation is a deep learning project that detects and masks roads in aerial satellite images. Using the UNet architecture, the model separates road pixels from the background. Applications include urban planning, navigation, geographic analysis, and military use.
This project leverages the UNet architecture to perform semantic segmentation of roads in high-resolution aerial satellite images. The objective is to predict binary masks where road pixels are labeled as 1 and the background as 0. This technology has versatile use cases, from urban planning to autonomous navigation systems.
The dataset used for this project includes:
- Satellite Images: High-resolution RGB satellite images.
- Ground Truth Masks: Binary masks where road pixels are labeled as white (1) and background pixels as black (0).
Each training sample consists of an image and its corresponding ground truth mask. The dataset can be downloaded from AIcrowd.
- Dataset Preparation:
  - Download and extract the dataset.
  - Put all extracted folders into a new folder named epfl_road-segmentation, organized as follows:

        epfl_road-segmentation/
        |-- training/
        |-- test_set_images/

  - Keep this folder in the same directory as your project.
  - Update data_root to the path of the epfl_road-segmentation folder.
- Dependencies: We trained our model with PyTorch, used matplotlib to display images and plots, and used tqdm to show a progress bar during training. To install the required Python libraries, run:

      !pip install torch torchvision matplotlib tqdm
- Google Drive Access (Optional): We used a Google Colab GPU, which is considerably faster than running on a CPU. To use Google Colab, upload the dataset to your Google Drive, mount the drive, and set data_root to the path of the folder in your Google Drive:

      from google.colab import drive
      drive.mount('/content/drive')
      data_root = '/content/drive/MyDrive/.../epfl_road-segmentation'  # Update with your dataset path
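
Once data_root is set, the folder layout described above can be checked before running the rest of the notebook. The following is a minimal sketch using only the standard library; the helper name `missing_subdirs` is hypothetical and not part of the project, and it only checks for the two subfolders named in the structure above.

```python
from pathlib import Path

# Subfolders the setup instructions expect directly under data_root.
EXPECTED_SUBDIRS = ("training", "test_set_images")


def missing_subdirs(data_root):
    """Return the expected subfolders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling `missing_subdirs(data_root)` returns an empty list when both folders are present, so a non-empty result is a cue to revisit the extraction step.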
The project utilizes the UNet architecture, a popular model for semantic segmentation tasks. Key features include:
- Encoder-Decoder Structure: Captures high-level features and reconstructs them into pixel-level predictions.
- Skip Connections: Preserve spatial information from the encoder for better localization.
- Regularization: Batch normalization and dropout layers to reduce overfitting.
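
The three features above can be illustrated with a deliberately small PyTorch model. This is a sketch under stated assumptions, not the network trained in this project: the class name `TinyUNet`, the two-level depth, the channel widths, and the dropout rate are all hypothetical choices made to keep the example short.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, p_drop):
    """Two 3x3 convolutions with batch norm and ReLU, followed by dropout."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
    )


class TinyUNet(nn.Module):
    """Two-level UNet-style model (hypothetical sizes, for illustration only)."""

    def __init__(self, in_ch=3, base=16, p_drop=0.1):
        super().__init__()
        # Encoder: each level halves the resolution and doubles the channels.
        self.enc1 = conv_block(in_ch, base, p_drop)
        self.enc2 = conv_block(base, base * 2, p_drop)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4, p_drop)
        # Decoder: upsample, then fuse with the matching encoder output.
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2, p_drop)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base, p_drop)
        # One output channel: a road logit for every pixel.
        self.head = nn.Conv2d(base, 1, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections: concatenate encoder features along the channel axis.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # raw logits; apply a sigmoid to get probabilities
```

With two pooling steps, the input height and width must be divisible by 4. For example, `TinyUNet()(torch.randn(2, 3, 64, 64))` yields a tensor of shape `(2, 1, 64, 64)`, one logit per input pixel, which can be thresholded after a sigmoid to obtain the binary road mask.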
Run the cells in the Data Preprocessing section to prepare the dataset:
- The dataset will be split into training, validation, and test sets.
- Data augmentation (e.g., rotation, flipping, Gaussian blur) will be applied to the training set. Augmented images will be saved with the naming convention satImage_*_augmented.png.
- Please do not rename the files, and make sure each image stays aligned with its corresponding ground truth mask.
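
The split and the augmented-file naming described above can be sketched as follows. This is an illustration only: the function names, the 80/10/10 proportions, and the fixed seed are assumptions, not the values used in the notebook. It also assumes that an image and its mask share the same file name, so that splitting by name keeps each pair together.

```python
import random
from pathlib import Path


def split_names(names, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle file names reproducibly and split them into train/val/test lists."""
    names = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_test = round(len(names) * test_frac)
    n_val = round(len(names) * val_frac)
    test = names[:n_test]
    val = names[n_test:n_test + n_val]
    train = names[n_test + n_val:]
    return train, val, test


def augmented_name(name):
    """Map satImage_001.png to satImage_001_augmented.png."""
    p = Path(name)
    return f"{p.stem}_augmented{p.suffix}"
```

Using the same name list for both the image folder and the mask folder is what keeps image and mask aligned; the augmented copy of a mask would get the same `_augmented` suffix as its image.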
Train the model by running the cells in the Model Training section:
- Monitor the training and validation loss, overall accuracy (OA), and F1-score after each epoch.
- The F1-score is calculated based on the Intersection over Union (IoU).
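
For a binary mask, the F1-score of the road class and the IoU are tied by the identity F1 = 2·IoU / (1 + IoU), so one can be derived from the other. The sketch below computes overall accuracy, IoU, and F1 from flattened 0/1 masks in plain Python. Whether the notebook uses exactly this formula is an assumption, and the function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(pred, target):
    """Compute OA, IoU and F1 for the road class from flat sequences of 0/1 labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p == 1 and t == 1:
            tp += 1  # road predicted, road in ground truth
        elif p == 1 and t == 0:
            fp += 1  # road predicted, background in ground truth
        elif p == 0 and t == 1:
            fn += 1  # road missed
        else:
            tn += 1  # background correctly predicted
    total = tp + fp + fn + tn
    union = tp + fp + fn
    oa = (tp + tn) / total if total else 0.0
    # Convention chosen here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    f1 = 2 * iou / (1 + iou)  # equals 2*TP / (2*TP + FP + FN)
    return {"oa": oa, "iou": iou, "f1": f1}
```

Because F1 is a monotonic function of IoU, the two metrics rank models identically; F1 is simply the larger of the two whenever IoU lies strictly between 0 and 1.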
Run the cells in the Model Testing section to predict road masks for new images:
- The model will use unseen test images to generate their segmentation masks.
- It will also compare the predictions with the ground truth and compute the F1-score.
After training, the model achieved the following metrics on the validation set:
- IoU (over the whole validation set): 80%
- Pixel Accuracy: 95.5%
- Validation Loss: 0.12
Below are some predictions on the test set images.