Keywords: Python, PyTorch, Deep Learning, Image Segmentation
- Introduction
- Project structure
- Neural Network
  - 3.1. Architecture
  - 3.2. Hyperparameters
- Results
- References
Disclaimer: this code is part of a bigger project and does not aim at being used alone.
The objective of this research is to develop a deep-learning-based algorithm that generates superpixel partitions with improved metrics. By combining an algorithm that generates superpixel partitions by solving the Eikonal equation with ground-truth segmentations from the COCO dataset, we generated training examples of superpixel partitions for the images of the dataset. A convolutional network is then trained on these images. Finally, a superpixel algorithm is applied to the output of the network to construct the sought partition.
You can read the report for more information about this.
To install the project:
# clone the project
cd /path/to/project/
git clone https://github.com/theodumont/superpixels-segmentation.git
cd superpixels-segmentation/
# install requirements
pip install -r requirements.txt
The project superpixels-segmentation has the following structure:
- cnn/: scripts for convolutional neural network
- segm/: info about superpixel segmentation
- other/: dataset analysis and other scripts
- report/: sources for report (.pdf version can be found here)
- data/: samples from the datasets used. More info in the data/ folder
- presentation/: sources for public presentation (.pdf version can be found here)
The core architecture of our network is the Context Aggregation Network (CAN). It gradually aggregates contextual information without losing resolution by using dilated convolutions, whose field of view grows exponentially with depth. This exponential growth lets a compact network aggregate information at a global scale. See the report for references.
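To make the exponential growth concrete, here is a minimal sketch that computes the receptive field of a stack of stride-1 convolutions. The dilation schedule 1, 2, 4, 8, 16 is the one listed in the architecture table of this README; the function name `receptive_field` is hypothetical and not part of the project code.

```python
def receptive_field(layers):
    """Side length of the receptive field of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, first layer first.
    Each layer widens the field by (kernel_size - 1) * dilation pixels.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Five 3x3 layers with doubling dilation versus five plain 3x3 layers.
dilated = [(3, 1), (3, 2), (3, 4), (3, 8), (3, 16)]
plain = [(3, 1)] * 5

# Field of view after each dilated layer: it roughly doubles every layer.
growth = [receptive_field(dilated[:s]) for s in range(1, len(dilated) + 1)]

print(growth)                    # [3, 7, 15, 31, 63]
print(receptive_field(plain))    # 11
```

With the same number of layers and parameters, the dilated stack sees a 63x63 window where the plain stack sees only 11x11.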
Here is the architecture of our Context Aggregation Network:
Layer L_s | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
Input w_s | 3 | 24 | 24 | 24 | 24 | 24 | 24 |
Output w_{s+1} | 24 | 24 | 24 | 24 | 24 | 24 | 3 |
Receptive field | 3x3 | 3x3 | 3x3 | 3x3 | 3x3 | 3x3 | 1x1 |
Dilation r_s | 1 | 2 | 4 | 8 | 16 | 1 | 1 |
Padding | 1 | 2 | 4 | 8 | 16 | 1 | 0 |
ABN | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
LReLU | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | No |
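The table can be translated into PyTorch roughly as follows. This is a sketch under stated assumptions, not the code from `cnn/`: the ABN layer is assumed to be adaptive batch normalization of the form `a * x + b * BatchNorm(x)` with learnable scalars, and the names `AdaptiveBatchNorm2d`, `LAYERS` and `build_can` are hypothetical.

```python
import torch
import torch.nn as nn


class AdaptiveBatchNorm2d(nn.Module):
    """Assumed ABN form: a * x + b * BatchNorm(x), with learnable scalars a and b."""

    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.a * x + self.b * self.bn(x)


# One row per layer of the table:
# (in_channels, out_channels, kernel_size, dilation, padding, use_leaky_relu)
LAYERS = [
    (3, 24, 3, 1, 1, True),
    (24, 24, 3, 2, 2, True),
    (24, 24, 3, 4, 4, True),
    (24, 24, 3, 8, 8, True),
    (24, 24, 3, 16, 16, True),
    (24, 24, 3, 1, 1, True),
    (24, 3, 1, 1, 0, False),
]


def build_can():
    """Stack convolution, ABN and (optionally) LeakyReLU(0.2) for each table column."""
    modules = []
    for c_in, c_out, kernel, dilation, padding, use_lrelu in LAYERS:
        modules.append(
            nn.Conv2d(c_in, c_out, kernel_size=kernel, dilation=dilation, padding=padding)
        )
        modules.append(AdaptiveBatchNorm2d(c_out))
        if use_lrelu:
            modules.append(nn.LeakyReLU(0.2))
    return nn.Sequential(*modules)


model = build_can()
x = torch.randn(2, 3, 64, 64)   # dummy batch of two RGB images
y = model(x)
print(tuple(y.shape))           # same spatial size as the input
```

Because the padding equals the dilation for every 3x3 layer, each layer preserves the spatial size, which is how the network aggregates context without losing resolution.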
In the report, we discuss how the hyperparameters affect the model's performance metrics and runtime, and we run experiments to find a well-performing architecture. The following values worked well on the BSD dataset:
batch_size | epochs | d | lr_0 | decay for lr_0 | alpha_TV |
---|---|---|---|---|---|
32 | 80 | 7 | 10^-2 | 10^-3 after 10 epochs | 0 |
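As an illustration, these values could be collected in a configuration like the one below. This sketch assumes that "10^-3 after 10 epochs" means the learning rate steps from 10^-2 down to 10^-3 once 10 epochs have elapsed; the key names `lr_decayed` and `decay_epoch` and the function `learning_rate` are hypothetical and do not come from the project code.

```python
HYPERPARAMETERS = {
    "batch_size": 32,
    "epochs": 80,
    "d": 7,
    "lr_0": 1e-2,
    "lr_decayed": 1e-3,   # assumed value of the learning rate after the decay
    "decay_epoch": 10,    # assumed epoch index at which the decay happens
    "alpha_tv": 0.0,
}


def learning_rate(epoch, hp=HYPERPARAMETERS):
    """Step schedule (assumed reading of the table): lr_0, then lr_decayed."""
    return hp["lr_0"] if epoch < hp["decay_epoch"] else hp["lr_decayed"]


# Learning rate used at each of the 80 epochs (0-indexed).
schedule = [learning_rate(epoch) for epoch in range(HYPERPARAMETERS["epochs"])]
print(schedule[9], schedule[10])
```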
The algorithm is evaluated on the Berkeley Segmentation Dataset 500 (BSD500). Its boundary adherence is comparable to that of state-of-the-art algorithms such as SLIC, while its compactness and undersegmentation error are significantly better.
Application of the model to an image of the BSD500. Original image (left) and superpixel segmented image with each superpixel being displayed with the average color of the pixels belonging to it (right).
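The average-color rendering described in this caption can be sketched with NumPy as follows. It assumes the segmentation is available as an integer label map with one superpixel id per pixel; the function name `mean_color_image` is hypothetical.

```python
import numpy as np


def mean_color_image(image, labels):
    """Paint every superpixel with the mean color of its pixels.

    image:  (H, W, C) array of pixel values.
    labels: (H, W) array of non-negative integer superpixel ids.
    """
    flat_labels = labels.ravel()
    n_labels = int(flat_labels.max()) + 1
    counts = np.bincount(flat_labels, minlength=n_labels)
    out = np.empty(image.shape, dtype=float)
    for c in range(image.shape[2]):
        # Sum of channel c over each superpixel, then divide by its pixel count.
        sums = np.bincount(flat_labels, weights=image[..., c].ravel(), minlength=n_labels)
        means = sums / np.maximum(counts, 1)
        out[..., c] = means[labels]
    return out


# Tiny example: a 2x2 image split into a top and a bottom superpixel.
image = np.array([[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                  [[0.2, 0.4, 0.6], [0.4, 0.6, 0.8]]])
labels = np.array([[0, 0],
                   [1, 1]])
rendered = mean_color_image(image, labels)
```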
The table below reports the metrics for three superpixel segmentation algorithms: SLIC, FMS and our algorithm (see the report for references). We use the SLIC algorithm as a reference to evaluate the performance of our model.
Algorithm | Undersegm. Error | Compactness | Boundary Recall |
---|---|---|---|
SLIC | .10 | .31 | .90 |
FMS | .05 | .48 | .89 |
Ours | .04 | .77 | .88 |
Comparisons of metrics on the BSD500 dataset. Values are for segmentations with 400 superpixels.
Our model yields very good results. Relative to FMS, the undersegmentation error improves by 0.01 (from .05 to .04) and the compactness by 0.29 (from .48 to .77); relative to SLIC, the gains are 0.06 and 0.46. The boundary recall is slightly lower for our model than for SLIC, but this is not a concern: because SLIC's compactness is very low, its contours oscillate and therefore intersect the ground-truth outlines more often, which raises its boundary recall.
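The gaps between the rows of the metrics table can be recomputed directly from its values. The snippet below is only a worked check of that arithmetic; the names `METRICS`, `LOWER_IS_BETTER` and `improvement` are hypothetical.

```python
# Values copied from the metrics table (BSD500, 400 superpixels).
METRICS = {
    "SLIC": {"undersegmentation_error": 0.10, "compactness": 0.31, "boundary_recall": 0.90},
    "FMS": {"undersegmentation_error": 0.05, "compactness": 0.48, "boundary_recall": 0.89},
    "Ours": {"undersegmentation_error": 0.04, "compactness": 0.77, "boundary_recall": 0.88},
}

# For these metrics a smaller value is better; for the others, larger is better.
LOWER_IS_BETTER = {"undersegmentation_error"}


def improvement(baseline, metric, ours="Ours"):
    """Signed gain of `ours` over `baseline`; positive means `ours` is better."""
    delta = METRICS[ours][metric] - METRICS[baseline][metric]
    if metric in LOWER_IS_BETTER:
        delta = -delta
    return round(delta, 2)


for baseline in ("SLIC", "FMS"):
    for metric in ("undersegmentation_error", "compactness", "boundary_recall"):
        print(baseline, metric, improvement(baseline, metric))
```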
See here for references.