Real-Time Deep Learning Model (ENet) for Semantic Segmentation on the CamVid Dataset and Road Segmentation on IDD-lite using SOTA segmentation models with a pretrained MobileNet-v2 backbone.

This repository implements a real-time deep learning model (ENet) for semantic segmentation on the CamVid dataset, and road segmentation (binary segmentation with road as the foreground class) on the IDD-lite dataset using models from the Segmentation Models PyTorch library with a pretrained MobileNet-v2 backbone.

The code is developed with PyTorch Lightning, the WandB logger, and Segmentation Models PyTorch on the lightning.ai and Kaggle platforms.

There are 3 branches:

Main branch: scripts to train the encoder only.
Decoder attachment: scripts to train the full network (encoder and decoder).
Idd-lite Segmentation: road segmentation (binary segmentation with road as the foreground class) on the IDD-lite dataset, implemented with PyTorch Lightning and the WandB API (for hyperparameter tuning and experiment tracking).

ENet Development

This is an original PyTorch Lightning implementation of ENet that follows the phased training procedure described in the paper: the encoder is first trained on its own against downsampled ground truth, and its weights are then copied into the composite encoder-decoder network, which is trained against ground truth at the decoder's full output resolution.
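
The two phases can be sketched in plain PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's code: `Encoder` and `FullNet` are hypothetical toy stand-ins for the ENet encoder and composite network, the downsampling factor of 8 assumes ENet's 1/8-resolution encoder output, and the optimizer loops are omitted. Labels are downsampled with nearest-neighbour interpolation so every pixel keeps a valid class id.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def downsample_labels(labels: torch.Tensor, factor: int = 8) -> torch.Tensor:
    """Downsample an integer label map of shape (N, H, W) by `factor`.

    Nearest-neighbour interpolation keeps every pixel a valid class id
    (bilinear interpolation would blend ids into meaningless values).
    """
    h, w = labels.shape[-2:]
    small = F.interpolate(labels.unsqueeze(1).float(),
                          size=(h // factor, w // factor), mode="nearest")
    return small.squeeze(1).long()


class Encoder(nn.Module):
    """Hypothetical toy encoder: three stride-2 convs give 1/8 resolution."""

    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 classifier used only in phase 1 (encoder-only training).
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        return self.classifier(self.features(x))


class FullNet(nn.Module):
    """Hypothetical composite network: encoder features + upsampling decoder."""

    def __init__(self, num_classes: int = 12):
        super().__init__()
        self.encoder = Encoder(num_classes)
        self.decoder = nn.ConvTranspose2d(64, num_classes, kernel_size=8, stride=8)

    def forward(self, x):
        return self.decoder(self.encoder.features(x))


images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 12, (2, 64, 64))

# Phase 1: the encoder's loss is computed against 1/8-resolution labels.
encoder = Encoder()
phase1_loss = F.cross_entropy(encoder(images), downsample_labels(labels))

# Phase 2: copy the encoder weights into the full network; its loss uses full-size labels.
full = FullNet()
full.encoder.load_state_dict(encoder.state_dict())
phase2_loss = F.cross_entropy(full(images), labels)
```

Copying through `load_state_dict` requires the encoder inside the composite network to have exactly the same module structure as the standalone encoder, which is why the sketch reuses one `Encoder` class in both phases.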

The ENet implementation is based on the authors' original Lua code and on Python implementations available online (links are given in the notebook).

Because the original code was trained on 4 Titan X GPUs, a full replication can use PyTorch Lightning's distributed training on a multi-GPU machine. To run CamVid semantic segmentation on a single GPU, the BatchNorm layers need one extra setting (track_running_stats=False in PyTorch), which makes the network normalize with the statistics of the current batch instead of running estimates accumulated over the training set.
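
A minimal sketch of that BatchNorm setting, assuming the model uses `nn.BatchNorm2d` layers. The helper name `use_batch_statistics` is hypothetical and not part of this repository; it rebuilds each BatchNorm layer with PyTorch's `track_running_stats=False` and copies the learned scale and shift across.

```python
import torch
import torch.nn as nn


def use_batch_statistics(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.BatchNorm2d in `module` with a layer built
    with track_running_stats=False, so normalization always uses the current
    batch's mean and variance, in both train() and eval() mode.
    Learned affine parameters (weight and bias) are copied across."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            bn = nn.BatchNorm2d(child.num_features, eps=child.eps,
                                momentum=child.momentum, affine=child.affine,
                                track_running_stats=False)
            if child.affine:
                with torch.no_grad():
                    bn.weight.copy_(child.weight)
                    bn.bias.copy_(child.bias)
                bn = bn.to(child.weight.device)  # keep the layer on the original device
            setattr(module, name, bn)
        else:
            use_batch_statistics(child)  # descend into nested containers
    return module


model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
model = use_batch_statistics(model).eval()
# Even in eval mode, the BatchNorm layer now normalizes with this batch's statistics.
out = model[1](torch.randn(4, 8, 16, 16) * 5 + 3)
```

One consequence of this choice is that predictions at inference time depend on the other images in the same batch, since no stored training-set statistics are used.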

Road Segmentation on Indian Driving Dataset lite

Problem Statement for Road Segmentation:

Perform pixel-level classification that separates drivable road from the rest of the scene in images taken with a monocular camera, and output a binary mask.
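
To make the expected output concrete: a binary model emits one logit per pixel, which is thresholded into the road mask. The sketch below assumes a single-channel output and a 0.5 probability threshold, and adds road IoU as one common way to score such masks; the function names are hypothetical and the metrics used in this work may differ.

```python
import torch


def logits_to_mask(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Turn single-channel road logits (N, 1, H, W) into a binary mask
    (N, H, W) of 0/1 values, where 1 marks drivable road."""
    probs = torch.sigmoid(logits)
    return (probs > threshold).squeeze(1).to(torch.uint8)


def road_iou(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> float:
    """Intersection-over-union of the road class over a batch of masks."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().item()
    union = (pred | target).sum().item()
    return (intersection + eps) / (union + eps)


logits = torch.tensor([[[[2.0, -1.0], [0.5, -3.0]]]])  # (1, 1, 2, 2)
target = torch.tensor([[[1, 0], [0, 0]]])              # (1, 2, 2)
mask = logits_to_mask(logits)
print(mask.tolist())                     # [[[1, 0], [1, 0]]]
print(round(road_iou(mask, target), 3))  # 0.5
```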

Methodology for Road Segmentation

Experiment Tracking using WandB

Hyperparameter tuning
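
A hyperparameter search of this kind is typically described to WandB as a sweep configuration. The sketch below is hypothetical: the parameter names, ranges, metric name, project name, and the `train` function are illustrative assumptions, not the values used in this work. The launch calls are left commented out so the block runs without a WandB account.

```python
# Hypothetical W&B sweep configuration: parameter names, ranges, and the
# metric name are illustrative assumptions, not the values used in this work.
sweep_config = {
    "method": "bayes",  # Bayesian search; "grid" and "random" are the other options
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [8, 16, 32]},
        "weight_decay": {"values": [0.0, 1e-5, 1e-4]},
    },
}

# With wandb installed and logged in, the sweep would be launched like this
# (`train` is a hypothetical function that reads wandb.config and logs val_loss):
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="idd-lite-road-seg")
# wandb.agent(sweep_id, function=train, count=20)
```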

Loss function analysis

After this analysis, soft binary cross-entropy loss combined with Dice loss was chosen.
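
A minimal plain-PyTorch sketch of such a combined loss, assuming equal 0.5/0.5 weighting and a Dice smoothing term of 1 (both assumptions). Segmentation Models PyTorch also provides `SoftBCEWithLogitsLoss` (BCE with optional label smoothing) and `DiceLoss(mode="binary")` in `segmentation_models_pytorch.losses`, which may be what the repository actually combines; this sketch uses plain BCE instead of the label-smoothed variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BCEDiceLoss(nn.Module):
    """Weighted sum of binary cross-entropy (on logits) and soft Dice loss.

    bce_weight and smooth are illustrative defaults, not tuned values.
    """

    def __init__(self, bce_weight: float = 0.5, smooth: float = 1.0):
        super().__init__()
        self.bce_weight = bce_weight
        self.smooth = smooth

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        target = target.float()
        bce = F.binary_cross_entropy_with_logits(logits, target)
        # Soft Dice works on probabilities (no thresholding), so it stays differentiable.
        probs = torch.sigmoid(logits)
        dims = tuple(range(1, probs.dim()))  # reduce over all dims except batch
        intersection = (probs * target).sum(dims)
        denominator = probs.sum(dims) + target.sum(dims)
        dice = (2 * intersection + self.smooth) / (denominator + self.smooth)
        dice_loss = 1 - dice.mean()
        return self.bce_weight * bce + (1 - self.bce_weight) * dice_loss


loss_fn = BCEDiceLoss()
target = torch.zeros(2, 1, 8, 8)
target[:, :, :4, :] = 1                   # top half of each mask is road
good = loss_fn(target * 20 - 10, target)  # confident, correct logits
bad = loss_fn(10 - target * 20, target)   # confident, inverted logits
```

The BCE term scores every pixel independently, while the Dice term scores overlap of the whole road region, which helps when road and non-road pixels are imbalanced.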

Trying out different Segmentation Heads

Output on Jetson TX2

ENet runs on the Jetson TX2 without any additional optimization: inference takes 61.2 ms, and the 12-class segmentation accuracy of a model trained with current-batch statistics on the CamVid lite dataset is 66.87%.
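
Latency figures like this depend on how they are measured. Below is a minimal sketch of one common way to time a PyTorch model, assuming a 360×480 input (a common downsampled CamVid resolution) and averaging over repeated runs after a warm-up; it is not necessarily how the 61.2 ms figure was obtained, and `measure_latency_ms` is a hypothetical helper.

```python
import time

import torch
import torch.nn as nn


@torch.no_grad()
def measure_latency_ms(model: nn.Module, input_shape=(1, 3, 360, 480),
                       warmup: int = 10, runs: int = 50) -> float:
    """Average forward-pass time in milliseconds on the default device."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    for _ in range(warmup):          # warm-up: cuDNN autotuning, allocator, caches
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()     # CUDA kernels run asynchronously
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()     # wait for the last kernel before stopping the clock
    return (time.perf_counter() - start) * 1000 / runs


toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 12, 1))
print(f"{measure_latency_ms(toy, input_shape=(1, 3, 64, 64), runs=5):.2f} ms")
```

Without the synchronize calls, GPU timings would mostly measure kernel launch overhead rather than actual compute time.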

The trained road segmentation models were exported to ONNX and optimized with TensorRT. The workflow for running inference on the Jetson TX2 with TensorRT is shown below.

Ground Truth and predictions for Road Segmentation

Inference statistics for the road segmentation models deployed on the Jetson TX2 are shown below.

Inference applied to a video file can be viewed here.

References

Complete details can be found in my thesis here.

Paszke, A., Chaurasia, A., Kim, S., & Culurciello, E. (2016). ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147.

Varma, G., Subramanian, A., Namboodiri, A., Chandraker, M., & Jawahar, C. (2019). IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1743–1751). IEEE.

https://github.com/e-lab/ENet-training

https://github.com/iArunava/ENet-Real-Time-Semantic-Segmentation
