Real-Time Deep Learning Model (ENet) for Semantic Segmentation on the CamVid Dataset and Road Segmentation on IDD-lite using SOTA segmentation models with pretrained MobileNet-v2 backbone.
This repository implements a real-time deep learning model (ENet) for semantic segmentation on the CamVid dataset, and road segmentation (binary segmentation with road as the foreground class) on the IDD-lite dataset using models from the Segmentation Models PyTorch (segmentation_models.pytorch) library with a pretrained MobileNet-v2 backbone.
The code was developed with PyTorch Lightning, the WandB logger and Segmentation Models PyTorch on the lightning.ai and Kaggle platforms.
There are 3 branches:
- Main branch: scripts to train the encoder only.
- Decoder attachment: scripts to train the full network.
- Idd-lite Segmentation: road segmentation (binary segmentation with road as the foreground class) on the IDD-lite dataset, implemented with PyTorch Lightning and the WandB API (for hyperparameter tuning and experiment tracking).
This is an original PyTorch Lightning implementation of ENet that follows the phased training procedure outlined in the paper: the encoder is first trained separately on downsampled ground truth, and its weights are then copied into the composite network, which is trained on ground truth at the decoder's output resolution.
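The two-phase procedure can be sketched without any framework code. This is a minimal illustration under stated assumptions: label maps are nested lists, "state dicts" are plain dictionaries, the encoder output is 1/8 of the input resolution (as in the ENet paper), and the `encoder.` key prefix is hypothetical. In PyTorch the downsampling would typically be `torch.nn.functional.interpolate(..., mode="nearest")` on the label tensor cast to float, and the final step `model.load_state_dict(...)`.

```python
from typing import Dict, List, Tuple

LabelMap = List[List[int]]


def downsample_labels(mask: LabelMap, factor: int = 8) -> LabelMap:
    """Nearest-neighbour downsampling of an integer label map.

    Keeping every `factor`-th pixel (instead of interpolating) guarantees
    that every output value is still a valid class index.
    """
    return [row[::factor] for row in mask[::factor]]


def transfer_encoder_weights(
    encoder_state: Dict[str, object],
    full_state: Dict[str, object],
    prefix: str = "encoder.",  # hypothetical key prefix of the encoder in the full network
) -> Tuple[Dict[str, object], List[str]]:
    """Copy phase-1 encoder weights into a copy of the phase-2 state dict.

    Returns the updated state dict and the phase-1 keys with no counterpart
    (e.g. a classifier head used only during encoder training).
    """
    updated = dict(full_state)  # leave the caller's dict untouched
    skipped: List[str] = []
    for name, value in encoder_state.items():
        key = prefix + name
        if key in updated:
            updated[key] = value
        else:
            skipped.append(name)
    return updated, skipped


# Usage with a toy 16x16 label map and toy "state dicts".
mask = [[r * 16 + c for c in range(16)] for r in range(16)]
small = downsample_labels(mask, factor=8)  # [[0, 8], [128, 136]]

encoder_state = {"conv.weight": "trained", "classifier.weight": "encoder-only head"}
full_state = {"encoder.conv.weight": "random", "decoder.conv.weight": "random"}
merged, skipped = transfer_encoder_weights(encoder_state, full_state)
```

Skipping unmatched keys rather than failing reflects the fact that the encoder-only phase usually needs its own small classification head, which the composite network replaces with the decoder.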
The ENet implementation was based on the authors' original Lua code and on Python implementations available online (links are available in the notebook).
The original code was trained on four Titan X GPUs, so a full replication can use PyTorch Lightning's distributed training on a multi-GPU machine. To run semantic segmentation on CamVid on a single GPU with this implementation, the BatchNorm layers need an additional setting (use_running_stats=False), which makes the network use the current batch statistics instead of the running estimates precomputed on the training set.
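In PyTorch, `nn.BatchNorm2d` normalizes with batch statistics in training mode and with its running estimates in evaluation mode; constructing it with `track_running_stats=False` (the closest built-in argument to the setting named above) makes it use batch statistics in evaluation as well. The framework-free sketch below only contrasts the two normalization modes for a single channel; it is an illustration, not the repository's code, and it omits the learnable scale and shift.

```python
import math
from typing import List


def _normalize(values: List[float], mean: float, var: float, eps: float) -> List[float]:
    # Learnable scale/shift (gamma, beta) omitted for clarity.
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def bn_batch_stats(values: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize with the mean and (biased) variance of the current batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return _normalize(values, mean, var, eps)


def bn_running_stats(values: List[float], running_mean: float,
                     running_var: float, eps: float = 1e-5) -> List[float]:
    """Normalize with stored running estimates gathered during training."""
    return _normalize(values, running_mean, running_var, eps)


batch = [1.0, 2.0, 3.0, 4.0]
print(bn_batch_stats(batch))              # always centred on zero for this batch
print(bn_running_stats(batch, 0.0, 1.0))  # depends on the stored estimates
```

With batch statistics the output no longer depends on estimates accumulated during training, at the cost of making each prediction depend on the other samples in the batch.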
The road segmentation task is pixel-level classification that separates drivable road from the rest of the scene in images taken by monocular cameras, producing a binary mask as output.
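For the binary road task, the network can emit one logit per pixel, and the mask is obtained by applying a sigmoid and thresholding. A minimal sketch, assuming a single-channel output and a 0.5 threshold (both are assumptions; the threshold used in the repository is not stated):

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_mask(logits: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a single-channel logit map into a binary road mask (1 = road)."""
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in logits]


mask = logits_to_mask([[2.0, -1.0], [0.0, -3.0]])  # [[1, 0], [1, 0]]
```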
A soft binary cross-entropy loss combined with a Dice loss was chosen as the training objective after analysis.
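The combined objective can be written out per pixel. Segmentation Models PyTorch exposes `SoftBCEWithLogitsLoss` (whose `smooth_factor` applies label smoothing) and `DiceLoss(mode="binary")`; the pure-Python sketch below mirrors that arithmetic rather than reproducing those classes exactly. The equal weights and the Dice smoothing constant of 1.0 are assumptions and may differ from both the library defaults and the weighting used in this project.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_bce_with_logits(logits: Sequence[float], targets: Sequence[float],
                         smooth_factor: float = 0.0) -> float:
    """Mean BCE computed from logits, with optional label smoothing."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Label smoothing pulls hard 0/1 targets towards 0.5.
        y_soft = y * (1.0 - smooth_factor) + (1.0 - y) * smooth_factor
        # Stable identity: BCE(sigmoid(x), y) = max(x, 0) - x*y + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * y_soft + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def soft_dice_loss(logits: Sequence[float], targets: Sequence[float],
                   smooth: float = 1.0) -> float:
    """1 - Dice coefficient computed on probabilities (no thresholding)."""
    probs = [_sigmoid(x) for x in logits]
    intersection = sum(p * y for p, y in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def bce_dice_loss(logits: Sequence[float], targets: Sequence[float],
                  bce_weight: float = 1.0, dice_weight: float = 1.0,
                  smooth_factor: float = 0.0) -> float:
    """Weighted sum of the two terms (equal weights are an assumption)."""
    return (bce_weight * soft_bce_with_logits(logits, targets, smooth_factor)
            + dice_weight * soft_dice_loss(logits, targets))
```

The BCE term gives a well-behaved per-pixel gradient, while the Dice term measures overlap over the whole mask and is therefore less affected by the imbalance between road and non-road pixels.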
ENet ran on the Jetson TX2 without any additional optimization. Inference took 61.2 ms, and the 12-class segmentation accuracy of the model trained with current batch statistics on the CamVid lite dataset was 66.87%.
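If 61.2 ms is the per-frame latency, it corresponds to roughly 16.3 frames per second (1000 / 61.2). The helper below is a hedged sketch of how such a figure can be measured for any inference callable; the warm-up count, run count and use of the median are assumptions, not the procedure behind the reported number. When timing GPU inference in PyTorch, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.

```python
import statistics
import time
from typing import Callable


def ms_to_fps(latency_ms: float) -> float:
    """Convert per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def measure_latency_ms(fn: Callable[[], object], warmup: int = 10, runs: int = 50) -> float:
    """Median wall-clock latency of `fn` in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up: lets caches, allocators and lazy initialization settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


print(round(ms_to_fps(61.2), 1))  # 16.3
```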
The trained road segmentation models were exported to ONNX and optimized with TensorRT. The workflow for running inference on the Jetson TX2 with TensorRT is shown below.
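One common way to go from an exported ONNX file to a TensorRT engine is NVIDIA's `trtexec` command-line tool. The sketch below only assembles and runs that command; it assumes `trtexec` is installed at its usual JetPack path, and the file names and the FP16 choice are hypothetical, since the precision and exact build steps used here are not stated. The ONNX file itself would typically come from `torch.onnx.export(model, dummy_input, "road_seg.onnx")`.

```python
import subprocess
from typing import List

# Usual trtexec location on JetPack; adjust if TensorRT is installed elsewhere.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_trtexec_cmd(onnx_path: str, engine_path: str, fp16: bool = True,
                      trtexec: str = TRTEXEC) -> List[str]:
    """Assemble a trtexec call that parses an ONNX model and saves a TensorRT engine."""
    cmd = [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels where the GPU supports them
    return cmd


def build_engine(onnx_path: str, engine_path: str, fp16: bool = True) -> None:
    """Run trtexec on the target device (e.g. the Jetson TX2); raises on failure."""
    subprocess.run(build_trtexec_cmd(onnx_path, engine_path, fp16), check=True)
```

On the Jetson, a call such as `build_engine("road_seg.onnx", "road_seg.engine")` (hypothetical file names) would produce the engine. Building on the device itself matters because a serialized TensorRT engine is specific to the GPU and TensorRT version it was built with.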
Ground Truth and predictions for Road Segmentation
Inference statistics for the road segmentation models deployed on the Jetson TX2 are shown below.
Inference applied to a video file can be viewed here.
Complete details are available in my thesis, which can be found here.
Paszke, A., Chaurasia, A., Kim, S., & Culurciello, E. (2016). ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147.
Varma, G., Subramanian, A., Namboodiri, A., Chandraker, M., & Jawahar, C. (2019). IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1743–1751). IEEE.
https://github.com/e-lab/ENet-training
https://github.com/iArunava/ENet-Real-Time-Semantic-Segmentation