- Implemented the paper "High Quality Monocular Depth Estimation via Transfer Learning" to construct a depth map from a single image frame of a scene.
- Incorporated augmentations, including random horizontal flips, into the image dataset during model training.
- The model consists of a U-Net architecture with DenseNet-169 as a pre-trained encoder and a custom decoder.
- Implemented a custom loss function combining edge loss, depth loss, and structural similarity (SSIM) loss.
- Trained the model in PyTorch on the NYU Depth V2 dataset and achieved an accuracy of 70%.
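
The random horizontal flip mentioned above has to be applied to the image and its depth map together, otherwise the pixels no longer line up with their depth values. The following is a minimal sketch of that idea, not the project's actual transform; the function name `random_horizontal_flip`, the tensor layout, and the default probability of 0.5 are assumptions made for illustration.

```python
import random

import torch


def random_horizontal_flip(image, depth, p=0.5):
    """Flip an image and its depth map together with probability p.

    Assumed layout: image is (C, H, W) and depth is (1, H, W).
    """
    if random.random() < p:
        # Flip along the width axis (last dimension) of both tensors so
        # every pixel keeps its matching depth value.
        image = torch.flip(image, dims=[-1])
        depth = torch.flip(depth, dims=[-1])
    return image, depth
```

Calling it once per sample inside a dataset's `__getitem__` would give a different flip decision for each training example.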
UID:118191058
Master's Student at University of Maryland, College Park
| Predicted Output | Ground Truth |
|---|---|
| ![]() | ![]() |

| Predicted Output | Ground Truth |
|---|---|
| ![]() | ![]() |
The U-Net architecture stems from the so-called “fully convolutional network” first proposed by Long, Shelhamer, and Darrell.
The main idea is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information.
One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, and yields a u-shaped architecture. The network only uses the valid part of each convolution without any fully connected layers.
The DenseNet-169 model is used as the encoder. It belongs to the DenseNet family of models designed for image classification and differs from DenseNet-121 mainly in model size and accuracy. In the U-Net, the DenseNet encoder is initialized with weights pretrained on ImageNet. The final dense layer of the encoder is not used, because classification was not the objective of the project.
The decoder used in the U-Net is constructed by successively upsampling the outputs of the encoder.
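
A single decoder stage of this kind can be sketched as follows: upsample the deeper feature map to the resolution of an encoder output, concatenate the two, and refine the result with convolutions. The block name `UpBlock`, the bilinear interpolation, the two 3x3 convolutions, and the LeakyReLU slope are illustrative assumptions rather than the project's confirmed design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpBlock(nn.Module):
    """Upsample a feature map, merge it with a skip connection, and refine."""

    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels + skip_channels, out_channels,
                               kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x, skip):
        # Resize the deeper features to the skip connection's height and width.
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                          align_corners=True)
        # Concatenate along the channel axis, then apply two convolutions.
        x = torch.cat([x, skip], dim=1)
        x = self.act(self.conv1(x))
        return self.act(self.conv2(x))
```

Stacking several such blocks, each fed with a progressively shallower encoder output, increases the resolution step by step until a final convolution can produce the one-channel depth map.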
The loss function consists of three kinds of loss computed between the predicted depth image and the ground-truth depth image: an edge loss, a structural similarity (SSIM) loss, and an overall depth loss.
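
The three terms could be combined roughly as shown below. This is a hedged sketch, not the repository's loss: the function names, the `depth_weight` of 0.1, the box-window SSIM (a simplification of the usual Gaussian window), and the assumption that depths are scaled to `[0, max_val]` are all illustrative choices.

```python
import torch
import torch.nn.functional as F


def image_gradients(x):
    """Finite-difference gradients along height and width of (N, C, H, W)."""
    dy = x[:, :, 1:, :] - x[:, :, :-1, :]
    dx = x[:, :, :, 1:] - x[:, :, :, :-1]
    return dy, dx


def ssim(x, y, max_val=1.0, window=7):
    """Mean SSIM computed with a uniform (box) window."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mu_x = F.avg_pool2d(x, window, stride=1)
    mu_y = F.avg_pool2d(y, window, stride=1)
    var_x = F.avg_pool2d(x * x, window, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, stride=1) - mu_x * mu_y
    score = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return score.mean()


def depth_estimation_loss(pred, target, depth_weight=0.1, max_val=1.0):
    """Weighted sum of depth, edge, and SSIM losses (weights are assumptions)."""
    # Depth loss: mean absolute difference between the two depth maps.
    depth_loss = torch.mean(torch.abs(pred - target))
    # Edge loss: mean absolute difference between their image gradients.
    dy_p, dx_p = image_gradients(pred)
    dy_t, dx_t = image_gradients(target)
    edge_loss = torch.mean(torch.abs(dy_p - dy_t)) + torch.mean(torch.abs(dx_p - dx_t))
    # SSIM loss: 0 for identical images, growing as structure diverges.
    ssim_loss = torch.clamp((1 - ssim(pred, target, max_val)) / 2, 0, 1)
    return depth_weight * depth_loss + edge_loss + ssim_loss
```

The edge term penalizes blurred object boundaries, while the SSIM term rewards locally similar structure; the plain depth term is down-weighted here so that it does not dominate the other two.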
- Anaconda (optional but recommended)
- Python 3
- Matplotlib
- Pandas
- PyTorch (1.13.1 is used for development)
- CUDA (optional)
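
Because CUDA is optional, scripts typically pick the device at run time. The snippet below is a small sketch of that pattern and is not taken from the repository; the variable names are illustrative.

```python
import torch

# Use the GPU when a CUDA-enabled PyTorch build finds a device,
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Models and tensors are moved to the chosen device with .to(device).
x = torch.zeros(1, 3, 8, 8).to(device)
print(f"PyTorch {torch.__version__}, running on {device}")
```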
```
cd <repository_name>/src
python3 unet_train.py
```

```
cd <repository_name>/src
python3 unet_test.py
```