
Monocular-Depth-Map-Estimation

Overview

  1. Implemented the paper High Quality Monocular Depth Estimation via Transfer Learning to construct a depth map from a single image frame of a scene.
  2. Augmented the image dataset with random horizontal flips during model training (a minimal sketch follows this list).
  3. The model consists of a U-Net architecture with DenseNet-169 as a pre-trained encoder and a custom decoder.
  4. Implemented a custom loss function combining edge loss, depth loss, and structural similarity (SSIM) loss.
  5. Trained the model in PyTorch on the NYU Depth V2 dataset, achieving an accuracy of 70%.
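Since a horizontal flip must be applied identically to an RGB frame and its depth map, the flip decision is typically drawn once per training pair. The helper below is a minimal sketch of that idea, not the repository's actual dataset code; the class name `JointRandomHorizontalFlip` is hypothetical.

```python
import random

import torchvision.transforms.functional as TF

class JointRandomHorizontalFlip:
    """Flip an RGB image and its depth map together (hypothetical helper).

    Flipping the two tensors independently would silently misalign the
    training pairs, so a single coin toss covers both.
    """

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, depth):
        if random.random() < self.p:
            image = TF.hflip(image)
            depth = TF.hflip(depth)
        return image, depth
```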

Personnel

Ameya Konkar

UID: 118191058

Master's Student at University of Maryland, College Park

Results

[Sample 1: predicted depth map (output1) alongside ground truth (gt1)]
[Sample 2: predicted depth map (output2) alongside ground truth (gt2)]

Model

U-Net

The U-Net architecture stems from the so-called “fully convolutional network” first proposed by Long, Shelhamer, and Darrell.

The main idea is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information.

One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher-resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, yielding a U-shaped architecture. The network uses only the valid part of each convolution, without any fully connected layers.
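To make the contracting/expanding structure concrete, here is a deliberately tiny two-level sketch of the pattern in PyTorch; the project's actual model is much deeper and uses a DenseNet-169 encoder, as described below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Minimal U-Net sketch: the contracting path stores feature maps,
    and the expanding path upsamples and concatenates them back in."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, 3, padding=1)        # contracting path
        self.enc2 = nn.Conv2d(16, 32, 3, padding=1)
        self.dec1 = nn.Conv2d(32 + 16, 16, 3, padding=1)  # expanding path
        self.head = nn.Conv2d(16, 1, 3, padding=1)        # 1-channel depth map

    def forward(self, x):
        s1 = F.relu(self.enc1(x))              # kept as a skip connection
        x = F.max_pool2d(s1, 2)                # pooling halves the resolution
        x = F.relu(self.enc2(x))
        x = F.interpolate(x, scale_factor=2,   # upsampling restores it
                          mode="bilinear", align_corners=False)
        x = torch.cat([x, s1], dim=1)          # propagate high-res context
        x = F.relu(self.dec1(x))
        return self.head(x)
```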

Encoder

The DenseNet-169 model is used as the encoder. It is one of the DenseNet family of models designed for image classification; the main differences from DenseNet-121 are model size and accuracy. In this U-Net, the DenseNet encoder is initialized with pre-trained ImageNet weights. The final dense (classification) layer of the encoder is dropped, since classification was not the objective of this project.
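One way to obtain such an encoder with torchvision (the exact loading code in the repository may differ):

```python
import torch
import torchvision.models as models

# Load DenseNet-169 with ImageNet weights and keep only the convolutional
# feature extractor; the final classification layer is simply not used.
densenet = models.densenet169(weights=models.DenseNet169_Weights.IMAGENET1K_V1)
encoder = densenet.features

# Sanity check: a 3x480x640 RGB frame (NYU Depth V2 resolution) maps to a
# 1664-channel feature map downsampled 32x in each spatial dimension.
with torch.no_grad():
    feats = encoder(torch.randn(1, 3, 480, 640))
print(feats.shape)  # torch.Size([1, 1664, 15, 20])
```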

Decoder

The decoder of the U-Net is constructed by successively upsampling the encoder output, concatenating feature maps from the corresponding encoder stages at each step.
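A representative decoder stage might look like the following; the channel counts and the leaky-ReLU activation are illustrative choices, not copied from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """One decoder stage (illustrative): bilinearly upsample the incoming
    features to the resolution of the matching encoder skip connection,
    concatenate the two, and fuse them with a pair of convolutions."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:],
                          mode="bilinear", align_corners=False)
        x = torch.cat([x, skip], dim=1)
        x = F.leaky_relu(self.conv1(x), 0.2)
        return F.leaky_relu(self.conv2(x), 0.2)
```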

Loss function

The loss function consists of three kinds of losses: edge loss, structural similarity (SSIM) loss, and overall depth loss between the predicted depth map and the ground-truth depth map.
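A sketch of such a composite loss is below, following the weighting scheme of the referenced paper (lambda = 0.1 on the point-wise depth term); the repository's exact implementation and constants may differ, and the mean-pooled SSIM is a simplification of the usual Gaussian-windowed version.

```python
import torch
import torch.nn.functional as F

def ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified per-pixel SSIM using 3x3 mean pooling as the window."""
    mu_a = F.avg_pool2d(a, 3, 1, padding=1)
    mu_b = F.avg_pool2d(b, 3, 1, padding=1)
    var_a = F.avg_pool2d(a * a, 3, 1, padding=1) - mu_a ** 2
    var_b = F.avg_pool2d(b * b, 3, 1, padding=1) - mu_b ** 2
    cov = F.avg_pool2d(a * b, 3, 1, padding=1) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def depth_loss(pred, gt, lam=0.1):
    """Composite loss: point-wise depth (L1) + edge (gradient) + SSIM terms."""
    l_depth = torch.mean(torch.abs(pred - gt))            # depth loss
    dy_p, dx_p = pred.diff(dim=2), pred.diff(dim=3)       # predicted gradients
    dy_g, dx_g = gt.diff(dim=2), gt.diff(dim=3)           # ground-truth gradients
    l_edge = (torch.mean(torch.abs(dy_p - dy_g))
              + torch.mean(torch.abs(dx_p - dx_g)))       # edge loss
    l_ssim = torch.clamp((1 - ssim(pred, gt)) / 2, 0, 1).mean()  # SSIM loss
    return lam * l_depth + l_edge + l_ssim
```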

Dependencies

  • Anaconda (Optional but recommended)
  • Python3
  • Matplotlib
  • Pandas
  • PyTorch (1.13.1 was used for development)
  • CUDA (optional)

Run instructions

Train model

```
cd <repository_name>/src
python3 unet_train.py
```

Test model

```
cd <repository_name>/src
python3 unet_test.py
```
