Monocular-Depth-Map-Estimation

Overview

Implemented the paper "High Quality Monocular Depth Estimation via Transfer Learning" to construct a depth map from a single image frame of a scene.
Added data augmentation to the image dataset, including random horizontal flips, during model training.
The model is a U-Net architecture with a pre-trained DenseNet-169 encoder and a custom decoder.
Implemented a custom loss function combining edge loss, depth loss, and structural similarity (SSIM) loss.
Trained the model in PyTorch on the NYU Depth V2 dataset and achieved an accuracy of 70%.
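
The random horizontal flip mentioned above has to be applied to the image and its depth map together, otherwise the pair no longer lines up. A minimal sketch, assuming tensors shaped (C, H, W); the function name `random_horizontal_flip` and the probability `p` are hypothetical and not taken from this repository:

```python
import torch

def random_horizontal_flip(image, depth, p=0.5):
    """Flip an image and its depth map together along the width axis.

    Hypothetical helper: both tensors are assumed to be (C, H, W).
    """
    # Draw one random number so that both tensors get the same decision.
    if torch.rand(1).item() < p:
        image = torch.flip(image, dims=[-1])
        depth = torch.flip(depth, dims=[-1])
    return image, depth

# Example pair: a 3-channel image and a 1-channel depth map.
image = torch.rand(3, 8, 10)
depth = torch.rand(1, 8, 10)
aug_image, aug_depth = random_horizontal_flip(image, depth)
```

Sharing one random draw keeps every pixel of the image aligned with its depth value after augmentation.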

Personnel

Ameya Konkar

UID:118191058

Master's Student at University of Maryland, College Park

Results

Predicted output	Ground Truth

Model

U-Net

The U-Net architecture stems from the so-called “fully convolutional network” first proposed by Long, Shelhamer, and Darrell.

The main idea is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information.

One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, and yields a u-shaped architecture. The network only uses the valid part of each convolution without any fully connected layers.

Encoder

DenseNet-169 is used as the encoder. It belongs to the DenseNet family of models designed for image classification. Compared with DenseNet-121, it is deeper, so it is larger and generally more accurate. In this U-Net, the encoder is initialized with weights pre-trained on ImageNet. The final dense (classification) layer of the encoder is not used, since classification is not the objective of this project.

Decoder

The decoder in the U-Net is built by progressively upsampling the encoder's outputs, mirroring the encoder's downsampling stages.
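
A single decoder step can be sketched as follows. The class name `UpBlock`, the channel counts, the bilinear upsampling, the skip connection from the encoder, and the padded 3x3 convolutions are all assumptions made for illustration, following common U-Net practice rather than the code in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """Hypothetical decoder step: upsample, merge with a skip feature, convolve."""

    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels + skip_channels, out_channels,
                               kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, padding=1)

    def forward(self, x, skip):
        # Upsample the deeper feature map to the skip feature's resolution.
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                          align_corners=True)
        # Concatenate along channels so the convolutions see both inputs.
        x = torch.cat([x, skip], dim=1)
        x = F.leaky_relu(self.conv1(x), 0.2)
        return F.leaky_relu(self.conv2(x), 0.2)

block = UpBlock(in_channels=64, skip_channels=32, out_channels=32)
deep = torch.randn(1, 64, 8, 8)      # low-resolution decoder input
skip = torch.randn(1, 32, 16, 16)    # matching encoder feature map
out = block(deep, skip)
print(out.shape)
```

Stacking several such blocks, one per encoder stage, brings the feature map back toward the input resolution.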

Loss function

The loss function combines three losses between the predicted depth map and the ground-truth depth map: an edge loss, a structural similarity (SSIM) loss, and an overall depth loss.
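
The three terms can be combined roughly as in the sketch below. It is an illustration under assumptions, not the repository's implementation: the function names, the weight `depth_weight=0.1`, the uniform 7x7 SSIM window (a simplification of the usual Gaussian window), and depth values scaled to [0, 1] are all assumed.

```python
import torch
import torch.nn.functional as F

def image_gradients(x):
    """Forward differences along height and width of a (N, C, H, W) tensor."""
    dy = x[:, :, 1:, :] - x[:, :, :-1, :]
    dx = x[:, :, :, 1:] - x[:, :, :, :-1]
    return dy, dx

def ssim(pred, target, max_val=1.0, window=7):
    """Mean SSIM computed with a uniform (box) window."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mu_p = F.avg_pool2d(pred, window, stride=1)
    mu_t = F.avg_pool2d(target, window, stride=1)
    var_p = F.avg_pool2d(pred * pred, window, stride=1) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, window, stride=1) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, window, stride=1) - mu_p * mu_t
    score = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return score.mean()

def depth_loss(pred, target, depth_weight=0.1, max_val=1.0):
    """Weighted sum of depth (L1), edge (gradient L1), and SSIM terms."""
    # Overall depth loss: mean absolute error per pixel.
    l_depth = torch.mean(torch.abs(pred - target))
    # Edge loss: mean absolute error between the depth gradients.
    dy_p, dx_p = image_gradients(pred)
    dy_t, dx_t = image_gradients(target)
    l_edges = (torch.mean(torch.abs(dy_p - dy_t))
               + torch.mean(torch.abs(dx_p - dx_t)))
    # SSIM loss: 0 for identical maps, growing as structure diverges.
    l_ssim = torch.clamp((1 - ssim(pred, target, max_val)) / 2, 0, 1)
    return depth_weight * l_depth + l_edges + l_ssim

pred = torch.rand(2, 1, 32, 32)
target = torch.rand(2, 1, 32, 32)
loss = depth_loss(pred, target)
```

The depth term penalizes per-pixel error, the edge term keeps object boundaries sharp, and the SSIM term rewards locally consistent structure.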

Dependencies

Anaconda (optional but recommended)
Python 3
Matplotlib
Pandas
PyTorch (1.13.1 was used for development)
CUDA (optional)
