
Monocular Depth Estimation Using a Deep Network

In this project, we train the encoder-decoder network with skip-connections proposed by Alhashim and Wonka [1], and construct improved deep neural networks that predict the depth map from a single RGB image in an end-to-end way. The network takes an RGB image as input and outputs the corresponding estimated depth map. The two datasets we use for training the model, NYU Depth v2 and KITTI, contain RGB images paired with ground-truth depth maps.
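At inference time the system is a single forward pass: an RGB tensor goes in and an estimated depth map comes out. The following is a minimal sketch of that flow; the `Model` class name and the checkpoint path are placeholders rather than the exact identifiers in this repository's model.py.

```python
import torch
from torchvision import transforms
from PIL import Image

from model import Model  # hypothetical name for the encoder-decoder in model.py

# Load the trained encoder-decoder (checkpoint path is a placeholder).
net = Model()
net.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
net.eval()

# Pre-process a single RGB image into a (1, 3, H, W) tensor in [0, 1].
rgb = transforms.ToTensor()(Image.open("example_rgb.png").convert("RGB")).unsqueeze(0)

# One forward pass yields the estimated depth map; in the original
# DenseDepth design the decoder outputs at half the input resolution.
with torch.no_grad():
    depth = net(rgb)
print(depth.shape)  # e.g. (1, 1, 240, 320) for a 480x640 input
```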

Result

Demo for an indoor scene in Engineering Hall, UW-Madison

Demo for an outdoor scene on State Street, Madison

Usage

To run the whole program, simply run one of the .ipynb files on Colab; each notebook directly downloads the entire NYU Depth v2 dataset or the KITTI dataset.

Code Structure

NYU depth v2 Dataset

NYU/Initial_Model/train.py trains a model on the NYU Depth v2 dataset.
NYU/Initial_Model/data.py reads and pre-processes the NYU Depth v2 dataset.
NYU/Initial_Model/loss.py contains the loss functions.
NYU/Initial_Model/model.py contains the encoder-decoder model for monocular depth estimation (a sketch of this architecture is shown below). This file comes from the model.py that can be downloaded from https://github.com/ialhashim/DenseDepth/tree/master/PyTorch.
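For orientation, here is a minimal sketch of this kind of encoder-decoder with skip-connections: a pretrained DenseNet-169 encoder (as in [1]) whose intermediate feature maps are concatenated into the decoder's upsampling stages. The class names and channel sizes are illustrative, not the exact ones used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class UpBlock(nn.Module):
    """One decoder stage: 2x bilinear upsampling, concatenation with the
    encoder skip feature map, then two 3x3 convolutions."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.convA = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)
        self.convB = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=True)
        x = F.leaky_relu(self.convA(torch.cat([x, skip], dim=1)), 0.2)
        return F.leaky_relu(self.convB(x), 0.2)

class DepthNet(nn.Module):
    """Encoder-decoder with skip-connections, in the spirit of [1]."""
    def __init__(self):
        super().__init__()
        # Pretrained DenseNet-169 feature extractor as the encoder.
        self.encoder = models.densenet169(weights="DEFAULT").features
        self.conv2 = nn.Conv2d(1664, 1664, 1)
        self.up1 = UpBlock(1664, 256, 832)  # skip: transition2 output
        self.up2 = UpBlock(832, 128, 416)   # skip: transition1 output
        self.up3 = UpBlock(416, 64, 208)    # skip: pool0 output
        self.up4 = UpBlock(208, 64, 104)    # skip: relu0 output
        self.conv3 = nn.Conv2d(104, 1, 3, padding=1)

    def forward(self, x):
        # Run the encoder, saving intermediate features for the skips.
        skips, feat = {}, x
        for name, layer in self.encoder.named_children():
            feat = layer(feat)
            if name in ("relu0", "pool0", "transition1", "transition2"):
                skips[name] = feat
        d = self.conv2(feat)
        d = self.up1(d, skips["transition2"])
        d = self.up2(d, skips["transition1"])
        d = self.up3(d, skips["pool0"])
        d = self.up4(d, skips["relu0"])
        return self.conv3(d)  # (N, 1, H/2, W/2) depth map
```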

NYU/NYU_Standard_MonocularDepth.ipynb can be run directly on Colab and automatically downloads the entire NYU Depth v2 dataset. It uses the original encoder-decoder architecture.
NYU/NYU_addbatch_MonocularDepth.ipynb uses the modified encoder-decoder architecture that adds BatchNormalization layers (see the sketch after this list).
NYU/NYU_addup_MonocularDepth.ipynb uses the modified encoder-decoder architecture that adds one more 2x upsampling layer.
NYU/NYU_simpledecoder_MonocularDepth.ipynb uses the modified encoder-decoder architecture without skip-connections.
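To illustrate the BatchNormalization modification, here is a minimal sketch of an upsampling block with a BatchNorm layer after each convolution; this is an assumption about how the change is wired in, not the notebook's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlockBN(nn.Module):
    """Decoder upsampling block with BatchNorm after each convolution
    (hypothetical wiring of the NYU_addbatch modification)."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.convA = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)
        self.bnA = nn.BatchNorm2d(out_ch)
        self.convB = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bnB = nn.BatchNorm2d(out_ch)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=True)
        x = F.leaky_relu(self.bnA(self.convA(torch.cat([x, skip], dim=1))), 0.2)
        return F.leaky_relu(self.bnB(self.convB(x)), 0.2)
```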

KITTI Dataset

KITTI/kitti_equalloss_MonocularDepth.ipynb can be run directly on Colab and automatically downloads the KITTI dataset. It uses the original encoder-decoder architecture with an equal-weighted loss function, and its data-reading module is rewritten for the KITTI dataset (a sketch of such a loader follows).
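A KITTI loader mostly differs from the NYU one in file layout and depth encoding: KITTI stores ground-truth depth as 16-bit PNGs in which depth in meters is the pixel value divided by 256. Below is a minimal sketch of such a Dataset; the directory layout and filename matching are placeholders, not the notebook's exact code.

```python
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class KittiDepthDataset(Dataset):
    """Pairs KITTI RGB images with 16-bit ground-truth depth PNGs.
    Assumes matching filenames in the two directories (placeholder layout)."""
    def __init__(self, rgb_dir, depth_dir):
        self.rgb_dir, self.depth_dir = rgb_dir, depth_dir
        self.names = sorted(os.listdir(rgb_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        rgb = Image.open(os.path.join(self.rgb_dir, self.names[i])).convert("RGB")
        depth_png = Image.open(os.path.join(self.depth_dir, self.names[i]))
        # KITTI encodes depth in meters as pixel_value / 256 in 16-bit PNGs.
        depth = np.asarray(depth_png, dtype=np.float32) / 256.0
        rgb = torch.from_numpy(np.asarray(rgb, dtype=np.float32) / 255.0).permute(2, 0, 1)
        return rgb, torch.from_numpy(depth).unsqueeze(0)
```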

Dataset

1. NYU Depth v2

The NYU Depth v2 dataset is composed of 464 indoor scenes recorded at a resolution of 640x480 by the RGB and depth cameras of a Microsoft Kinect, which captures ground-truth depth directly.

2. KITTI

KITTI is a large dataset composed of 56 outdoor scenes, including the "city" and "residential" categories of the raw data, among others.

Small test dataset

The small test dataset for NYU Depth v2 can be downloaded here.

Loss Function

The total loss function is a weighted sum of three loss functions (a sketch of the combined loss follows the list):

1. Ldepth(y, ŷ) computes the point-wise L1 loss on the depth values

2. Lgrad(y, ŷ) computes the L1 loss on the image gradients of the depth map

3. LSSIM(y, ŷ) (structural similarity index) computes the perceptual similarity of the two depth maps

Reference: https://ece.uwaterloo.ca/~z70wang/research/ssim/
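Below is a minimal sketch of that combined loss, assuming the weighting from [1] (λ = 0.1 on the depth term, and LSSIM = (1 − SSIM)/2) and using the pytorch_msssim package for the SSIM term; loss.py may combine or weight the terms differently.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # one possible SSIM implementation (assumption)

def gradient_loss(pred, target):
    """L1 loss on horizontal and vertical image gradients of the depth map."""
    dy_p, dx_p = pred[..., 1:, :] - pred[..., :-1, :], pred[..., :, 1:] - pred[..., :, :-1]
    dy_t, dx_t = target[..., 1:, :] - target[..., :-1, :], target[..., :, 1:] - target[..., :, :-1]
    return F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)

def total_loss(pred, target, max_depth=10.0, lam=0.1):
    """Weighted sum of the three terms; lam = 0.1 follows [1], and
    max_depth = 10.0 meters is an assumption suited to NYU indoor scenes."""
    l_depth = F.l1_loss(pred, target)
    l_grad = gradient_loss(pred, target)
    # SSIM is at most 1; map it to a loss in [0, 1] as in [1].
    l_ssim = (1.0 - ssim(pred, target, data_range=max_depth)) / 2.0
    return lam * l_depth + l_grad + l_ssim
```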

Reference

[1] I. Alhashim and P. Wonka, “High quality monocular depth estimation via transfer learning,” arXiv:1812.11941, 2018.
