In this project, we build on the encoder-decoder with skip-connections proposed by Alhashim and Wonka [1], and construct improved deep neural networks that predict the depth map from a single RGB image in an end-to-end way. The network takes RGB images as input and outputs the corresponding estimated depth maps. The two datasets we use for training, NYU Depth v2 and KITTI, contain RGB images with corresponding ground-truth depth maps.
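As a minimal sketch of the end-to-end prediction (assuming the `PTModel` class name from the DenseDepth PyTorch port in NYU/Initial_Model/model.py; the weights file name is hypothetical):

```python
import torch
from model import PTModel  # NYU/Initial_Model/model.py

# Hypothetical checkpoint; any weights saved during training would work here.
model = PTModel().cuda().eval()
model.load_state_dict(torch.load("nyu_weights.pth"))

# RGB input as a (N, 3, H, W) tensor with values in [0, 1].
rgb = torch.rand(1, 3, 480, 640).cuda()

with torch.no_grad():
    depth = model(rgb)  # estimated depth map (half the input resolution in [1])
```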
To run the whole pipeline, simply run one of the .ipynb files on Colab; each notebook automatically downloads the entire NYU Depth v2 dataset or the KITTI dataset.
NYU/Initial_Model/train.py
trains a model on the NYU Depth v2 dataset.
NYU/Initial_Model/data.py
reads and pre-processes the NYU Depth v2 dataset.
NYU/Initial_Model/loss.py
contains the loss functions.
NYU/Initial_Model/model.py
contains the encoder-decoder model for monocular depth estimation. This file comes from the model.py that can be downloaded from https://github.com/ialhashim/DenseDepth/tree/master/PyTorch.
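For orientation, one decoder stage in that model upsamples the features and concatenates the matching encoder feature map through a skip connection. A simplified sketch (names and channel handling are illustrative, not the exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpSample(nn.Module):
    """One decoder stage: 2x bilinear upsampling, concatenation with the
    matching encoder feature map (the skip connection), then two convolutions."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.convA = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.convB = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.LeakyReLU(0.2)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode='bilinear', align_corners=True)
        x = torch.cat([x, skip], dim=1)  # skip connection from the encoder
        return self.relu(self.convB(self.relu(self.convA(x))))
```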
NYU/NYU_Standard_MonocularDepth.ipynb
can be run directly on Colab and automatically downloads the entire NYU Depth v2 dataset. This notebook uses the original encoder-decoder architecture.
NYU/NYU_addbatch_MonocularDepth.ipynb
This is the modified encoder-decoder architecture with BatchNormalization layers added; a sketch of the change follows.
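An illustrative version of the decoder block above with BatchNorm added after each convolution (the exact placement is defined in the notebook):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpSampleBN(nn.Module):
    """Decoder stage with BatchNorm after each convolution; illustrative
    only, the notebook defines the actual placement."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.convA = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bnA = nn.BatchNorm2d(out_channels)
        self.convB = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bnB = nn.BatchNorm2d(out_channels)
        self.relu = nn.LeakyReLU(0.2)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode='bilinear', align_corners=True)
        x = torch.cat([x, skip], dim=1)
        x = self.relu(self.bnA(self.convA(x)))
        return self.relu(self.bnB(self.convB(x)))
```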
NYU/NYU_addup_MonocularDepth.ipynb
This is the modified encoder-decoder architecture with one additional 2x upsampling layer; see the sketch below.
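The original decoder in [1] predicts depth at half the input resolution, so one plausible reading of this change is a final 2x upsampling step that brings the prediction to full size (a hypothetical sketch):

```python
import torch
import torch.nn.functional as F

def upsample_to_full(depth: torch.Tensor) -> torch.Tensor:
    """One extra 2x bilinear upsampling applied to the predicted depth map."""
    return F.interpolate(depth, scale_factor=2, mode='bilinear', align_corners=True)
```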
NYU/NYU_simpledecoder_MonocularDepth.ipynb
This is the modified encoder-decoder architecture with the skip-connections removed; see the sketch below.
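Without skip connections, each decoder stage sees only the upsampled features; an illustrative skip-free block:

```python
import torch.nn as nn
import torch.nn.functional as F

class UpSampleNoSkip(nn.Module):
    """Illustrative decoder stage without the encoder concatenation."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.LeakyReLU(0.2)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
        return self.relu(self.conv(x))
```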
KITTI/kitti_equalloss_MonocularDepth.ipynb
can be run directly on Colab and automatically downloads the KITTI dataset. This notebook uses the original encoder-decoder architecture with the equal-weighted loss function, and its data-loading module is rewritten for the KITTI dataset (a sketch follows).
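A minimal sketch of what the rewritten KITTI data-loading module might look like (directory layout and file naming are assumptions; the actual code lives in the notebook):

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class KittiDepthDataset(Dataset):
    """Hypothetical KITTI loader pairing each RGB frame with its
    ground-truth depth map; assumes matching file names in two folders."""
    def __init__(self, rgb_dir, depth_dir):
        self.rgb_dir, self.depth_dir = rgb_dir, depth_dir
        self.names = sorted(os.listdir(rgb_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        rgb = Image.open(os.path.join(self.rgb_dir, name)).convert('RGB')
        depth = Image.open(os.path.join(self.depth_dir, name))
        return self.to_tensor(rgb), self.to_tensor(depth)
```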
The NYU Depth v2 dataset is composed of 464 indoor scenes recorded at a resolution of 640x480 by the RGB and depth cameras of a Microsoft Kinect, which captures ground-truth depth directly.
KITTI is a large dataset composed of 56 outdoor scenes, including the "city" and "residential" categories of the raw data, among others.
The small test dataset for NYU Depth v2 can be downloaded here.
The total loss function is the weighted sum of three loss functions:
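Following [1], the three terms are a point-wise L1 depth loss, an L1 loss on image gradients, and an SSIM term, combined as L = λ·L_depth + L_grad + L_SSIM with λ = 0.1 in the paper. A sketch of the combination (the SSIM term is passed in, since its implementation lives in loss.py):

```python
import torch

def depth_loss(pred, target):
    # Point-wise L1 depth loss.
    return torch.mean(torch.abs(pred - target))

def gradient_loss(pred, target):
    # L1 loss on finite-difference image gradients along y and x.
    dy_p = pred[..., 1:, :] - pred[..., :-1, :]
    dx_p = pred[..., :, 1:] - pred[..., :, :-1]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    return torch.mean(torch.abs(dy_p - dy_t)) + torch.mean(torch.abs(dx_p - dx_t))

def total_loss(pred, target, ssim_loss, lam=0.1):
    # Weighted sum from [1]: lambda * L_depth + L_grad + L_SSIM.
    return lam * depth_loss(pred, target) + gradient_loss(pred, target) + ssim_loss(pred, target)
```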
Reference: https://ece.uwaterloo.ca/~z70wang/research/ssim/
[1] I. Alhashim and P. Wonka, “High quality monocular depth estimation via transfer learning,” arXiv:1812.11941, 2018.