M4Depth

This is the reference TensorFlow implementation for training and testing M4Depth, the depth estimation method described in

Parallax Inference for Robust Temporal Monocular Depth Estimation in Unstructured Environments

Michaël Fonder, Damien Ernst and Marc Van Droogenbroeck

PDF file, Sensors website

Overview

M4Depth is deep neural network designed to estimate depth from RGB image sequences acquired in unknown environments by a camera moving with 6 degrees of freedom (DoF), and is:

Lightweight: M4Depth only requires 500MB of VRAM to run;
Real-time: M4Depth has a fast inference time that makes it compatible for real-time applications on most GPUs;
State-of-the-art: M4Depth is state of the art on the Mid-Air dataset and outperforms existing methods in a generalization setup on the TartanAir dataset, while having good performances on the KITTI dataset.

This network is the result of two major contributions:

We define a notion of visual parallax between two frames from a generic six-degree-of-freedom (6-DoF) camera motion, and present a way to build cost volumes with this parallax;
We use these cost volumes to build a novel lightweight multi-level architecture;

This repository contains all the material necessary to reproduce the results presented in the paper.

Samples produced by our method trained on Mid-Air when tested on Mid-Air (three leftmost columns) and when tested in generalization on TartanAir (two rightmost columns): the first line shows the RGB picture capured by the camera, the second the results produced by our method and the last one the ground-truth depth map.

Citation

If you use our work in your research, please consider citing our paper:

@article{Fonder2022Parallax,
  title     = {Parallax Inference for Robust Temporal Monocular Depth Estimation in Unstructured Environments},
  author    = {Fonder, Michael and Ernst, Damien and Van Droogenbroeck, Marc},
  journal   = {Sensors},
  volume    = {22},
  number    = {23},
  pages     = {1-22},
  month     = {November},
  year      = {2022},
  doi       = {10.3390/s22239374}
  url       = {https://doi.org/10.3390/s22239374}
}

Dependencies

Assuming a fresh Anaconda distribution install, you can install the dependencies required to run our code with:

conda install -c conda-forge tensorflow-gpu=2.7 numpy pandas

To follow the instructions given in the rest of this ReadMe, you also have to extract the content of the zip archives:

unzip '*.zip'

Datasets

Our set of experiments relies on three different datasets. Our code refers to the datasets_location.json configuration file to know where the data are located.

We provide scripts to download the datasets easily in the scripts directory, and detail how to use them hereunder. Please store each dataset in a disctinct directory to prevent issues with the scripts.

If you use our scripts to download the datasets, you won't have to edit the datasets_location.json configuration file. If you already have some datasets on your computer, you can edit the json file instead of downloading them. We provide information on how to do this hereunder.

Mid-Air [1]

To be able to download the Mid-Air dataset, you will need to get a file listing all the archives to download. For this, the procedure to follow is:

Go on the download page of the Mid-Air dataset

Select the "Left RGB" and "Stereo Disparity" image types

Move to the end of the page and enter your email to get the download links (the volume of selected data should be equal to 316.5Go)

When you have the file, execute our script to download and extract the dataset:

bash  scripts/0a-get_midair.sh path/to/desired/dataset/location path/to/download_config.txt

KITTI [2]

To automatically download and extract the KITTI dataset, you can simply run the following command:

bash  scripts/0b-get_kitti.sh path/to/desired/dataset/location

Tartanair [3]

To automatically download and extract the scenes from the TartanAir dataset used in our paper (gascola, season forest winter, neighborhood and old town), you can simply run the following command:

bash  scripts/0c-get_tartanair.sh path/to/desired/dataset/location

Editing `datasets_location.json` (facultative)

The datasets_location.json configuration file simply provides key-value pairs where:

the key is the name of one of the available datasets listed in the dataloaders/__init__.py file, and
the value stores a string that is the path pointing to the root directory of the data location for the corresponding dataset. This path can be relative or absolute.

If you use our script to download the datasets, you don't have to edit it. However, if you already have some datasets on your computer, you can edit the paths in the json file instead of downloading them again with our scripts.

Reproducing paper results

In the following subsections, we present the scripts that allow you to reproduce the experiments made with M4Depth as presented in the paper. These scripts assume that you downloaded the datasets as explained hereabove.

To benefit from the best code speed, you should compile the cuda implementation of the backprojection if you use a compatible GPU:

cd cuda_backproject
bash make.sh
cd ..

Training from scratch

To perform a training with the same setup as the one presented in our paper, you can use the following scripts and monitor the training with tensorboard. A cuda compatible GPU with at least 12GB of VRAM is required to train the network with the same parameters than in the paper.

Train on Mid-Air

bash  scripts/1a-train-midair.sh path/to/desired/weights/location

Finetune on KITTI:

bash  scripts/1b-finetune-kitti.sh path/to/desired/weights/location

Please note that the training can sometimes crash because of the apparition of NaNs in the weights of the network. If this happens, the training has to be manually resumed from the latest checkpoint.

Evaluation and Pretrained weights

You can compute the performance metrics of a trained version of M4Depth by using the following command line:

bash  scripts/2-evaluate.sh dataset path/to/weights/location

where the argument dataset can take the 4 following value: midair, kitti, tartanair-gascola, tartanair-winter, tartanair-neighborhood or tartanair-oldtown and where the second argument can be left blank.

If the second argument is not given, the evaluation will be done on the pretrained weights we provide. The resulting performance metrics should be the same as the ones reported in our paper.

Other operations

Command line given hereabove are simply preparametrizations of the parameters that can be given to the main.pypython script. Instead of using our bash scripts, you can directly call the python script and set the parameters as you want.

To see all the possible parameters and their use, you can request the python help:

python main.py --help

Processing outputs

You can visualize, save or postprocess the outputs of the network easily by implementing the code you want in the predict case of the main.py file.

Running on your own images

There is currently no easy one-line command that allows you to test M4Depth directly on your own data. If you want to test M4Depth on a custom dataset you need:

Generate the csv files mapping the frame location and frame-to-frame camera motion. The script scripts/midair-split-generator.py is the one we used to generate the csv files for Mid-Air in data/midair. You can adapt it for your own data.
Write the corresponding dataloader (it should inherit from the DataLoaderGeneric class, see dataloaders directory).
Add your dataset as a possible choice in the options for the --dataset command line argument.

Once this is done, you should be able to use your dataset with the main.py python script.

Baseline methods performance reproduction

We also provide the code used to get the performance of baseline methods on Mid-Air and TartanAir. This code can be found in the following GitHub repository: https://github.com/michael-fonder/M4Depth-Baselines

References

If you use one of the datasets in your research, please consider citing the related paper:

[1]
@inproceedings{Fonder2019MidAir,
  author    = {Fonder, Michael and Van Droogenbroeck, Marc},
  title     = {Mid-Air: A multi-modal dataset for extremely low altitude drone flights},
  booktitle = {IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year      = {2019},
  month     = {June}
}

[2]
@inproceedings{Geiger2012AreWe,
  title = {Are we ready for Autonomous Driving? {T}he {KITTI} Vision Benchmark Suite},
  author = {Geiger, Andreas and Lenz, Philip and Urtasun, Raquel},
  booktitle = {IEEE International Conference on Computer Vision (CVPR)},
  pages = {3354-3361},
  month = Jun,
  year = {2012}
}

[3]
@inproceedings{Wang2020TartanAir,
  title = {{TartanAir}: A Dataset to Push the Limits of Visual {SLAM}},
  author = {Wang, Wenshan and Zhu, Delong and Wang, Xiangwei and Hu, Yaoyu and Qiu, Yuheng and Wang, Chen and Hu, Yafei and Kapoor, Ashish and Scherer, Sebastian},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages = {4909-4916},
  month = {October},
  year = {2020}
}

License

Our code is licensed under the AGPLv3 - See the LICENSE file for details.

The cuda implementation of the backprojection is not ours and is licensed under BSD 3 Clause License (see the corresponding LICENSE )

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

M4Depth

Overview

Citation

Dependencies

Datasets

Mid-Air [1]

KITTI [2]

Tartanair [3]

Editing `datasets_location.json` (facultative)

Reproducing paper results

Training from scratch

Evaluation and Pretrained weights

Other operations

Processing outputs

Running on your own images

Baseline methods performance reproduction

References

License

Files

README.md

Latest commit

History

README.md

File metadata and controls

M4Depth

Overview

Citation

Dependencies

Datasets

Mid-Air [1]

KITTI [2]

Tartanair [3]

Editing datasets_location.json (facultative)

Reproducing paper results

Training from scratch

Evaluation and Pretrained weights

Other operations

Processing outputs

Running on your own images

Baseline methods performance reproduction

References

License

Editing `datasets_location.json` (facultative)