This thesis project proposes a data-driven, unsupervised framework for visual odometry using multi-view imaging devices. In these experiments, a camera with 17 sub-apertures arranged in a cross-hair formation was used. The sub-apertures are indexed as shown:
```
        0
        1
        2
        3
4 5 6 7 8 9 10 11 12
        13            z
        14           /
        15          /_____ x
        16          |
                    |
                    y
```
While the device in this case is equipped with 17 imagers, the proposed framework works with any number of cameras, down to monocular imagery. The scripts used for training in this thesis project are provided in `training_scripts`, and include scripts for training on monocular and trinocular imagery, on all imagers in the SU plane (horizontal), and on the complete SUTV plane (the full light field).
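Read off the indexing diagram above, these configurations correspond to the following camera subsets. The trinocular subset shown here is an assumption; check the scripts in `training_scripts` for the cameras actually used:

```python
# Camera index sets for the 17-aperture cross-hair rig shown above.
CENTER = 8                       # the middle view of the cross
HORIZONTAL = list(range(4, 13))  # SU plane: cameras 4..12
FULL = list(range(17))           # SUTV plane: all 17 cameras

MONOCULAR = [CENTER]
# Assumed trinocular subset -- not confirmed by this README.
TRINOCULAR = [4, 8, 12]
```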
The approach uses two CNNs:
- LFPoseNet: A network that accepts two frames of a light field video, separated by a small amount of time and space, and aims to predict the 6-degree-of-freedom ego-motion between them.
- LFDispNet: A network that accepts a single light field image and aims to produce a disparity map corresponding to the center view.
These models are defined in `lfmodels`.
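As an illustration of the input/output contract only (not the actual `lfmodels` code), the toy stand-ins below assume sub-aperture views are stacked along the channel axis and poses are parametrised as [tx, ty, tz, rx, ry, rz]:

```python
import torch
import torch.nn as nn

# Toy stand-ins illustrating the expected shapes; the real,
# much deeper networks are defined in lfmodels.
N = 3                  # number of sub-aperture views used (e.g. trinocular)
B, H, W = 4, 128, 416  # illustrative batch size and resolution

# LFDispNet: one light field image -> disparity map for the center view.
dispnet = nn.Conv2d(3 * N, 1, kernel_size=3, padding=1)

# LFPoseNet: two light field frames -> 6-DoF ego-motion between them.
posenet = nn.Sequential(
    nn.Conv2d(2 * 3 * N, 16, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 6),
)

lf_t = torch.randn(B, 3 * N, H, W)   # views stacked on channels (assumed)
lf_t1 = torch.randn(B, 3 * N, H, W)
disparity = dispnet(lf_t)                        # (B, 1, H, W)
pose = posenet(torch.cat([lf_t, lf_t1], dim=1))  # (B, 6)
```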
The directory containing the dataset needs to be organised as follows:
```
├── root
│   ├── seq1
│   │   ├── 0
│   │   │   ├── 000000.png
│   │   │   ├── 000001.png
│   │   │   ├── ...
│   │   │   └── 000372.png
│   │   ├── 1
│   │   │   ├── 000000.png
│   │   │   ├── 000001.png
│   │   │   ├── ...
│   │   │   └── 000372.png
│   │   ├── ...
│   │   ├── 16
│   │   ├── poses_gt_absolute.npy
│   │   └── poses_gt_relative.npy
│   └── seqN
├── train.txt
└── val.txt
```
- `poses_gt_absolute.npy`: serialised numpy array with ground truth pose data (absolute poses, i.e. relative to the origin)
- `poses_gt_relative.npy`: the same, but with relative poses, each relative to the previous frame
- `train.txt`: names of the sequences to train on, one per line
- `val.txt`: names of the validation sequences, one per line
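These files can be read with plain numpy, for example as below. The per-frame pose layout is an assumption here; a shape of (num_frames, 6) matching the [tx, ty, tz, rx, ry, rz] convention used elsewhere in this README seems likely, but verify against the actual files:

```python
import numpy as np
from pathlib import Path

root = Path("/path/to/root")

# train.txt and val.txt sit alongside root in the tree above.
train_seqs = (root.parent / "train.txt").read_text().split()

# Ground truth poses for one sequence; shape assumed to be (num_frames, 6).
poses_abs = np.load(root / "seq1" / "poses_gt_absolute.npy")
poses_rel = np.load(root / "seq1" / "poses_gt_relative.npy")
print(train_seqs, poses_abs.shape, poses_rel.shape)
```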
When a training script is run, outputs are generated in `~/Documents/checkpoints/EXPERIMENT_NAME`. The outputs are:
- `dispnet_best.pth.tar`: best-scoring dispnet model
- `posenet_best.pth.tar`: best-scoring posenet model
- `config.txt` and `config.pkl`: human- and machine-readable configuration files containing the parameters used for training. `config.pkl` is loaded during inference to initialise the networks with the same parameters.
- Progress logs in CSV format
- TensorBoard logs for visualisation
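These outputs can be reloaded later, for example as below. The checkpoint layout is assumed to follow the common `{'state_dict': ...}` convention; inspect the files if this fails:

```python
import pickle
from pathlib import Path
import torch

ckpt_dir = Path("~/Documents/checkpoints/EXPERIMENT_NAME").expanduser()

# Restore the training-time parameters (cameras used, grayscale flag, ...).
with open(ckpt_dir / "config.pkl", "rb") as f:
    config = pickle.load(f)

# Best-scoring model weights saved during training.
disp_weights = torch.load(ckpt_dir / "dispnet_best.pth.tar", map_location="cpu")
pose_weights = torch.load(ckpt_dir / "posenet_best.pth.tar", map_location="cpu")
```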
The provided training scripts can be launched directly:

```
sh training-scripts/train-monocular.sh
sh training-scripts/train-trinocular.sh
sh training-scripts/train-horizontal.sh
```
The underlying `train.py` has the following interface:

```
usage: train.py [-h] [--save-path PATH] [--sequence-length N]
                [--rotation-mode] [--padding-mode]
                [-j] [--epochs] [-b] [--lr] [--momentum] [--beta]
                [--weight-decay] [--print-freq] [--pretrained-disp]
                [--pretrained-exppose] [--seed] [--log-summary]
                [--log-full] [-p] [-m] [-s] [-g] [--log-output]
                [-f] [-c] [--gray]
                DIR NAME

positional arguments:
  DIR                   path to dataset
  NAME                  experiment name

optional arguments:
  -h                    show this help message and exit
  -b                    mini-batch size
  -c                    list of cameras to use
  -f                    frequency for outputting dispnet outputs
  -g                    weight for ground truth pose supervision loss
  -j                    number of data loading workers
  -m                    weight for explainability mask loss
  -p                    weight for photometric loss
  -s                    weight for disparity smoothness loss
  --lr                  initial learning rate
  --save-path           where to save outputs
  --sequence-length     sequence length for training
  --rotation-mode       rotation mode for PoseExpNet
  --padding-mode        padding mode for image warping
  --epochs              number of total epochs to run
  --momentum            momentum for sgd, alpha parameter for adam
  --beta                beta parameters for adam
  --weight-decay        weight decay
  --print-freq          print frequency
  --pretrained-disp     path to pre-trained dispnet model
  --pretrained-exppose  path to pre-trained Exp Pose net model
  --seed                seed for random functions
  --log-summary         csv where to save per-epoch train and valid stats
  --log-full            csv where to save per-gradient descent train stats
  --log-output          will log dispnet outputs at validation
  --gray                images are grayscale
```
Inference requires a `config.pkl` file generated during training. This file contains the training parameters, such as which cameras were used and whether the images are grayscale.
Inference generates both disparity maps and pose predictions. Disparity maps are saved as PNGs in the same directory as the model weights, inside a new subdirectory named after the sequence. That directory will also contain `poses.npy`, a serialised numpy array with columns [tx, ty, tz, rx, ry, rz].
```
python3 infer.py --config PATH/TO/config.pkl --seq SEQUENCE_NAME
```
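To turn the predicted poses into a trajectory, they can be chained into 4x4 transforms. The sketch below assumes each row of `poses.npy` is the relative motion between consecutive frames and that (rx, ry, rz) are xyz Euler angles in radians; both conventions should be verified against `infer.py`:

```python
import numpy as np
from scipy.spatial.transform import Rotation

poses = np.load("PATH/TO/poses.npy")  # columns: [tx, ty, tz, rx, ry, rz]

# Chain the per-frame relative motions into absolute camera positions.
T = np.eye(4)
positions = [T[:3, 3].copy()]
for tx, ty, tz, rx, ry, rz in poses:
    step = np.eye(4)
    step[:3, :3] = Rotation.from_euler("xyz", [rx, ry, rz]).as_matrix()
    step[:3, 3] = [tx, ty, tz]
    T = T @ step
    positions.append(T[:3, 3].copy())
trajectory = np.array(positions)      # (num_frames + 1, 3) camera positions
```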
Use `evaluation.ipynb` for evaluation.
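The notebook is the intended evaluation tool; as a minimal illustration of the kind of comparison involved, a root-mean-square trajectory error between aligned predicted and ground truth positions (assuming equal length and a shared reference frame) could be computed as:

```python
import numpy as np

def trajectory_rmse(pred_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """Root-mean-square position error between two aligned (N, 3) trajectories."""
    err = np.linalg.norm(pred_xyz - gt_xyz, axis=1)
    return float(np.sqrt((err ** 2).mean()))
```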