A CUDA implementation of Bundle Adjustment
This project implements a Bundle Adjustment algorithm with CUDA. It jointly optimizes camera poses and landmarks (3D points) represented as a graph.
The reference CPU implementation is RainerKuemmerle/g2o. This project is designed to provide the following g2o features, which are commonly used in Visual SLAM and SfM.
- g2o::BlockSolver_6_3
- g2o::OptimizationAlgorithmLevenberg
- g2o::VertexSE3Expmap
- g2o::VertexPointXYZ
- g2o::EdgeSE3ProjectXYZ
- g2o::EdgeStereoSE3ProjectXYZ
- g2o::RobustKernelHuber
- g2o::RobustKernelTukey
For a usage example, see "Use cuda-bundle-adjustment in ORB-SLAM2".
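The monocular and stereo edges in the list above measure a reprojection error. As we understand g2o's EdgeSE3ProjectXYZ, the error is the observed pixel minus the pinhole projection of the landmark after it is transformed into the camera frame; EdgeStereoSE3ProjectXYZ adds a third component, the right-image x coordinate u - bf / z. The sketch below restates that computation in plain C++ without Eigen. All type and function names (Pose, CameraParams, monoError, stereoError) are hypothetical and not part of this project's API, and the pose is stored as a rotation matrix plus translation rather than g2o's SE3Quat.

```cpp
#include <array>

// Hypothetical types for illustration; not this project's API.
using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// World-to-camera rigid transform: Xc = R * Xw + t.
struct Pose {
  Mat3 R;
  Vec3 t;
};

// Pinhole intrinsics; bf is the stereo baseline multiplied by fx.
struct CameraParams {
  double fx, fy, cx, cy, bf;
};

// Map a world point into the camera frame.
Vec3 toCamera(const Pose& T, const Vec3& Xw) {
  Vec3 Xc{};
  for (int i = 0; i < 3; ++i)
    Xc[i] = T.R[i][0] * Xw[0] + T.R[i][1] * Xw[1] + T.R[i][2] * Xw[2] + T.t[i];
  return Xc;
}

// Left-image pixel (u, v) of a camera-frame point.
Vec2 project(const CameraParams& cam, const Vec3& Xc) {
  const double invz = 1.0 / Xc[2];
  return {cam.fx * Xc[0] * invz + cam.cx, cam.fy * Xc[1] * invz + cam.cy};
}

// Monocular edge error: observed (u, v) minus projected (u, v).
Vec2 monoError(const Pose& T, const CameraParams& cam, const Vec3& Xw, const Vec2& obs) {
  const Vec2 p = project(cam, toCamera(T, Xw));
  return {obs[0] - p[0], obs[1] - p[1]};
}

// Stereo edge error: adds the right-image x coordinate ur = u - bf / z.
Vec3 stereoError(const Pose& T, const CameraParams& cam, const Vec3& Xw, const Vec3& obs) {
  const Vec3 Xc = toCamera(T, Xw);
  const Vec2 p = project(cam, Xc);
  const double ur = p[0] - cam.bf / Xc[2];
  return {obs[0] - p[0], obs[1] - p[1], obs[2] - ur};
}
```

For example, with an identity pose, fx = fy = 500, cx = 320, cy = 240 and bf = 50, the point (1, 0.5, 5) projects to (420, 290) in the left image and to x = 410 in the right image.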
The performance measured with sample/sample_comparison_with_g2o is as follows.
Key | Value |
---|---|
CPU / implementation | Core i7-6700K (4.00 GHz) / g2o |
GPU / implementation | GeForce GTX 1080 / cuda-bundle-adjustment |
number of iterations for optimization | 10 |
Input Filename | P | L | E | CPU[sec] | GPU[sec] |
---|---|---|---|---|---|
ba_kitti_07.json | 248 | 26127 | 95037 | 1.8 | 0.23 |
ba_kitti_00.json | 1322 | 133383 | 561116 | 11.9 | 1.23 |
P: number of poses, L: number of landmarks, E: number of edges
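The speedup is the ratio of the CPU column to the GPU column. The small helper below (names are illustrative only) encodes the two rows of the table; it gives roughly 7.8x for ba_kitti_07.json and roughly 9.7x for ba_kitti_00.json, so the relative gain grows with graph size on this hardware.

```cpp
#include <string>
#include <vector>

// One row of the benchmark table (hypothetical helper type).
struct BenchmarkRow {
  std::string input;
  double cpuSec;  // g2o on Core i7-6700K
  double gpuSec;  // cuda-bundle-adjustment on GeForce GTX 1080
};

// Speedup = CPU time / GPU time.
double speedup(const BenchmarkRow& r) { return r.cpuSec / r.gpuSec; }

// Values copied from the benchmark table.
const std::vector<BenchmarkRow> kRows = {
    {"ba_kitti_07.json", 1.8, 0.23},
    {"ba_kitti_00.json", 11.9, 1.23},
};
```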
Some features supported in g2o are currently simplified or not implemented.
- The information matrix is represented by a scalar
- Camera parameters are associated with each pose vertex (not with each edge)
- The robust kernel is applied uniformly to all monocular edges (and likewise to all stereo edges)
- Level optimization is not implemented
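With a scalar information value per edge and one robust kernel shared by every monocular (or stereo) edge, each edge's cost is a weighted squared norm chi2 = info * ||e||^2 passed through the same function. The sketch below assumes g2o's convention, where a kernel maps the squared error e2 to rho(e2) and the derivative rho'(e2) serves as the reweighting factor; the Huber and Tukey formulas are written to match g2o's RobustKernelHuber and RobustKernelTukey as we understand them. All names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Result of evaluating a robust kernel at a squared error e2.
struct Robustified {
  double rho;     // robustified cost rho(e2)
  double weight;  // rho'(e2), used to reweight the edge in the normal equations
};

// Huber: quadratic for e2 <= delta^2, linear in |e| beyond.
Robustified huber(double e2, double delta) {
  const double d2 = delta * delta;
  if (e2 <= d2) return {e2, 1.0};
  const double e = std::sqrt(e2);
  return {2.0 * e * delta - d2, delta / e};
}

// Tukey: bounded cost; edges with |e| > delta get zero weight.
Robustified tukey(double e2, double delta) {
  const double d2 = delta * delta;
  if (e2 > d2) return {d2 / 3.0, 0.0};
  const double aux = 1.0 - e2 / d2;
  return {d2 * (1.0 - aux * aux * aux) / 3.0, aux * aux};
}

// Total cost with one shared kernel: sqNorms[i] = ||e_i||^2 and
// info[i] is the scalar information value of edge i.
template <typename Kernel>
double totalCost(const std::vector<double>& sqNorms, const std::vector<double>& info,
                 Kernel kernel, double delta) {
  double sum = 0.0;
  for (std::size_t i = 0; i < sqNorms.size(); ++i)
    sum += kernel(info[i] * sqNorms[i], delta).rho;
  return sum;
}
```

For instance, totalCost({0.25, 4.0}, {1.0, 1.0}, huber, 1.0) keeps the inlier's cost 0.25 unchanged but caps the outlier's cost at 3 instead of 4.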
Package Name | Minimum Requirements | Note |
---|---|---|
CMake | version >= 3.18 | |
CUDA Toolkit | compute capability >= 6.0 | |
Eigen | version >= 3.2.0 | |
OpenCV | for sample | |
g2o | for sample | optional |
$ git clone https://github.com/fixstars/cuda-bundle-adjustment.git
$ cd cuda-bundle-adjustment
$ mkdir build
$ cd build
$ cmake .. # Several options are available (e.g. -DWITH_G2O=ON -DCUDA_ARCHS=86)
$ make
Option | Description | Default |
---|---|---|
ENABLE_SAMPLES | Build samples | ON |
WITH_G2O | Build sample with g2o | OFF |
USE_FLOAT32 | Use 32bit float in internal floating-point operations | OFF |
BUILD_SHARED_LIB | Build shared library | OFF |
CUDA_ARCHS | List of architectures to generate device code for | 61;72;75;86 |
With the WITH_G2O option, you can build and run sample/sample_comparison_with_g2o.
g2o needs to be installed beforehand.
$ cmake -DWITH_G2O=ON ..
With the USE_FLOAT32 option, 32-bit float is used in internal floating-point operations (the default is 64-bit float).
Currently, this option gives no significant speedup.
$ cmake -DUSE_FLOAT32=ON ..
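A common way to implement such a switch is a compile-time type alias chosen by a preprocessor definition. The sketch below is hypothetical: it reuses the name USE_FLOAT32 for readability, but how the CMake option actually reaches the sources is an assumption. It also shows the precision trade-off: float's machine epsilon is about 1.2e-7, versus about 2.2e-16 for double.

```cpp
#include <limits>

// Hypothetical compile-time switch for the internal scalar type.
#ifdef USE_FLOAT32
using Scalar = float;   // 32-bit internal arithmetic
#else
using Scalar = double;  // default: 64-bit internal arithmetic
#endif

// Machine epsilon of the selected type: the gap between 1 and the next
// representable value, a rough bound on per-operation relative error.
constexpr Scalar kEpsilon = std::numeric_limits<Scalar>::epsilon();
```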
First, extract the input graph files.
$ cd cuda-bundle-adjustment/samples
$ 7za x ba_input.7z
Input Filename | Description |
---|---|
ba_kitti_07.json | graph components sampled from KITTI sequences/07 using ORB-SLAM2 |
ba_kitti_00.json | graph components sampled from KITTI sequences/00 using ORB-SLAM2 |
Then, pass one of them to the sample program.
$ cd cuda-bundle-adjustment/build
$ ./samples/sample_ba_from_file ../samples/ba_input/ba_kitti_00.json
Example output of sample_ba_from_file
$ ./samples/sample_ba_from_file ../samples/ba_input/ba_kitti_00.json
Reading Graph... Done.
=== Graph size :
num poses : 1322
num landmarks : 133383
num edges : 561116
Running BA... Done.
=== Processing time :
BA total : 1.22[sec]
0: Initialize Optimizer : 67.9[msec]
1: Build Structure : 69.1[msec]
2: Compute Error : 11.0[msec]
3: Build System : 50.4[msec]
4: Schur Complement : 106.2[msec]
5: Symbolic Decomposition : 353.8[msec]
6: Numerical Decomposition : 554.5[msec]
7: Update Solution : 1.2[msec]
=== Objective function value :
iter: 1, chi2: 334210.0
iter: 2, chi2: 331822.8
iter: 3, chi2: 329700.4
iter: 4, chi2: 327743.4
iter: 5, chi2: 326123.2
iter: 6, chi2: 324876.6
iter: 7, chi2: 323698.5
iter: 8, chi2: 322572.7
iter: 9, chi2: 321410.3
iter: 10, chi2: 320086.4
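The stage names in this log match the usual structure of a Levenberg-Marquardt step solved with a Schur complement. For reference, here is the standard textbook formulation for a BlockSolver_6_3 layout (6-DoF pose blocks, 3-DoF landmark blocks), with H = J^T Ω J and b = J^T Ω e, where Ω is a scalar times the identity here. The exact damping and ordering used by this implementation may differ.

$$
\begin{bmatrix} \tilde{H}_{pp} & H_{pl} \\ H_{pl}^{\top} & \tilde{H}_{ll} \end{bmatrix}
\begin{bmatrix} \Delta x_p \\ \Delta x_l \end{bmatrix}
= -\begin{bmatrix} b_p \\ b_l \end{bmatrix},
\qquad \tilde{H}_{pp} = H_{pp} + \lambda I,\quad \tilde{H}_{ll} = H_{ll} + \lambda I
$$

$$
S = \tilde{H}_{pp} - H_{pl}\,\tilde{H}_{ll}^{-1} H_{pl}^{\top},\qquad
S\,\Delta x_p = -b_p + H_{pl}\,\tilde{H}_{ll}^{-1} b_l,\qquad
\Delta x_l = \tilde{H}_{ll}^{-1}\left(-b_l - H_{pl}^{\top}\Delta x_p\right)
$$

Because every edge connects one pose and one landmark, \tilde{H}_{ll} is block diagonal with 3×3 blocks and can be inverted block by block. The reduced system S has one 6×6 block per pose; the separate symbolic and numerical decomposition stages suggest a sparse Cholesky factorization of S, and in the log above those two stages take 908.3 ms of the 1.22 s total.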
Example output of sample_comparison_with_g2o
$ ./samples/sample_comparison_with_g2o ../samples/ba_input/ba_kitti_00.json
Reading Graph... Done.
=== Graph size :
num poses : 1322
num landmarks : 133383
num edges : 561116
Running BA with CPU... Done.
Running BA with GPU... Done.
=== Processing time :
CPU : 11.93 [sec]
GPU : 1.23 [sec]
=== Objective function value :
iteration| chi2 CPU| chi2 GPU
1| 334210.0| 334210.0
2| 331822.8| 331822.8
3| 329700.4| 329700.4
4| 327743.4| 327743.4
5| 326123.2| 326123.2
6| 324876.6| 324876.6
7| 323698.5| 323698.5
8| 322572.7| 322572.7
9| 321410.3| 321410.3
10| 320086.4| 320086.4
=== RMSE between CPU estimates and GPU estimates :
Rotation : 7.63e-16
Translation : 4.50e-13
Landmark : 4.50e-13
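The RMSE lines compare the CPU and GPU solutions. The exact metric used by sample_comparison_with_g2o is not described here; the sketch below assumes a common definition for landmarks, the square root of the mean squared Euclidean distance between corresponding points. The function name is hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Point3 = std::array<double, 3>;

// sqrt( (1/N) * sum_i ||a_i - b_i||^2 ) over corresponding points.
double rmse(const std::vector<Point3>& a, const std::vector<Point3>& b) {
  if (a.size() != b.size() || a.empty())
    throw std::invalid_argument("point sets must be non-empty and equal in size");
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (int k = 0; k < 3; ++k) {
      const double d = a[i][k] - b[i][k];
      sum += d * d;  // accumulate squared per-coordinate differences
    }
  return std::sqrt(sum / static_cast<double>(a.size()));
}
```

Values around 1e-13, as in the output above, mean the two solutions agree to within double-precision rounding.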
The "adaskit Team"
The adaskit is an open-source project created by Fixstars Corporation and its subsidiary companies, including Fixstars Autonomous Technologies. It aims to contribute to the ADAS industry by developing high-performance implementations of algorithms with high computational cost.
Apache License 2.0