cuda-bundle-adjustment

A CUDA implementation of Bundle Adjustment

Description

This project implements a Bundle Adjustment algorithm with CUDA. It optimizes camera poses and landmarks (3D points) represented by a graph.

The reference CPU implementation is RainerKuemmerle/g2o. This project is designed to provide the following g2o features, which are commonly used in Visual SLAM and SfM.

  • g2o::BlockSolver_6_3
  • g2o::OptimizationAlgorithmLevenberg
  • g2o::VertexSE3Expmap
  • g2o::VertexPointXYZ
  • g2o::EdgeSE3ProjectXYZ
  • g2o::EdgeStereoSE3ProjectXYZ
  • g2o::RobustKernelHuber
  • g2o::RobustKernelTukey

For an integration example, see "Use cuda-bundle-adjustment in ORB-SLAM2".
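To make the edge types above more concrete, here is a minimal sketch of the residual that a monocular projection edge such as g2o::EdgeSE3ProjectXYZ measures, assuming the usual pinhole model and the convention "observation minus projection". The function names, the intrinsics `fx, fy, cx, cy`, and the sample values are illustrative assumptions and are not part of this library's API.

```python
def project(R, t, X, fx, fy, cx, cy):
    """Project a 3D world point X to pixel coordinates with a pinhole camera."""
    # Transform the point into the camera frame: Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division, then apply focal lengths and principal point
    u = fx * Xc[0] / Xc[2] + cx
    v = fy * Xc[1] / Xc[2] + cy
    return (u, v)


def reprojection_residual(obs, R, t, X, fx, fy, cx, cy):
    """Residual of one monocular edge: observed pixel minus projected pixel."""
    u, v = project(R, t, X, fx, fy, cx, cy)
    return (obs[0] - u, obs[1] - v)


# Hypothetical pose (identity rotation, zero translation) and landmark
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
X = [1.0, 2.0, 10.0]

residual = reprojection_residual((372.0, 339.0), R, t, X, 500.0, 500.0, 320.0, 240.0)
print(residual)
```

Bundle adjustment adjusts the poses (`R`, `t`) and landmarks (`X`) so that the sum of such squared residuals over all edges becomes small.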

Performance

The performance obtained from sample/sample_comparison_with_g2o is as follows.

Settings

| Key | Value |
| --- | --- |
| CPU / implementation | Core-i7 6700K (4.00 GHz) / g2o |
| GPU / implementation | GeForce GTX 1080 / cuda-bundle-adjustment |
| number of iterations for optimization | 10 |

Results

| Input Filename | P | L | E | CPU[sec] | GPU[sec] |
| --- | --- | --- | --- | --- | --- |
| ba_kitti_07.json | 248 | 26127 | 95037 | 1.8 | 0.23 |
| ba_kitti_00.json | 1322 | 133383 | 561116 | 11.9 | 1.23 |

P: number of poses, L: number of landmarks, E: number of edges
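The speedup implied by the table can be worked out directly. The short sketch below only restates the CPU and GPU timings listed above; the dictionary layout and the `speedup` helper are illustrative.

```python
# (CPU seconds, GPU seconds) taken from the results table
results = {
    "ba_kitti_07.json": (1.8, 0.23),
    "ba_kitti_00.json": (11.9, 1.23),
}


def speedup(cpu_sec, gpu_sec):
    """Ratio of CPU time to GPU time for the same input."""
    return cpu_sec / gpu_sec


for name, (cpu_sec, gpu_sec) in results.items():
    print(f"{name}: {speedup(cpu_sec, gpu_sec):.1f}x")
```

With these figures the GPU implementation is roughly 7.8x faster on ba_kitti_07.json and 9.7x faster on ba_kitti_00.json.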

Limitations

Some features supported in g2o are currently simplified or not implemented.

  • Information matrix is represented by a scalar
  • Camera parameters are associated with each of the pose vertices (not each of the edges)
  • Robust kernel is applied uniformly to all monocular (or all stereo) edges
  • Level optimization is not implemented
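As a rough illustration of the first and third limitations, the sketch below computes the cost of one edge when the information matrix is a scalar `omega` (that is, `omega` times the identity) and a Huber kernel with one shared threshold `delta` is applied. The Huber form used here is the standard one on the squared error; the function names and sample values are assumptions for illustration, not this library's API.

```python
import math


def squared_error(residual, omega):
    """Squared error e^T (omega * I) e, which reduces to omega * ||e||^2."""
    return omega * sum(r * r for r in residual)


def huber(s, delta):
    """Huber cost on a squared error s: quadratic inside delta, linear outside."""
    if s <= delta * delta:
        return s
    return 2.0 * delta * math.sqrt(s) - delta * delta


def robust_cost(residuals, omega, delta):
    """Total cost with the same omega and delta used for every edge."""
    return sum(huber(squared_error(r, omega), delta) for r in residuals)


# Two hypothetical 2D residuals: one small (inlier), one large (outlier)
edges = [(0.5, 0.0), (3.0, 4.0)]
print(robust_cost(edges, omega=1.0, delta=2.0))
```

The small residual contributes its plain squared error, while the large one is down-weighted by the kernel, which is what limits the influence of outlier observations.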

Requirements

| Package Name | Minimum Requirements | Note |
| --- | --- | --- |
| CMake | version >= 3.18 | |
| CUDA Toolkit | compute capability >= 6.0 | |
| Eigen | version >= 3.2.0 | |
| OpenCV | | for sample |
| g2o | | for sample, optional |

How to build

$ git clone https://github.com/fixstars/cuda-bundle-adjustment.git
$ cd cuda-bundle-adjustment
$ mkdir build
$ cd build
$ cmake .. # Several options available (e.g. -DWITH_G2O=ON -DCUDA_ARCHS=86)
$ make

CMake options

| Option | Description | Default |
| --- | --- | --- |
| ENABLE_SAMPLES | Build samples | ON |
| WITH_G2O | Build sample with g2o | OFF |
| USE_FLOAT32 | Use 32bit float in internal floating-point operations | OFF |
| BUILD_SHARED_LIB | Build shared library | OFF |
| CUDA_ARCHS | List of architectures to generate device code for | 61;72;75;86 |

With the WITH_G2O option enabled, you can run sample/sample_comparison_with_g2o. g2o needs to be installed beforehand.

$ cmake -DWITH_G2O=ON ..

With the USE_FLOAT32 option enabled, 32bit float is used in internal floating-point operations (the default is 64bit float). Currently this option gives no significant speedup.

$ cmake -DUSE_FLOAT32=ON ..

How to run samples

First, extract input graph files.

$ cd cuda-bundle-adjustment/samples
$ 7za x ba_input.7z

| Input Filename | Description |
| --- | --- |
| ba_kitti_07.json | graph components sampled from KITTI sequences/07 using ORB-SLAM2 |
| ba_kitti_00.json | graph components sampled from KITTI sequences/00 using ORB-SLAM2 |

Then, pass one of the extracted files to the sample program.

$ cd cuda-bundle-adjustment/build
$ ./samples/sample_ba_from_file ../samples/ba_input/ba_kitti_00.json

Output example of sample_ba_from_file:

$ ./samples/sample_ba_from_file ../samples/ba_input/ba_kitti_00.json

Reading Graph... Done.

=== Graph size :
num poses      : 1322
num landmarks  : 133383
num edges      : 561116

Running BA... Done.

=== Processing time :
BA total : 1.22[sec]

0: Initialize Optimizer        :     67.9[msec]
1: Build Structure             :     69.1[msec]
2: Compute Error               :     11.0[msec]
3: Build System                :     50.4[msec]
4: Schur Complement            :    106.2[msec]
5: Symbolic Decomposition      :    353.8[msec]
6: Numerical Decomposition     :    554.5[msec]
7: Update Solution             :      1.2[msec]

=== Objective function value :
iter:  1, chi2: 334210.0
iter:  2, chi2: 331822.8
iter:  3, chi2: 329700.4
iter:  4, chi2: 327743.4
iter:  5, chi2: 326123.2
iter:  6, chi2: 324876.6
iter:  7, chi2: 323698.5
iter:  8, chi2: 322572.7
iter:  9, chi2: 321410.3
iter: 10, chi2: 320086.4
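
The per-stage timings above can be turned into shares of the total to see where the time goes. The sketch below only re-uses the numbers printed in this output example; the dictionary and variable names are illustrative.

```python
# Per-stage times in milliseconds, copied from the output example above
stages_msec = {
    "Initialize Optimizer": 67.9,
    "Build Structure": 69.1,
    "Compute Error": 11.0,
    "Build System": 50.4,
    "Schur Complement": 106.2,
    "Symbolic Decomposition": 353.8,
    "Numerical Decomposition": 554.5,
    "Update Solution": 1.2,
}

total_msec = sum(stages_msec.values())

# Print each stage with its share of the summed stage time
for name, msec in stages_msec.items():
    print(f"{name:<24}: {msec:7.1f} msec ({100.0 * msec / total_msec:4.1f}%)")

print(f"sum of stages: {total_msec:.1f} msec")
```

The stages sum to about 1214 msec, consistent with the reported BA total of 1.22 sec, and the two decomposition stages together account for roughly three quarters of it.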

Output example of sample_comparison_with_g2o:

$ ./samples/sample_comparison_with_g2o ../samples/ba_input/ba_kitti_00.json

Reading Graph... Done.

=== Graph size :
num poses      : 1322
num landmarks  : 133383
num edges      : 561116

Running BA with CPU... Done.

Running BA with GPU... Done.

=== Processing time :
CPU :   11.93 [sec]
GPU :    1.23 [sec]

=== Objective function value :
 iteration|  chi2 CPU|  chi2 GPU
         1|  334210.0|  334210.0
         2|  331822.8|  331822.8
         3|  329700.4|  329700.4
         4|  327743.4|  327743.4
         5|  326123.2|  326123.2
         6|  324876.6|  324876.6
         7|  323698.5|  323698.5
         8|  322572.7|  322572.7
         9|  321410.3|  321410.3
        10|  320086.4|  320086.4

=== RMSE between CPU estimates and GPU estimates :
Rotation    : 7.63e-16
Translation : 4.50e-13
Landmark    : 4.50e-13

Author

The "adaskit Team"

adaskit is an open-source project created by Fixstars Corporation and its subsidiary companies, including Fixstars Autonomous Technologies. It aims to contribute to the ADAS industry by developing high-performance implementations of computationally expensive algorithms.

License

Apache License 2.0