This repository is a version of VINS-Fusion with a CUDA reimplementation of Bundle Adjustment.
The modifications are as follows:
- The code for the Bundle Adjustment reimplementation is in vins_estimator/src/cuda_bundle_adjustment.
- Estimator::optimization() in estimator.cpp is rewritten as follows:
```cpp
void Estimator::optimization() {
    if (frame_count == WINDOW_SIZE) {
        optimization_with_cuda();  // solve and marginalize with CUDA when the sliding window is full
    } else {
        optimization_with_ceres(); // solve with Ceres while the sliding window is not yet full
    }
}
```
The essential software environment is the same as VINS-Fusion's; this repository has been tested on Ubuntu 18.04 & ROS Melodic.
While the Bundle Adjustment in estimator.cpp is reimplemented with CUDA and Eigen, this repository still requires Ceres Solver for non-linear optimization in:
- the Bundle Adjustment in estimator.cpp before frame_count reaches WINDOW_SIZE
- GlobalSFM::construct() in initial_sfm.cpp
- PoseGraph::optimize4DoF() in pose_graph.cpp
Meanwhile, the CUDA reimplementation of Bundle Adjustment in estimator.cpp requires CUDA and an Nvidia GPU.
Before building this repo, some CMake variables in vins_estimator/src/cuda_bundle_adjustment/CMakeLists.txt need to be modified to fit your environment:
```cmake
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc) # set it to your path to nvcc
set(CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda)        # set it to the root of your CUDA toolkit
set(CMAKE_CUDA_ARCHITECTURES 52)                  # e.g. if your device's compute capability is 6.2, set it to 62
```
If your device's compute capability is >= 6.0, you can simply change MyAtomicAdd() in vins_estimator/src/cuda_bundle_adjustment/cuda_kernel_funcs/device_utils.cu into the following:
```cuda
template<typename T>
__device__ T MyAtomicAdd(T* address, T val) {
    return atomicAdd(address, val); // native FP64 atomicAdd requires compute capability >= 6.0
}
```
Alternatively, you can replace MyAtomicAdd() with atomicAdd() wherever MyAtomicAdd() is called.
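On devices below compute capability 6.0, the usual workaround for the missing FP64 atomicAdd() (and presumably what MyAtomicAdd() does) is a compare-and-swap loop on the 64-bit bit pattern of the double. Below is a CPU sketch of that pattern using std::atomic; `my_atomic_add` is an illustrative name, not the repo's function, and on the GPU the same loop is written with atomicCAS() and the `__double_as_longlong`-style reinterpretation intrinsics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU sketch of the CAS-loop emulation of a double-precision atomic add:
// reinterpret the double as a 64-bit integer and retry the compare-and-swap
// until no other thread has modified the value in between.
double my_atomic_add(std::atomic<std::uint64_t>* address, double val) {
    std::uint64_t old_bits = address->load();
    std::uint64_t new_bits;
    double old_val;
    do {
        std::memcpy(&old_val, &old_bits, sizeof(double));
        double new_val = old_val + val;
        std::memcpy(&new_bits, &new_val, sizeof(double));
        // compare_exchange_weak reloads old_bits on failure, so the loop retries
    } while (!address->compare_exchange_weak(old_bits, new_bits));
    return old_val; // like atomicAdd(), return the previous value
}
```

This is lock-free but can retry under contention, which is one reason the native FP64 atomicAdd() on compute capability >= 6.0 is preferable.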
The basic steps to compile and run this repo are the same as for VINS-Fusion.
Each cell below shows the "no RVIZ / with RVIZ" values.

Sequence | Iterations | CPU (Intel i7-6700K): solve | CPU: marginalization | GPU (Nvidia 980 Ti): solve (at least 10 iterations) & marginalization
---|---|---|---|---
MH_01_easy, WINDOW_SIZE == 10, max feature count == 150 | 5.73 / 5.73 | 61.17 ms / 74.74 ms | 12.25 ms / 21.21 ms | 29.38 ms / 37.39 ms
MH_05_difficult, WINDOW_SIZE == 10, max feature count == 150 | 6.46 / 6.46 | 63.17 ms / 74.36 ms | 9.73 ms / 17.71 ms | 27.23 ms / 35.49 ms
2011_10_03_drive_0027_sync, WINDOW_SIZE == 10, max feature count == 200 | 4.71 / 4.70 | 18.79 ms / 19.81 ms | 6.07 ms / 7.18 ms | 19.98 ms / 22.27 ms
MH_01_easy, WINDOW_SIZE == 20, max feature count == 300 | 7.13 / 7.13 | 126.09 ms / 163.89 ms | 20.47 ms / 28.88 ms | 51.56 ms / 65.35 ms
MH_05_difficult, WINDOW_SIZE == 20, max feature count == 300 | 6.61 / 6.61 | 103.50 ms / 137.44 ms | 15.66 ms / 23.88 ms | 45.66 ms / 58.60 ms
2011_10_03_drive_0027_sync, WINDOW_SIZE == 20, max feature count == 400 | 4.94 / 4.93 | 44.14 ms / 46.91 ms | 9.22 ms / 11.99 ms | 34.18 ms / 38.40 ms
Since the theoretical FP64 performance of the Nvidia 980 Ti GPU (compute capability 5.2) is only 189.4 GFLOPS, and atomicAdd() for FP64 is not available on devices with compute capability lower than 6.0, it is plausible to expect a better speed-up on more recent hardware.
P.S. RVIZ seems to slow down this reimplementation considerably.
MH_01_easy (WINDOW_SIZE == 10; graphs generated by evo):
KITTI 2011_10_03_drive_0027_sync:
- WINDOW_SIZE is 10
- the green path is generated by the VO (estimator.cpp)
- the blue path is generated by fusing the VO with GPS (globalOpt.cpp)
Levenberg-Marquardt is used to solve for the delta, with at least 10 iterations.
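The damped Levenberg-Marquardt update can be sketched on a 1-D toy residual; this is illustrative only (the repo solves a large sparse BA system, and `lm_solve`, the residual, and the damping schedule here are all made up for the sketch):

```cpp
#include <cassert>
#include <cmath>

// Minimal Levenberg-Marquardt sketch on the toy residual r(x) = x^2 - 2.
// Each step solves the damped normal equation (J^T J + mu) * delta = -J^T r,
// accepts the step only if it lowers the cost, and adapts the damping mu.
double lm_solve(double x, int min_iters = 10) {
    double mu = 1e-3; // damping factor
    for (int i = 0; i < 100; ++i) {
        double r = x * x - 2.0;      // residual
        double J = 2.0 * x;          // Jacobian of r w.r.t. x
        double H = J * J + mu;       // damped Gauss-Newton Hessian
        double delta = -(J * r) / H; // solve H * delta = -J^T r
        double r_new = (x + delta) * (x + delta) - 2.0;
        if (r_new * r_new < r * r) { // cost decreased: accept, relax damping
            x += delta;
            mu *= 0.5;
        } else {                     // cost increased: reject, raise damping
            mu *= 2.0;
        }
        if (i >= min_iters && std::fabs(delta) < 1e-12) break;
    }
    return x;
}
```

Starting from x = 1, this converges to sqrt(2), the root of the residual.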
Since the bottom-right part of the big Hessian matrix (which corresponds to the inverse depths) is diagonal, a Schur-complement trick is used to solve for the system states before solving for the inverse depths.
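The Schur-complement elimination can be illustrated on a tiny block system [A B; Bᵀ D][x; d] = [b1; b2] with a diagonal (here scalar) inverse-depth block D. All sizes and values are made up for the sketch; the real system has many pose/speed-bias states and inverse depths:

```cpp
#include <cassert>
#include <cmath>

struct Result { double x0, x1, depth; };

// Eliminate the diagonal inverse-depth block D first, solve the reduced
// system for the 2-DoF state x, then back-substitute for the inverse depth.
Result schur_solve(const double A[2][2], const double B[2], double D,
                   const double b1[2], double b2) {
    // Schur complement S = A - B * D^{-1} * B^T and reduced RHS r = b1 - B * D^{-1} * b2
    double S00 = A[0][0] - B[0] * B[0] / D;
    double S01 = A[0][1] - B[0] * B[1] / D;
    double S10 = A[1][0] - B[1] * B[0] / D;
    double S11 = A[1][1] - B[1] * B[1] / D;
    double r0 = b1[0] - B[0] * b2 / D;
    double r1 = b1[1] - B[1] * b2 / D;
    // Solve the small dense system S * x = r (Cramer's rule)
    double det = S00 * S11 - S01 * S10;
    double x0 = (r0 * S11 - S01 * r1) / det;
    double x1 = (S00 * r1 - r0 * S10) / det;
    // Back-substitute: D * depth = b2 - B^T * x
    double depth = (b2 - B[0] * x0 - B[1] * x1) / D;
    return {x0, x1, depth};
}
```

Because D is diagonal, D⁻¹ is trivial to apply, so the expensive solve happens only on the (much smaller) state block.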
All Jacobians, residuals, robust information matrices, Hessian blocks, and RHS blocks are computed on the GPU.
No explicit big Jacobian matrix or big residual vector is formed.
The big Hessian matrix and RHS vector are formed as follows:
- Tiny Jacobian blocks and residual blocks are multiplied to form tiny Hessian blocks and RHS blocks inside the kernel functions.
- The tiny Hessian and RHS blocks are added into the big Hessian and RHS by calling atomicAdd() inside the kernel functions.
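The two accumulation steps above can be sketched on the CPU. Here a serial loop stands in for the kernels, and `scatter_residual` and the tiny sizes (one scalar residual touching two parameter blocks of size 2) are illustrative, not the repo's actual dimensions:

```cpp
#include <cassert>

// CPU sketch of the scatter-add: each residual forms its tiny J^T*J and
// -J^T*r blocks and adds them into the big Hessian/RHS at the owning
// parameter blocks' offsets. On the GPU every "+=" below is an atomicAdd(),
// since many residuals touch the same blocks concurrently.
constexpr int kDim = 4; // big system: two parameter blocks of size 2

void scatter_residual(double H[kDim][kDim], double rhs[kDim],
                      const double Ji[2], const double Jj[2], // 1x2 Jacobians
                      double r,                               // scalar residual
                      int oi, int oj) {                       // block offsets
    const double* J[2] = {Ji, Jj};
    const int off[2] = {oi, oj};
    for (int a = 0; a < 2; ++a) {
        for (int b = 0; b < 2; ++b) {
            // tiny Hessian block H_ab = J_a^T * J_b, accumulated in place
            for (int u = 0; u < 2; ++u)
                for (int v = 0; v < 2; ++v)
                    H[off[a] + u][off[b] + v] += J[a][u] * J[b][v]; // atomicAdd on GPU
        }
        // tiny RHS block rhs_a = -J_a^T * r
        for (int u = 0; u < 2; ++u)
            rhs[off[a] + u] += -J[a][u] * r; // atomicAdd on GPU
    }
}
```

This is why no explicit big Jacobian or residual vector is ever needed: each residual's contribution goes straight into the big Hessian and RHS.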
In the original implementation with Ceres, an eigen-decomposition is done during marginalization to form a MarginalizationFactor for the next frame. This is because ceres-1.14 has to build Hessians from Jacobians instead of accepting pre-formed Hessians through its API. Ceres-1.14 also evaluates the error of the MarginalizationFactor at new linearization points as part of its iteration strategy. However, this reimplementation does not use Ceres and ignores the error of the MarginalizationFactor (new linearization points are still used to update the Hessian prior and RHS prior), so the eigen-decomposition of the Hessian prior (which usually takes about 5~6 ms on my Nvidia 980 Ti GPU) is not needed during marginalization.
Currently, only scenarios where all of the following conditions are satisfied are supported:
- STEREO == true
- ESTIMATE_TD == false
- ESTIMATE_EXTRINSIC == false
The data type must be FP64. An FP32 version was also implemented; however, it drifts away.
This repository is based upon VINS-Fusion. Many thanks also to VINS-Course for its step-by-step demonstrations of how Bundle Adjustment works.