# MTCNN_TensorRT
A C++ implementation of the MTCNN face detection algorithm, accelerated with the NVIDIA TensorRT inference SDK.
This repository is based on https://github.com/AlphaQi/MTCNN-light.git

- 2018/11/14: I have ported most of the computation to the GPU using the OpenCV CUDA wrapper and CUDA kernels I wrote myself. See the `all_gpu` branch for details. Note that you need OpenCV 3.0+ built with CUDA support to run that branch. On my GTX 1080 GPU it is about 5 to 10 times faster than the master branch.
- 2018/10/2: Good news! You can now run the whole MTCNN using TensorRT 3.0 or 4.0.

I adopted the original models from the official project https://github.com/kpzhang93/MTCNN_face_detection_alignment and made the following modification. TensorRT does not support the PReLU layer, which is widely used in MTCNN. One solution is to add a plugin layer (custom layer), but experiments show that this breaks TensorRT's CBR (Convolution, Bias, ReLU) fusion and is very slow. Instead, I replace each PReLU with a ReLU layer, a Scale layer, and an ElementWise addition layer (as illustrated below). This adds only a little computation and does not affect CBR fusion. The weights of the Scale layers are derived from the original PReLU layers.
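
The replacement rests on the identity PReLU(x) = ReLU(x) + (-a) * ReLU(-x), where `a` is the learned PReLU slope. The sketch below checks one possible wiring of ReLU, Scale, and element-wise addition against a direct PReLU. The exact layer arrangement used in the modified models is not spelled out in this README, so the function names and the two-Scale wiring are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>

// Reference PReLU with a single learned slope `a` for negative inputs.
float prelu(float x, float a) { return x > 0.0f ? x : a * x; }

// Building blocks that TensorRT supports natively.
float relu(float x) { return std::max(x, 0.0f); }
float scale(float x, float w) { return w * x; }  // Scale layer without bias

// Assumed decomposition: ReLU(x) + Scale_{-a}(ReLU(Scale_{-1}(x))).
// The positive branch passes x through unchanged; the negative branch
// flips the sign, clips at zero, then multiplies by -a to recover a * x.
float prelu_decomposed(float x, float a) {
    float positive_part = relu(x);
    float negative_part = scale(relu(scale(x, -1.0f)), -a);
    return positive_part + negative_part;  // ElementWise addition
}

// Asserts that both forms agree on a small grid of inputs.
void check_equivalence(float a) {
    for (float x = -4.0f; x <= 4.0f; x += 0.5f) {
        assert(prelu(x, a) == prelu_decomposed(x, a));
    }
}
```

Calling `check_equivalence(0.25f)` exercises both branches. In a real network the slope is per-channel, so the Scale weights would be a vector of `-a` values taken from the PReLU weights.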

## Requirements
- OpenCV (on Ubuntu, run `sudo apt-get install libopencv-dev` to install it)
- CUDA 9.0
- TensorRT 3.04 or TensorRT 4.16 (I have only tested these two versions)
- CMake >= 3.5
- A digital camera, to run the camera test

## Build
- Replace the TensorRT and CUDA paths in CMakeLists.txt
- Configure the detection parameters in mtcnn.cpp (min face size, the NMS thresholds, etc.)
- Choose the running mode (camera test or single image test)
- `cmake .`
- `make -j`
- `./main`
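
The NMS thresholds configured in mtcnn.cpp decide how much overlap is tolerated between candidate boxes before the lower-scoring one is dropped. As a rough illustration, here is a minimal sketch of greedy IoU-based non-maximum suppression. The `Box` struct and the function names are hypothetical and not taken from this repository, and MTCNN implementations often also use an intersection-over-minimum variant in the final stage.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical box type: corner coordinates plus a confidence score.
struct Box {
    float x1, y1, x2, y2, score;
};

float area(const Box& b) {
    return std::max(0.0f, b.x2 - b.x1) * std::max(0.0f, b.y2 - b.y1);
}

// Intersection over union of two boxes; 0 when they do not overlap.
float iou(const Box& a, const Box& b) {
    float w = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float h = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = w * h;
    float uni = area(a) + area(b) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box
// only if its IoU with every already-kept box is at most `threshold`.
std::vector<Box> nms(std::vector<Box> boxes, float threshold) {
    assert(threshold >= 0.0f && threshold <= 1.0f);
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > threshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

A lower threshold suppresses more boxes, which gives fewer duplicates but can merge faces that are close together.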

## Result
The result will be like this in single image test mode:
On my computer, with an NVIDIA GT 730 graphics card (whose performance is very poor) and an Intel i5-6500 CPU, processing the above image takes 20 to 30 ms when the min face size is set to 60 pixels.
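
The min face size matters for speed because it determines how many image pyramid levels the first stage has to process. The sketch below shows the scale computation commonly used in MTCNN implementations, assuming a 12-pixel first-stage input and a pyramid factor of 0.709. The function name and both defaults are assumptions; the values in this repository's mtcnn.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the list of scales at which the input image is resized before
// the first-stage network runs. Assumed defaults: factor 0.709 and a
// 12x12 network input, as in many MTCNN implementations.
std::vector<float> pyramid_scales(int width, int height, int min_face_size,
                                  float factor = 0.709f, int net_input = 12) {
    assert(width > 0 && height > 0 && min_face_size > 0);
    assert(factor > 0.0f && factor < 1.0f);

    std::vector<float> scales;
    // The first scale maps a face of min_face_size pixels onto net_input pixels.
    float scale = static_cast<float>(net_input) / min_face_size;
    float min_side = std::min(width, height) * scale;

    // Keep shrinking until the scaled image is smaller than the network input.
    while (min_side >= net_input) {
        scales.push_back(scale);
        scale *= factor;
        min_side *= factor;
    }
    return scales;
}
```

Under these assumptions, a 640x480 image with a min face size of 60 pixels yields 7 pyramid levels, and a smaller min face size yields more levels and therefore more work per frame.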

## TODO
- Implement the whole processing pipeline using GPU computing.