CapeReviso Fork of tkDNN

This is a fork of tkDNN for the project CapeReviso.

Changes made

Added scripts, parameters, and code to run tkDNN with IDS cameras as well as Logitech Brio cameras via a systemd service
Implemented continuous exposure correction
Calculation of privacy-friendly background images
JSON output with the Unix time of each frame
Queuing of images for optimal performance in a live setup
Recording ends on day change, so that there is one file per day
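
The per-frame Unix time and the day-change rule listed above can be illustrated with a minimal sketch. It is not taken from this repository: the function names, the JSON field names, and the use of UTC days (rather than local time) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from this repository): index of the UTC day
// that contains a Unix timestamp given in whole seconds.
std::int64_t dayIndexUtc(std::int64_t unixSeconds) {
    assert(unixSeconds >= 0);    // the sketch only handles timestamps after 1970
    return unixSeconds / 86400;  // 86400 seconds per day
}

// True when two consecutive frame timestamps fall on different UTC days,
// which is the moment a recorder could close the current file.
bool dayChanged(std::int64_t previousFrame, std::int64_t currentFrame) {
    return dayIndexUtc(previousFrame) != dayIndexUtc(currentFrame);
}

// Builds one JSON record per frame; the field names are assumptions.
std::string frameRecordJson(std::int64_t frameId, std::int64_t unixSeconds) {
    return "{\"frame\":" + std::to_string(frameId) +
           ",\"unix_time\":" + std::to_string(unixSeconds) + "}";
}
```

A recorder built this way would call dayChanged with the previous and the current frame time and start a new output file whenever it returns true. A setup that should split files at local midnight would need a time zone offset applied before the division.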

Branches

main - still on TensorRT 7 for old devices
tensorrt8-branch - current branch for newly installed devices
deepsort - prototype for adding DeepSORT for subsequent tracking

tkDNN

tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, designed specifically to run on NVIDIA Jetson boards. It has been tested on TK1 (branch cudnn2), TX1, TX2, AGX Xavier, Nano, and several discrete GPUs. The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not support training.

If you use tkDNN in your research, please cite the following paper. For use in commercial solutions, write to gattifrancesco@hotmail.it and micaela.verucchi@unimore.it or refer to https://hipert.unimore.it/.

@inproceedings{verucchi2020systematic,
  title={A Systematic Assessment of Embedded Neural Networks for Object Detection},
  author={Verucchi, Micaela and Brilli, Gianluca and Sapienza, Davide and Verasani, Mattia and Arena, Marco and Gatti, Francesco and Capotondi, Alessandro and Cavicchioli, Roberto and Bertogna, Marko and Solieri, Marco},
  booktitle={2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)},
  volume={1},
  pages={937--944},
  year={2020},
  organization={IEEE}
}

What's new

20 July 2021

Support for semantic segmentation (see README)
Support for 2D/3D object detection and tracking (see README)

24 November 2021

Support for semantic segmentation on CUDA 11
Support for TensorRT 8

TensorRT 8 (and therefore JetPack 4.6) is currently supported only on the branch tensorrt8, due to performance issues with TensorRT 8. We will merge it into master as soon as those issues are fixed (probably in future minor releases).

FPS Results

Inference FPS of yolov4 with tkDNN, averaged over 1200 images with the same dimensions as the network input size, on:

RTX 2080Ti (CUDA 10.2, TensorRT 7.0.0, cuDNN 7.6.5);
Xavier AGX, JetPack 4.3 (CUDA 10.0, cuDNN 7.6.3, TensorRT 6.0.1);
Xavier NX, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0);
TX2, JetPack 4.2 (CUDA 10.0, cuDNN 7.3.1, TensorRT 5.0.6);
Jetson Nano, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0).

Platform	Network	FP32, B=1	FP32, B=4	FP16, B=1	FP16, B=4	INT8, B=1	INT8, B=4
RTX 2080Ti	yolo4 320	118.59	237.31	207.81	443.32	262.37	530.93
RTX 2080Ti	yolo4 416	104.81	162.86	169.06	293.78	206.93	353.26
RTX 2080Ti	yolo4 512	92.98	132.43	140.36	215.17	165.35	254.96
RTX 2080Ti	yolo4 608	63.77	81.53	111.39	152.89	127.79	184.72
AGX Xavier	yolo4 320	26.78	32.05	57.14	79.05	73.15	97.56
AGX Xavier	yolo4 416	19.96	21.52	41.01	49.00	50.81	60.61
AGX Xavier	yolo4 512	16.58	16.98	31.12	33.84	37.82	41.28
AGX Xavier	yolo4 608	9.45	10.13	21.92	23.36	27.05	28.93
Xavier NX	yolo4 320	14.56	16.25	30.14	41.15	42.13	53.42
Xavier NX	yolo4 416	10.02	10.60	22.43	25.59	29.08	32.94
Xavier NX	yolo4 512	8.10	8.32	15.78	17.13	20.51	22.46
Xavier NX	yolo4 608	5.26	5.18	11.54	12.06	15.09	15.82
Tx2	yolo4 320	11.18	12.07	15.32	16.31	-	-
Tx2	yolo4 416	7.30	7.58	9.45	9.90	-	-
Tx2	yolo4 512	5.96	5.95	7.22	7.23	-	-
Tx2	yolo4 608	3.63	3.65	4.67	4.70	-	-
Nano	yolo4 320	4.23	4.55	6.14	6.53	-	-
Nano	yolo4 416	2.88	3.00	3.90	4.04	-	-
Nano	yolo4 512	2.32	2.34	3.02	3.04	-	-
Nano	yolo4 608	1.40	1.41	1.92	1.93	-	-
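
To compare rows of the table, it can help to turn FPS into an average time per image and into a speedup factor. The sketch below is plain arithmetic, not part of tkDNN. It assumes the table reports images per second, so the result is the average spacing between images at full throughput rather than the end-to-end latency of a batch. The function names are hypothetical.

```cpp
#include <cassert>

// Average time per image in milliseconds for a throughput in images per second.
double millisecondsPerImage(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times faster candidateFps is compared to baselineFps.
double speedup(double baselineFps, double candidateFps) {
    assert(baselineFps > 0.0);
    return candidateFps / baselineFps;
}
```

For the AGX Xavier row with yolo4 416 and B=1, speedup(19.96, 41.01) gives roughly 2.05 when moving from FP32 to FP16, and millisecondsPerImage(41.01) gives roughly 24.4 ms per image.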

MAP Results

Results for COCO val 2017 (5k images), on RTX 2080Ti, with a confidence threshold of 0.001

Network	CodaLab, tkDNN, mAP(0.5:0.95)	CodaLab, tkDNN, AP50	CodaLab, darknet, mAP(0.5:0.95)	CodaLab, darknet, AP50	tkDNN map, tkDNN, mAP(0.5:0.95)	tkDNN map, tkDNN, AP50
Yolov3 (416x416)	0.381	0.675	0.380	0.675	0.372	0.663
yolov4 (416x416)	0.468	0.705	0.471	0.710	0.459	0.695
yolov3tiny (416x416)	0.096	0.202	0.096	0.201	0.093	0.198
yolov4tiny (416x416)	0.202	0.400	0.201	0.400	0.197	0.395
Cnet-dla34 (512x512)	0.366	0.543	-	-	0.361	0.535
mv2SSD (512x512)	0.226	0.381	-	-	0.223	0.378

Dependencies

This branch works on every NVIDIA GPU that supports the following (latest tested) dependencies:

CUDA 11.3 (or >= 10.2)
cuDNN 8.2.1 (or >= 8.0.4)
TensorRT 8.0.3 (or >= 7.2)
OpenCV 4.5.4 (or >= 4)
cmake 3.21 (or >= 3.15)
yaml-cpp 0.5.2
eigen3 3.3.4
curl 7.58

sudo apt install libyaml-cpp-dev curl libeigen3-dev

About OpenCV

To compile and install OpenCV 4 with contrib, use the script install_OpenCV4.sh. It will download and compile OpenCV in the Download folder.

bash scripts/install_OpenCV4.sh

If you have OpenCV compiled with CUDA and contrib and want to use it with tkDNN, pass the ENABLE_OPENCV_CUDA_CONTRIB=ON flag when compiling tkDNN. If the flag is not passed, the preprocessing of the networks is computed on the CPU, otherwise on the GPU. In the latter case some milliseconds are saved in the end-to-end latency.

How to compile this repo

Build with cmake. On Ubuntu 18.04 a newer version of cmake is needed (3.15 or above). On both Linux and Windows, the CMAKE_BUILD_TYPE variable needs to be defined as either Release or Debug.

git clone https://github.com/ceccocats/tkDNN
cd tkDNN
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -G"Ninja" .. 
ninja

Workflow

Steps needed to do inference on tkDNN with a custom neural network.

Build and train a NN model with your favorite framework.
Export the weights and biases of each layer and save them in binary files (one per layer).
Export the outputs of each layer and save them in binary files (one per layer).
Create a new test and define the network layer by layer, using the extracted weights, and use the exported outputs to check the results.
Do inference.
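
The check in the fourth step amounts to comparing a layer's computed output against the exported reference. The following is a minimal sketch of that idea, not tkDNN's own test code. It assumes the exported files are raw float32 values with no header, which may differ from the actual export format, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a whole file as raw float32 values (assumption: no header, and the
// byte order of the file matches the machine). Trailing bytes that do not
// form a complete float are ignored.
std::vector<float> readBinaryFloats(const std::string &path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize bytes = in.tellg();
    in.seekg(0, std::ios::beg);
    std::vector<float> values(static_cast<std::size_t>(bytes) / sizeof(float));
    in.read(reinterpret_cast<char *>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(float)));
    if (!in) throw std::runtime_error("short read on " + path);
    return values;
}

// Largest absolute element-wise difference between two equally sized buffers.
float maxAbsDiff(const std::vector<float> &a, const std::vector<float> &b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

A layer could then be considered correct when maxAbsDiff between the reference file and the computed output stays below a tolerance chosen for the precision in use; the tolerance itself is a judgment call and is not specified here.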

Exporting weights

For specific details on how to export weights see HERE.

Darknet Parser

tkDNN implements an easy parser for darknet cfg files; a network can be converted with tk::dnn::darknetParser:

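// Example: parse YOLOv4 from its darknet cfg file, the folder holding the
// exported layer weights, and the class names file.
tk::dnn::Network *net = tk::dnn::darknetParser("yolov4.cfg", "yolov4/layers", "coco.names");
net->print();  // print a summary of the parsed network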

All models from darknet are now parsed directly from the cfg file; you still need to export the weights with the tools described in the previous section.

Supported layers

convolutional maxpool avgpool shortcut upsample route reorg region yolo

Supported activations

relu leaky mish

Run the demo

This is an example using yolov4.

To run object detection, first create the .rt file by running:

rm yolo4_fp32.rt        # be sure to delete (or move) old TensorRT files
./test_yolo4            # run the yolo test (this is slow)

If the creation fails, enable the TensorRT debug output to inspect the error:

cmake .. -DDEBUG=True
make

Once you have successfully created your rt file, run the demo:

./demo yolo4_fp32.rt ../demo/yolo_test.mp4 y

In general the demo program takes the following positional parameters:

./demo <network-rt-file> <path-to-video> <kind-of-network> <number-of-classes> <n-batches> <show-flag> <save-flag> <http-stream-port>

where

<network-rt-file> is the rt file generated by a test
<path-to-video> is the path to a video file or a camera input
<kind-of-network> is the type of network. Three types are currently supported: y (YOLO family), c (CenterNet family) and m (MobileNet-SSD family)
<number-of-classes> is the number of classes the network is trained on
<n-batches> is the number of batches to use in inference (N.B. you should first export TKDNN_BATCHSIZE to the required n_batches and create the rt file for the network again).
<show-flag> if set to 0, the demo will not show the visualization.
<save-flag> if set to 0, the results will be saved into result.mp4 (if n-batches == 1); if set to 1, the results will not be saved.
<http-stream-port> if set to 0, the HTTP stream is off; if a port number is given (e.g. 8090), the results can be viewed as a stream at https://localhost:8090
<extyolo flag> if set to 1, YOLO-like coordinates of the bounding boxes will be printed on the terminal

N.B. FP32 inference is used by default.
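
The positional layout described above can be sketched as a small argument parser. This is an illustration only and not the demo's actual source. The struct, the function names, and every default value are assumptions; the flags are stored as raw integers so that the sketch does not take a position on their meaning.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Positional arguments in the order listed above. All default values are
// assumptions made for this sketch, not values read from the demo source.
struct DemoArgs {
    std::string rtFile = "yolo4_fp32.rt";
    std::string input = "../demo/yolo_test.mp4";
    char netType = 'y';  // y, c or m
    int numClasses = 80;
    int numBatches = 1;
    int showFlag = 1;
    int saveFlag = 0;
    int httpPort = 0;    // 0 means the HTTP stream is off
    int extYoloFlag = 0;
};

// The three network kinds named in the parameter description.
bool isSupportedNetType(char netType) {
    return netType == 'y' || netType == 'c' || netType == 'm';
}

// Fills in only the arguments that were actually given on the command line.
DemoArgs parseDemoArgs(int argc, const char *const argv[]) {
    assert(argc >= 1);
    DemoArgs args;
    if (argc > 1) args.rtFile = argv[1];
    if (argc > 2) args.input = argv[2];
    if (argc > 3) args.netType = argv[3][0];
    if (argc > 4) args.numClasses = std::atoi(argv[4]);
    if (argc > 5) args.numBatches = std::atoi(argv[5]);
    if (argc > 6) args.showFlag = std::atoi(argv[6]);
    if (argc > 7) args.saveFlag = std::atoi(argv[7]);
    if (argc > 8) args.httpPort = std::atoi(argv[8]);
    if (argc > 9) args.extYoloFlag = std::atoi(argv[9]);
    return args;
}
```

Because the arguments are positional, a later one can only be set when all earlier ones are given as well, which is why the short invocation shown earlier stops after the network type.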

For specific details on how to run:

2D object detection demos, details on FP16, INT8 and batching see HERE.
segmentation demos see HERE.
2D/3D object detection and tracking demos see HERE.
mAP demo to evaluate 2D object detectors see HERE.

tkDNN on Windows 10 or Windows 11

For specific details on how to run tkDNN on Windows 10/11 see HERE.

Existing tests and supported networks

Test Name	Network	Dataset	N Classes	Input size	Weights
yolo	YOLO v2¹	COCO 2014	80	608x608	weights
yolo_224	YOLO v2¹	COCO 2014	80	224x224	weights
yolo_berkeley	YOLO v2¹	BDD100K	10	416x736	weights
yolo_relu	YOLO v2 (with ReLU, not Leaky)¹	COCO 2014	80	416x416	weights
yolo_tiny	YOLO v2 tiny¹	COCO 2014	80	416x416	weights
yolo_voc	YOLO v2¹	VOC	21	416x416	weights
yolo3	YOLO v3²	COCO 2014	80	416x416	weights
yolo3_512	YOLO v3²	COCO 2017	80	512x512	weights
yolo3_berkeley	YOLO v3²	BDD100K	10	320x544	weights
yolo3_coco4	YOLO v3²	COCO 2014	4	416x416	weights
yolo3_flir	YOLO v3²	FREE FLIR	3	320x544	weights
yolo3_tiny	YOLO v3 tiny²	COCO 2014	80	416x416	weights
yolo3_tiny512	YOLO v3 tiny²	COCO 2017	80	512x512	weights
dla34	Deep Layer Aggregation (DLA) 34³	COCO 2014	80	224x224	weights
dla34_cnet	Centernet (DLA34 backend)⁴	COCO 2017	80	512x512	weights
mobilenetv2ssd	MobileNet v2 SSD Lite⁵	VOC	21	300x300	weights
mobilenetv2ssd512	MobileNet v2 SSD Lite⁵	COCO 2017	81	512x512	weights
resnet101	Resnet 101⁶	COCO 2014	80	224x224	weights
resnet101_cnet	Centernet (Resnet101 backend)⁴	COCO 2017	80	512x512	weights
csresnext50-panet-spp	Cross Stage Partial Network ⁷	COCO 2014	80	416x416	weights
yolo4	Yolov4 ⁸	COCO 2017	80	416x416	weights
yolo4_320	Yolov4 ⁸	COCO 2017	80	320x320	weights
yolo4_512	Yolov4 ⁸	COCO 2017	80	512x512	weights
yolo4_608	Yolov4 ⁸	COCO 2017	80	608x608	weights
yolo4_berkeley	Yolov4 ⁸	BDD100K	10	540x320	weights
yolo4tiny	Yolov4 tiny ⁹	COCO 2017	80	416x416	weights
yolo4tiny_512	Yolov4 tiny ⁹	COCO 2017	80	512x512	weights
yolo4x	Yolov4x-mish ⁹	COCO 2017	80	640x640	weights
yolo4x-cps	Scaled Yolov4 ¹⁰	COCO 2017	80	512x512	weights
shelfnet	ShelfNet18_realtime¹¹	Cityscapes	19	1024x1024	weights
shelfnet_berkeley	ShelfNet18_realtime¹¹	DeepDrive	20	1024x1024	weights
dla34_cnet3d	Centernet3D (DLA34 backend)⁴	KITTI 2017	1	512x512	weights
dla34_ctrack	CenterTrack (DLA34 backend)¹²	NuScenes 3D	7	512x512	weights

References

1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
2. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
5. Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).
8. Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal Speed and Accuracy of Object Detection." arXiv preprint arXiv:2004.10934 (2020).
9. Bochkovskiy, Alexey. "Yolo v4, v3 and v2 for Windows and Linux" (https://github.com/AlexeyAB/darknet).
10. Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "Scaled-YOLOv4: Scaling Cross Stage Partial Network." arXiv preprint arXiv:2011.08036 (2020).
11. Zhuang, Juntang, et al. "ShelfNet for fast semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2019.
12. Zhou, Xingyi, Vladlen Koltun, and Philipp Krähenbühl. "Tracking objects as points." European Conference on Computer Vision. Springer, Cham, 2020.
