C++ implementation of the Tracking-Learning-Detection (TLD) framework for long-term, single-object tracking in a video stream, developed by Kalal et al. [1]. As its name suggests, the framework has three main components: tracking, detection, and learning. The tracker estimates the object's motion from frame to frame. The detector localizes the object in each frame and, if necessary, re-initializes the tracker. The tracking and detection results are fused into a single bounding box; if that result is valid, a learning step estimates the detector's errors and updates the detector.
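A rough sketch of that control flow with OpenCV is shown below. The TrackerStub/DetectorStub classes and the fuse() heuristic are illustrative placeholders (as is the input file name), not this repository's actual API.

```cpp
// Hedged sketch of the TLD control flow described above; component types and
// their methods are illustrative stand-ins, not this repository's API.
#include <opencv2/opencv.hpp>
#include <optional>
#include <vector>

struct TrackerStub {
    // Would estimate frame-to-frame motion (median flow); stubbed here.
    std::optional<cv::Rect> track(const cv::Mat&, const cv::Mat&, const cv::Rect& prevBox) {
        return prevBox;  // placeholder: assume no motion
    }
};

struct DetectorStub {
    // Would scan the frame with a cascade of classifiers; stubbed here.
    std::vector<cv::Rect> detect(const cv::Mat&) { return {}; }
    // Would update the classifiers with P/N examples from a valid result.
    void learn(const cv::Mat&, const cv::Rect&) {}
};

// Naive fusion placeholder: prefer the tracker box, fall back to the best detection.
static std::optional<cv::Rect> fuse(const std::optional<cv::Rect>& tracked,
                                    const std::vector<cv::Rect>& detections) {
    if (tracked) return tracked;
    if (!detections.empty()) return detections.front();
    return std::nullopt;
}

int main() {
    cv::VideoCapture cap("input_video.mp4");     // hypothetical input
    cv::Mat prev, curr;
    cap >> prev;
    if (prev.empty()) return 1;
    cv::Rect box = cv::selectROI("init", prev);  // user-supplied initial bounding box
    TrackerStub tracker;
    DetectorStub detector;

    while (cap.read(curr)) {
        auto tracked    = tracker.track(prev, curr, box);
        auto detections = detector.detect(curr);
        auto result     = fuse(tracked, detections);
        if (result) {
            box = *result;
            detector.learn(curr, box);           // learning step on a valid result
            cv::rectangle(curr, box, cv::Scalar(0, 255, 0), 2);
        }
        cv::imshow("TLD", curr);
        if (cv::waitKey(1) == 27) break;
        prev = curr.clone();
    }
    return 0;
}
```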
Dependencies:
- C++17
- OpenCV (>= 4.2.0)
To build the project with CMake:

```sh
mkdir build
cd build
cmake ..
make
```
To run it (from within the build/ directory):

```sh
./my_tld [--input] [--output] [--gt_bboxes] [--evaluate]
```
Options:
- --input: string, input video path (or keyword "camera").
- --output: string, output video path (if not specified then no output is produced).
- --gt_bboxes: string, path to the file containing ground-truth bounding boxes.
- --evaluate: bool (1 or 0), whether to perform evaluation of the tracking (gt_bboxes has to be provided).
Examples:

```sh
--input="../videos/input_video.mp4"
--input="../Dudek/img/%04d.jpg" --output="../output_video.mp4" --gt_bboxes="../Dudek/groundtruth_rect.txt" --evaluate=1
--input="cam" --output="../output_video.mp4"
```
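For illustration, options like these could be parsed with OpenCV's cv::CommandLineParser along the lines sketched below; this is an assumption about one reasonable approach, not necessarily how my_tld implements it.

```cpp
// Illustrative option handling with cv::CommandLineParser; a sketch only,
// not necessarily how my_tld actually parses its arguments.
#include <opencv2/core/utility.hpp>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    const std::string keys =
        "{input     |   | input video path (or keyword \"camera\") }"
        "{output    |   | output video path (optional)             }"
        "{gt_bboxes |   | path to ground-truth bounding boxes      }"
        "{evaluate  | 0 | 1 to evaluate against gt_bboxes, else 0  }";
    cv::CommandLineParser parser(argc, argv, keys);

    // Empty-string defaults: use has() to test whether an option was given.
    std::string input    = parser.has("input")     ? parser.get<std::string>("input")     : "";
    std::string output   = parser.has("output")    ? parser.get<std::string>("output")    : "";
    std::string gtBboxes = parser.has("gt_bboxes") ? parser.get<std::string>("gt_bboxes") : "";
    bool evaluate        = parser.get<int>("evaluate") != 0;

    if (input.empty()) {
        parser.printMessage();                   // prints the generated help text
        return 1;
    }
    if (evaluate && gtBboxes.empty()) {
        std::cerr << "--evaluate=1 requires --gt_bboxes\n";
        return 1;
    }
    std::cout << "input=" << input << ", output=" << output << "\n";
    return 0;
}
```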
For evaluation we used the tracking benchmark dataset: http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html
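That benchmark stores ground truth as one bounding box per line ("x,y,w,h", comma- or tab-separated). Assuming that format, an overlap (IoU) based evaluation could look like the sketch below; the file-format assumption and the function names are ours, not the repository's.

```cpp
// Sketch of an IoU-based evaluation against a ground-truth file that stores
// one bounding box per line as "x,y,w,h" (comma- or tab-separated).
#include <opencv2/core.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

static std::vector<cv::Rect2d> loadBoxes(const std::string& path) {
    std::vector<cv::Rect2d> boxes;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        for (char& c : line)
            if (c == ',' || c == '\t') c = ' ';  // normalize separators
        std::istringstream ss(line);
        double x, y, w, h;
        if (ss >> x >> y >> w >> h) boxes.emplace_back(x, y, w, h);
    }
    return boxes;
}

static double iou(const cv::Rect2d& a, const cv::Rect2d& b) {
    const double inter = (a & b).area();         // intersection rectangle
    const double uni   = a.area() + b.area() - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

int main(int argc, char** argv) {
    if (argc < 3) {
        std::cerr << "usage: evaluate <results.txt> <groundtruth.txt>\n";
        return 1;
    }
    const auto results = loadBoxes(argv[1]);
    const auto gt      = loadBoxes(argv[2]);
    const std::size_t n = std::min(results.size(), gt.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += iou(results[i], gt[i]);
    std::cout << "mean IoU over " << n << " frames: " << (n ? sum / n : 0.0) << "\n";
    return 0;
}
```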
Demo output: dancing_robot_out.mp4
The red bounding box is the output of the cascade detector (after non-maximum suppression), the purple box is the output of the median-flow tracker, and the green box is the fusion of the two.
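In OpenCV's BGR channel order, that color coding roughly corresponds to the values in the sketch below (illustrative only; the function and variable names are placeholders, not this repository's code).

```cpp
// Illustrative drawing of the color-coded boxes described above (OpenCV uses
// BGR channel order); names are placeholders, not this repository's code.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

void drawResults(cv::Mat& frame, const cv::Rect& detectorBox,
                 const cv::Rect& trackerBox, const cv::Rect& fusedBox) {
    cv::rectangle(frame, detectorBox, cv::Scalar(0, 0, 255),   2);  // red: cascade detector (after NMS)
    cv::rectangle(frame, trackerBox,  cv::Scalar(255, 0, 255), 2);  // purple/magenta: median-flow tracker
    cv::rectangle(frame, fusedBox,    cv::Scalar(0, 255, 0),   2);  // green: fused result
}
```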
- [1] Z. Kalal, K. Mikolajczyk, J. Matas, "Tracking-Learning-Detection".
- [2] G. Nebehay, "Robust Object Tracking Based on Tracking-Learning-Detection".