Homepage | Paper | Video | Benchmark Dataset
360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking
Huajian Huang, Yinzhe Xu, Yingshu Chen and Sai-Kit Yeung
The Hong Kong University of Science and Technology
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023)
The proposed 360VOT is the first benchmark dataset for omnidirectional visual object tracking. 360VOT contains 120 sequences with up to 113K high-resolution frames in equirectangular projection. It brings distinct challenges for tracking, e.g., crossing border (CB), large distortion (LD), and stitching artifact (SA). We explore new representations for visual object tracking and provide four types of unbiased ground truth: bounding box (BBox), rotated bounding box (rBBox), bounding field-of-view (BFoV), and rotated bounding field-of-view (rBFoV).
To use the toolkit, clone the repository and install the dependencies:

```bash
git clone https://github.com/HuajianUP/360VOT.git
cd 360VOT
pip install -r requirements.txt
```
Currently, the toolkit uses Rotated_IoU to calculate the rBBox IoU for evaluation. If you want to evaluate tracking results in terms of rBBox, you should install Rotated_IoU:

```bash
git submodule add https://github.com/lilanxiao/Rotated_IoU eval/Rotated_IoU
```
Please use the following structure to store tracking results. The subfolders `tracker_n/` contain the tracking results (`.txt` files) of distinct methods on the 120 sequences. The data formats for the different tracking representations in the txt files are: `BBox` is `[x1 y1 w h]`, `rBBox` is `[cx cy w h rotation]`, and `BFoV` and `rBFoV` are `[clon clat fov_horizontal fov_vertical rotation]`. All angle values are in degrees, e.g., `BFoV=[0 30 60 60 10]`.
```
results
├── tracker_1
│   ├── 0000.txt
│   ├── ....
│   └── 0120.txt
├── ....
│
└── tracker_n
    ├── 0000.txt
    ├── ....
    └── 0120.txt
```
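Since every line in a result file holds one frame in the fixed-length formats above, loading the files is straightforward. Below is a minimal sketch, assuming whitespace-separated values; the helper name `load_results` is hypothetical and not part of the toolkit:

```python
import numpy as np

# Hypothetical helper, not part of the toolkit. Assumes whitespace-separated
# values, one frame per line, in the formats described above.
def load_results(path, representation="bfov"):
    columns = {
        "bbox": 4,   # [x1 y1 w h]
        "rbbox": 5,  # [cx cy w h rotation]
        "bfov": 5,   # [clon clat fov_horizontal fov_vertical rotation]
        "rbfov": 5,  # [clon clat fov_horizontal fov_vertical rotation]
    }[representation]
    data = np.loadtxt(path, ndmin=2)  # shape: (num_frames, columns)
    assert data.shape[1] == columns, f"unexpected column count for {representation}"
    return data

bfov_per_frame = load_results("results/tracker_1/0000.txt", "bfov")
```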
For quick testing, you can download the benchmark results and unzip them into the folder `benchmark/`. If you have not downloaded the 360VOT dataset yet, you also need to download the dataset. Then, you can evaluate the BFoV results using the command:

```bash
python scripts/eval_360VOT.py -f benchmark/360VOT-bfov-results -d PATH_TO_360VOT_DATASET
```
Command Line Arguments for `eval_360VOT.py`:

| Args | Meaning |
|---|---|
| `-d` / `--dataset_dir` | Path to the 360VOT dataset. |
| `-b` / `--bbox_dir` | Specify the path to the BBox results when you evaluate the results in BBox. |
| `-rb` / `--rbbox_dir` | Specify the path to the rBBox results when you evaluate the results in rBBox. |
| `-f` / `--bfov_dir` | Specify the path to the BFoV results when you evaluate the results in BFoV. |
| `-rf` / `--rbfov_dir` | Specify the path to the rBFoV results when you evaluate the results in rBFoV. |
| `-a` / `--attribute` | Specify the path to `360VOT_attribute.xlsx` when you evaluate the results regarding different attributes. |
| `-v` / `--show_video_level` | Print metrics in detail. |
| `-p` / `--plot_curve` | Plot the curves of the metrics. |
| `-s` / `--save_path` | Specify the path to save the figures of the metrics. |
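For example, to evaluate rBBox results with a per-attribute breakdown and plotted curves, a command along these lines should work (the `360VOT-rbbox-results` folder name is an assumption, mirroring the BFoV example above):

```bash
python scripts/eval_360VOT.py -rb benchmark/360VOT-rbbox-results -d PATH_TO_360VOT_DATASET \
    -a PATH_TO_360VOT_attribute.xlsx -p -s PATH_TO_SAVE_FIGURES
```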
Commands for visualizing the results and making videos:

```bash
# visualize the ground truth
python scripts/vis_result.py -d PATH_TO_DATASET -p PATH_TO_SAVE_VIDEOS [-ss VIDEO_OF_SPECIFIC_SEQUENCE]

# visualize the tracking results
python scripts/vis_result.py -d PATH_TO_DATASET -p PATH_TO_SAVE_VIDEOS -f PATH_TO_BFOV_RESULTS [-ss VIDEO_OF_SPECIFIC_SEQUENCE]
```
Command for checking some of the attributes of the 360VOT dataset:

```bash
python scripts/check_360VOT_attribute.py --dir PATH_TO_DATASET [--excel PATH_TO_360VOT_attribute.xlsx]
```
The toolkit contains an essential library for processing 360° images. The operations include:

- `crop_bfov`: extract the region of a given (r)BFoV from the 360° image.
- `plot_bfov`: plot the region of a given (r)BFoV on the 360° image.
- `crop_bbox`: extract the region of a given (r)BBox from the 360° image.
- `plot_bbox`: plot the region of a given (r)BBox on the 360° image.
- `rot_image`: rotate the image by a pitch, yaw, or roll angle.
- `localBbox2Bfov`: convert (r)BBox predictions on the extracted region to an (r)BFoV with respect to the original 360° image.
- `localBbox2Bbox`: convert (r)BBox predictions on the extracted region to an (r)BBox with respect to the original 360° image.
- `mask2Bfov`: estimate the (r)BFoV from masked images.
- `mask2Bbox`: estimate the (r)BBox from masked images.
For more examples, please refer to `scripts/test_omni.py`.
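A common pattern with these helpers is to extract a locally rectified view, run a conventional 2D tracker on it, and map the prediction back onto the panorama. The sketch below only illustrates that flow; the import path and the argument lists of `crop_bfov` and `localBbox2Bfov` are assumptions (only the function names come from the list above), so consult `scripts/test_omni.py` for the actual interfaces.

```python
# Illustrative sketch only: the signatures of crop_bfov and localBbox2Bfov
# are assumed, not the toolkit's actual API. See scripts/test_omni.py.
import cv2
from omni import crop_bfov, localBbox2Bfov  # assumed import path

frame = cv2.imread("0001.jpg")   # equirectangular frame
bfov = [0, 30, 60, 60, 10]       # [clon clat fov_h fov_v rotation] in degrees

# 1. Extract a locally rectified view around the current BFoV estimate.
local_view = crop_bfov(frame, bfov)

# 2. Run any conventional 2D tracker on the extracted view
#    (placeholder: any tracker returning [x1 y1 w h] works here).
local_bbox = tracker.update(local_view)

# 3. Map the local prediction back to a BFoV on the original panorama.
new_bfov = localBbox2Bfov(local_bbox, bfov, frame.shape)
```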
We use a spherical camera model to formulate the relationship between the 2D image and the 3D camera coordinate system.
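Concretely, for an equirectangular image of width `W` and height `H`, each pixel maps to a longitude/latitude pair and hence to a ray on the unit sphere. Below is a minimal numpy sketch of this mapping; the axis convention chosen here is one common option and may differ from the toolkit's:

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a 3D ray on the unit sphere.

    Longitude spans [-pi, pi) across the image width; latitude spans
    [pi/2, -pi/2] from top to bottom. The axis convention is an assumption.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])

def ray_to_pixel(ray, width, height):
    """Inverse mapping: a 3D ray back to equirectangular pixel coordinates."""
    x, y, z = ray / np.linalg.norm(ray)
    lon = np.arctan2(x, z)
    lat = np.arcsin(y)
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v
```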
If you use 360VOT and this toolkit for your research, please cite:

```bibtex
@InProceedings{huang360VOT,
    author    = {Huajian Huang and Yinzhe Xu and Yingshu Chen and Sai-Kit Yeung},
    title     = {360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {}
}
```