arXiv | IEEE Xplore | website
This repository is the official implementation of the paper:
3D Multi-Object Tracking Using Graph Neural Networks with Cross-Edge Modality Attention
Martin Büchner and Abhinav Valada.
IEEE Robotics and Automation Letters (RA-L), Vol. 7, Issue 4, pp. 9707-9714, 2022
If you find our work useful, please consider citing our paper:
@article{buchner20223d,
title={3D Multi-Object Tracking Using Graph Neural Networks With Cross-Edge Modality Attention},
author={B{\"u}chner, Martin and Valada, Abhinav},
journal={IEEE Robotics and Automation Letters},
volume={7},
number={4},
pages={9707--9714},
year={2022},
publisher={IEEE}
}
Online 3D multi-object tracking (MOT) has witnessed significant research interest in recent years, largely driven by demand from the autonomous systems community. However, 3D offline MOT is relatively less explored. Labeling 3D trajectory scene data at a large scale while not relying on high-cost human experts is still an open research question. In this work, we propose Batch3DMOT, which follows the tracking-by-detection paradigm and represents real-world scenes as directed, acyclic, and category-disjoint tracking graphs that are attributed using various modalities such as camera, LiDAR, and radar. We present a multi-modal graph neural network that uses a cross-edge attention mechanism to mitigate modality intermittence, which translates into sparsity in the graph domain. Additionally, we present attention-weighted convolutions over frame-wise k-NN neighborhoods as a suitable means of allowing information exchange across disconnected graph components. We evaluate our approach using various sensor modalities and model configurations on the challenging nuScenes and KITTI datasets. Extensive experiments demonstrate that our proposed approach yields an overall improvement of 2.8% in the AMOTA score on nuScenes, thereby setting a new benchmark for 3D tracking methods, and successfully enhances false-positive filtering.
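As a rough illustration of the cross-edge modality attention idea, the sketch below fuses per-edge features from several sensors while masking out modalities that are missing on a given edge. It is a minimal stand-in written for this README, not the model from the paper; all names, layer choices, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class CrossEdgeModalityAttention(nn.Module):
    """Illustrative sketch: fuse per-edge modality features, skipping missing ones."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scalar relevance score per modality feature
        self.out = nn.Linear(dim, dim)

    def forward(self, edge_feats, valid_mask):
        # edge_feats: (E, M, dim) features of M modalities per edge
        # valid_mask: (E, M) True where a modality is observed on that edge
        logits = self.score(edge_feats).squeeze(-1)              # (E, M)
        logits = logits.masked_fill(~valid_mask, float('-inf'))  # ignore missing modalities
        attn = torch.nan_to_num(torch.softmax(logits, dim=-1))   # all-missing edges -> zero weights
        fused = (attn.unsqueeze(-1) * edge_feats).sum(dim=1)     # (E, dim) weighted sum
        return self.out(fused)

# Toy usage: 4 edges, 3 modalities (e.g. camera/LiDAR/radar), 32-dim features.
feats = torch.randn(4, 3, 32)
mask = torch.tensor([[1, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0]], dtype=torch.bool)
print(CrossEdgeModalityAttention(32)(feats, mask).shape)  # torch.Size([4, 32])
```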
- Download the nuScenes dataset from here.
- Download the Megvii and CenterPoint detections. You may use `src/utils/concat_jsons.py` to obtain mini-split results (a minimal illustration of the concatenation follows below).
- Define the relevant paths in `*_config.yaml`.
- The `tmp` folder holds the preprocessed graph data, while the `data` folder holds the raw nuScenes dataset.
- Adjust the package paths to match your local setup.
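For reference, concatenating per-split detection files essentially means merging their results dictionaries. The snippet below is only a sketch assuming the standard nuScenes submission layout (`{"meta": ..., "results": {sample_token: [...]}}`); the file names are placeholders, and `src/utils/concat_jsons.py` remains the script to actually use.

```python
# Minimal sketch of concatenating nuScenes-style detection result files.
import json

def concat_results(paths, out_path):
    merged = {"meta": None, "results": {}}
    for p in paths:
        with open(p) as f:
            sub = json.load(f)
        merged["meta"] = merged["meta"] or sub.get("meta", {})
        merged["results"].update(sub["results"])   # sample tokens are disjoint across splits
    with open(out_path, "w") as f:
        json.dump(merged, f)

# concat_results(["detections_train.json", "detections_val.json"], "detections_mini.json")
```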
- Generate the 2D image annotations by running
python nuscenes/scripts/export_2d_annotations_as_json.py --dataroot=/path/to/nuscdata --version=v1.0-trainval
and place the resulting file under the nuScenes data directory (a quick sanity check is sketched after this step).
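A quick sanity check on the exported annotations can be as simple as loading the JSON and counting entries. The snippet assumes the devkit's default output file name (`image_annotations.json`); adjust the path if your devkit version writes elsewhere.

```python
# Hedged sanity check: the output file name is assumed to be the devkit default.
import json

with open('/path/to/nuscdata/image_annotations.json') as f:
    anns = json.load(f)
print(f"{len(anns)} 2D boxes exported")   # one record per projected box
```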
- Generate metadata and GT for feature encoder training:
python batch_3dmot/preprocessing/preprocess_img.py --config cl_config.yaml
python batch_3dmot/preprocessing/preprocess_lidar.py --config cl_config.yaml
python batch_3dmot/preprocessing/preprocess_radar.py --config cl_config.yaml
- Train the feature encoders (an illustrative point-cloud encoder is sketched below):
python batch_3dmot/preprocessing/train_resnet_ae.py --config cl_config.yaml
python batch_3dmot/preprocessing/train_pointnet.py --config cl_config.yaml
python batch_3dmot/preprocessing/train_radarnet.py --config cl_config.yaml
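To give an idea of what these encoders do, the sketch below shows a PointNet-style point-cloud encoder of the general kind trained by `train_pointnet.py`. It is an illustration only; the layer sizes and input dimensionality (x, y, z, intensity) are assumptions rather than the repository's actual architecture.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Illustrative sketch: encode a box-cropped point cloud into a fixed-size feature."""
    def __init__(self, in_dim=4, feat_dim=128):     # assumed inputs: x, y, z, intensity
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 1),
        )

    def forward(self, pts):                          # pts: (B, N, in_dim)
        h = self.mlp(pts.transpose(1, 2))            # shared per-point MLP -> (B, feat_dim, N)
        return h.max(dim=2).values                   # symmetric max-pool over points

print(PointEncoder()(torch.randn(2, 256, 4)).shape)  # torch.Size([2, 128])
```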
- Construct the disjoint, directed tracking graphs, either with multi-modal features or with poses only (see the sketch below for the basic graph connectivity):
python batch_3dmot/preprocessing/construct_detection_graphs_disjoint_parallel.py --config cl_config.yaml
python batch_3dmot/preprocessing/construct_detection_graphs_disjoint_parallel_only_poses.py --config pose_config.yaml
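Conceptually, graph construction connects each detection to same-category detections in the next few frames via directed edges. The sketch below shows only this basic connectivity under an assumed maximum frame gap; the repository's scripts additionally attach pose and per-modality features to nodes and edges.

```python
# Rough sketch: directed, category-disjoint edges between detections across frames.
import itertools
import torch

def build_edges(frames, categories, max_frame_gap=3):   # max_frame_gap is an assumed parameter
    edges = []
    for i, j in itertools.permutations(range(len(frames)), 2):
        same_cat = categories[i] == categories[j]        # category-disjoint graphs
        gap = frames[j] - frames[i]
        if same_cat and 0 < gap <= max_frame_gap:        # directed edge forward in time
            edges.append((i, j))
    return torch.tensor(edges, dtype=torch.long).t()     # (2, E) edge_index

frames     = [0, 0, 1, 1, 2]
categories = ["car", "pedestrian", "car", "pedestrian", "car"]
print(build_edges(frames, categories))
```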
- Train Batch3DMOT (poses-only or using modalities):
python batch_3dmot/train_poses_only.py --config pose_config.yaml
python batch_3dmot/wandb_train.py --config cl_config.yaml
- Perform inference using the trained model (poses only, pose + camera, or additional modalities):
python batch_3dmot/predict_detections_poses.py --config pose_config.yaml
python batch_3dmot/predict_detections_img.py --config cl_config.yaml
python batch_3dmot/predict_detections.py --config cl_config.yaml
- Evaluate the produced tracking results (see the devkit-based sketch below):
python batch_3dmot/eval/eval_nuscenes.py --config *_config.yaml
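Alternatively, tracking results in the standard nuScenes submission format can be scored directly with the nuScenes devkit, roughly as sketched below. Paths, version, and split are placeholders, and argument names may vary slightly across devkit versions.

```python
# Sketch of scoring a tracking submission with the nuScenes devkit.
from nuscenes.eval.common.config import config_factory
from nuscenes.eval.tracking.evaluate import TrackingEval

cfg = config_factory('tracking_nips_2019')
TrackingEval(config=cfg,
             result_path='/path/to/tracking_results.json',
             eval_set='val',
             output_dir='/path/to/eval_output',
             nusc_version='v1.0-trainval',
             nusc_dataroot='/path/to/nuscdata').main(render_curves=False)
```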
For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.