This repository contains code for 3D-RetinaNet, a novel single-stage action detection network proposed along with the ROAD-Waymo and ROAD datasets. The code covers training and evaluation for the ROAD-Waymo, ROAD, and UCF-24 datasets.
We need three things to get started with training: the datasets, Kinetics pre-trained weights, and PyTorch with torchvision and tensorboardX.
We currently support the following three datasets:
- ROAD-Waymo dataset
- ROAD dataset, introduced in the dataset release paper
- UCF24 with revised annotations, released with our ICCV 2017 paper.
- Visit the ROAD-Waymo dataset for download and pre-processing.
- Visit the ROAD dataset for download and pre-processing.
- Install PyTorch and torchvision
- Install tensorboardX via
pip install tensorboardX
- Download the Kinetics-400 pre-trained weights: change the current directory to `kinetics-pt` and run the bash script `get_kinetics_weights.sh`, or download them from Google Drive. Name the folder `kinetics-pt`; it is important to name it correctly.
- We assume that you have downloaded the datasets and pre-trained weights and put them in the correct places.
- To train 3D-RetinaNet using the training script, simply specify the parameters listed in `main.py` as flags or change them manually.
Let's assume that you extracted the dataset in `/home/user/road-waymo/` and the weights in `/home/user/kinetics-pt/`; then your train command from the root directory of this repo is going to be:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py /home/user/ /home/user/ /home/user/kinetics-pt/ --MODE=train --ARCH=resnet50 --MODEL_TYPE=I3D --DATASET=road_waymo --TEST_DATASET=road_waymo --TRAIN_SUBSETS=train --VAL_SUBSETS=val --SEQ_LEN=8 --TEST_SEQ_LEN=8 --BATCH_SIZE=4 --LR=0.0041
The second instance of `/home/user/` in the above command specifies where checkpoints and logs are going to be stored. In this case, checkpoints and logs will be in `/home/user/road-waymo/cache/<experiment-name>/`.
--ARCH ---> By default it is resnet50, but our code also supports resnet101
--MODEL_TYPE ---> We support six different models, including I3D and SlowFast
--DATASET ---> Specifies the training dataset; we support multiple datasets, including road, road_waymo, and roadpp (both combined)
--TEST_DATASET ---> Dataset used for evaluation in training mode
--TRAIN_SUBSETS ---> It will be train in all cases except ROAD, where we have multiple splits
--SEQ_LEN ---> We ran experiments with a sequence length of 8, but we support other lengths as well
--TEST_SEQ_LEN ---> Test sequence length is the number of frames predicted at a time; we support multiple lengths and tested from 8 to 32.
--BATCH_SIZE ---> The batch size depends upon the number of GPUs and/or your GPU memory; if your GPU has 24 GB of memory, we recommend one batch per GPU. For an A100 with 80 GB of GPU memory, we tested up to 5 batches per GPU.
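When you change the batch size, the learning rate usually needs adjusting too. A common heuristic (not necessarily the rule this repo uses; the value is simply passed via `--LR`) is to scale the learning rate linearly with the global batch size, as sketched below:

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear learning-rate scaling heuristic: the learning rate grows
    in proportion to the global batch size (summed over all GPUs)."""
    return base_lr * new_batch / base_batch

# Example: the README trains with LR=0.0041; assuming BATCH_SIZE=4 on
# 4 GPUs gives a global batch of 16, doubling it to 32 would suggest:
print(scale_lr(0.0041, 16, 32))  # 0.0082
```

This is only a starting point; the milestones and maximum epochs may also need retuning when the batch size changes.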
- Training notes:
- A single batch requires 16 GB of VRAM; in this case, you will need 4 GPUs (each with at least 16 GB of VRAM) to run training.
- During training, a checkpoint is saved every epoch, and the frame-level `frame-mean-ap` on a subset of the validation split is logged.
- Crucial parameters for the training process are `LR`, `MILESTONES`, `MAX_EPOCHS`, and `BATCH_SIZE`.
- `label_types` is a very important variable: it defines which label types are used for training; at validation time it is bumped up by one with the `ego-action` label type. It is created in `data/dataset.py` for each dataset separately, copied to `args` in `main.py`, and used further at evaluation time.
- Event detection and triplet detection are used interchangeably in this code base.
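The flow of `label_types` can be pictured with a small sketch. The dictionary contents and helper below are hypothetical, for illustration only; the real lists are constructed per dataset in `data/dataset.py`:

```python
# Hypothetical per-dataset label types (illustration only; the actual
# lists are built in data/dataset.py for each dataset separately).
TRAIN_LABEL_TYPES = {
    'road_waymo': ['agent', 'action', 'location'],
}

def val_label_types(dataset: str) -> list:
    # At validation time the list is bumped up by one
    # with the 'ego-action' label type.
    return TRAIN_LABEL_TYPES[dataset] + ['ego-action']

print(val_label_types('road_waymo'))
# ['agent', 'action', 'location', 'ego-action']
```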
To generate the tubes and evaluate them, you first need frame-level detections, which are then linked into tubes. It is pretty simple in our case. Similar to the training command, you can run the following commands; these can run on a single GPU.
There are various `MODE`s in `main.py`. You can do each step independently or together. At the moment, the `gen_dets` mode generates and evaluates frame-wise detections and finally performs tube building and evaluation.
For the ROAD-Waymo dataset, run the following command:
python main.py /home/user/ /home/user/ /home/user/kinetics-pt/ --MODE=gen_dets --MODEL_TYPE=I3D --DATASET=road_waymo --TEST_DATASET=road_waymo --VAL_SUBSETS=test --SEQ_LEN=8 --TEST_SEQ_LEN=8 --BATCH_SIZE=8 --LR=0.0041
--TEST_DATASET specifies the dataset on which the model should be tested. Our baseline supports cross-dataset evaluation, where the model is trained on one dataset and tested on another. It also supports training and testing on ROAD and ROAD-Waymo (ROADPP) together.
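The linking step that turns frame-level detections into tubes can be sketched as greedy IoU-based matching across consecutive frames. This is a simplified stand-in for the actual logic in `tubes.py`; the box format, data structures, and threshold here are assumptions:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def link_tubes(frames, iou_thr=0.5):
    """Greedily extend each tube with the best-overlapping box in the
    next frame; boxes that match no live tube start new tubes."""
    tubes = []
    for t, boxes in enumerate(frames):
        for box in boxes:
            best, best_iou = None, iou_thr
            for tube in tubes:
                if tube['end'] == t - 1:  # tube still "alive" at frame t
                    o = iou(tube['boxes'][-1], box)
                    if o >= best_iou:
                        best, best_iou = tube, o
            if best is None:
                tubes.append({'boxes': [box], 'end': t})
            else:
                best['boxes'].append(box)
                best['end'] = t
    return tubes

# Two frames whose boxes overlap heavily are linked into a single tube.
frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)]]
print(len(link_tubes(frames)))  # 1
```

The real implementation additionally carries per-class scores along the tube and handles temporal trimming; see `tubes.py` for the details.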
- Testing notes
- Evaluation can be done on a single GPU for test sequence lengths up to 32
- Please go through the hyperparameters in `main.py` to understand their functions.
- After tube building, a detection `.json` file is dumped, which is used for evaluation; see `tubes.py` for more details.
- See `modules/evaluation.py` and `data/dataset.py` for the frame-level and video-level evaluation code that computes `frame-mAP` and `video-mAP`.
If this work has been helpful in your research, please cite the following article:
@ARTICLE {singh2022road,
author = {Singh, Gurkirt and Akrigg, Stephen and Di Maio, Manuele and Fontana, Valentina and Alitappeh, Reza Javanmard and Saha, Suman and Jeddisaravi, Kossar and Yousefi, Farzad and Culley, Jacob and Nicholson, Tom and others},
journal = {IEEE Transactions on Pattern Analysis & Machine Intelligence},
title = {ROAD: The ROad event Awareness Dataset for autonomous Driving},
year = {5555},
volume = {},
number = {01},
issn = {1939-3539},
pages = {1-1},
keywords = {roads;autonomous vehicles;task analysis;videos;benchmark testing;decision making;vehicle dynamics},
doi = {10.1109/TPAMI.2022.3150906},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {feb}
}