We provide a novel audio-visual mouse saliency (AViMoS) dataset with the following key features:
- Diverse content: movie, sports, live, vertical videos, etc.;
- Large scale: 1500 videos with mean 19s duration;
- High resolution: all streams are FullHD;
- Audio track saved and played to observers;
- Mouse fixations from >5000 observers (>70 per video);
- License: CC-BY;
File structure:
- Videos.zip: 1500 (1000 Train + 500 Test) .mp4 videos (a kind reminder: many videos contain an audio stream, and observers watched them with the sound turned ON!)
- TrainTestSplit.json: the Train/Public Test/Private Test split of all videos
- SaliencyTrain.zip/SaliencyTest.zip: almost losslessly compressed (crf 0, 10bit, min-max normalized) continuous saliency map videos for the Train/Test subsets
- FixationsTrain.zip/FixationsTest.zip: contain the following files for the Train/Test subsets:
  - .../video_name/fixations.json: per-frame fixation coordinates from which the saliency maps were obtained; this JSON is used for metric calculation
  - .../video_name/fixations/: binary fixation maps in '.png' format (since several fixations can share the same pixel, this is a lossy representation and is NOT used for calculating metrics or generating Gaussians; we provide these maps for visualization and frame-count checks)
- VideoInfo.json: meta information about each video (e.g. license)
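To illustrate why the binary '.png' maps are lossy while the coordinate lists are not, here is a minimal sketch that turns a list of fixations into a min-max normalized continuous map. It is not the dataset's actual generation code: the `(x, y)` input layout, the function names, and `sigma=5.0` are assumptions made for illustration, and the real structure of fixations.json and the Gaussian parameters used by the authors may differ.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def fixations_to_saliency(fixations, height, width, sigma=5.0):
    """Accumulate fixations into a count map, blur it, and min-max normalize.

    `fixations` is assumed to be an iterable of (x, y) pixel coordinates
    (hypothetical layout). The map is assumed to be larger than the kernel
    in both dimensions.
    """
    counts = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            # Repeated fixations on one pixel add up here; a binary map
            # would collapse them into a single 1.
            counts[yi, xi] += 1.0

    kernel = gaussian_kernel_1d(sigma)
    # Separable Gaussian blur: convolve every row, then every column.
    blurred = np.apply_along_axis(np.convolve, 1, counts, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="same")

    # Min-max normalization to the [0, 1] range.
    lo, hi = blurred.min(), blurred.max()
    if hi > lo:
        blurred = (blurred - lo) / (hi - lo)
    return counts, blurred
```

Calling `fixations_to_saliency([(10, 12), (10, 12), (40, 30)], height=64, width=64)` returns a count map that keeps both fixations at pixel (10, 12), which is exactly the information a binary map would drop.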
conda create -n saliency python=3.8.16
conda activate saliency
pip install numpy==1.24.2 opencv-python==4.7.0.72 tqdm==4.65.0
conda install ffmpeg=4.4.2 -c conda-forge
Archives with videos were accepted from challenge participants as submissions and scored using the same pipeline as in bench.py.
Usage example:
- Check that your predictions match the structure and names of the baseline CenterPrior submission
- Install dependencies: pip install -r requirments.txt and conda install ffmpeg
- Download and extract the SaliencyTest.zip, FixationsTest.zip, and TrainTestSplit.json files from the dataset page
- Run python bench.py with flags:
  - --model_video_predictions ./SampleSubmission-CenterPrior: folder with predicted saliency videos
  - --model_extracted_frames ./SampleSubmission-CenterPrior-Frames: folder to store prediction frames (should not exist at launch time), requires ~170 GB of free space
  - --gt_video_predictions ./SaliencyTest/Test: folder from the dataset page with ground-truth saliency videos
  - --gt_extracted_frames ./SaliencyTest-Frames: folder to store ground-truth frames (should not exist at launch time), requires ~170 GB of free space
  - --gt_fixations_path ./FixationsTest/Test: folder from the dataset page with ground-truth fixations
  - --split_json ./TrainTestSplit.json: JSON from the dataset page with the split of video names
  - --results_json ./results.json: path to the output results JSON
  - --mode public_test: subset to evaluate, public_test or private_test
- The results are written to the path given by --results_json (results.json in this example)
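The flags above can also be assembled from a small Python wrapper, which is convenient when evaluating several submissions in a row. The sketch below only reuses the flag names and example paths from the list; the helper names `build_bench_command` and `run_bench` and the `root` parameter are hypothetical and not part of the repository. It also checks up front that the two frame folders do not exist yet, since bench.py expects to create them.

```python
import subprocess
import sys
from pathlib import Path

VALID_MODES = ("public_test", "private_test")


def build_bench_command(mode="public_test", root="."):
    """Build the bench.py argument list from the example paths in the usage list."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")

    root = Path(root)
    flags = {
        "--model_video_predictions": root / "SampleSubmission-CenterPrior",
        "--model_extracted_frames": root / "SampleSubmission-CenterPrior-Frames",
        "--gt_video_predictions": root / "SaliencyTest" / "Test",
        "--gt_extracted_frames": root / "SaliencyTest-Frames",
        "--gt_fixations_path": root / "FixationsTest" / "Test",
        "--split_json": root / "TrainTestSplit.json",
        "--results_json": root / "results.json",
    }

    # Both frame folders should not exist at launch time.
    for flag in ("--model_extracted_frames", "--gt_extracted_frames"):
        if flags[flag].exists():
            raise FileExistsError(f"{flag} target already exists: {flags[flag]}")

    cmd = [sys.executable, "bench.py"]
    for flag, path in flags.items():
        cmd += [flag, str(path)]
    cmd += ["--mode", mode]
    return cmd


def run_bench(mode="public_test", root="."):
    """Run bench.py from the current directory and raise if it exits with an error."""
    subprocess.run(build_bench_command(mode=mode, root=root), check=True)
```

For example, `run_bench(mode="public_test")` launches the public-test evaluation with the paths shown in the list, assuming bench.py is in the current working directory.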
Please cite the paper if you find challenge materials useful for your research:
@inproceedings{aim2024vsp,
title={{AIM} 2024 Challenge on Video Saliency Prediction: Methods and Results},
author={Andrey Moskalenko and Alexey Bryncev and Dmitry Vatolin and Radu Timofte and Gen Zhan and Li Yang and Yunlong Tang and Yiting Liao and Jiongzhi Lin and Baitao Huang and Morteza Moradi and Mohammad Moradi and Francesco Rundo and Concetto Spampinato and Ali Borji and Simone Palazzo and Yuxin Zhu and Yinan Sun and Huiyu Duan and Yuqin Cao and Ziheng Jia and Qiang Hu and Xiongkuo Min and Guangtao Zhai and Hao Fang and Runmin Cong and Xiankai Lu and Xiaofei Zhou and Wei Zhang and Chunyu Zhao and Wentao Mu and Tao Deng and Hamed R. Tavakoli},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
year={2024}
}