This repository contains the code for audiovisual crowd counting. To use the code, you need Python 3.7 and PyTorch 1.0.
We propose a new dataset for crowd counting, composed of around 2000 annotated images taken at different locations in China. Each image corresponds to a 1-second audio clip and a density map, and the images cover different illumination conditions. More details can be found in our paper, Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions, and you can download the dataset here. We also provide the original dot annotations here; please feel free to use them.
- Download the dataset, including the images, audio clips and density maps. Unzip the files and put them into the same folder, for example ./audio_visual_data, and then set DATA_PATH in datasets/AC/setting.py to audio_visual_data.
- Download the pretrained VGGish and put it into ./models/SCC_Model/ folder.
- To train a model using raw images, set IS_NOISE to False and BLACK_AREA_RATIO to 0.
- To train a model using low-quality images (low illumination and noise), set IS_NOISE to True, BLACK_AREA_RATIO to 0 and BRIGHTNESS to a value in [0, 1]. The parameter IS_RANDOM indicates whether BRIGHTNESS is a fixed value or a random number during training. Details can be found in our paper.
- You can also change the settings in config.py, such as the name of the model.
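The two training setups above can be summarised in a small sketch. It is only an illustration, not the contents of datasets/AC/setting.py: the flag names come from the steps above, while the `ACSettings` class, the `pick_brightness` helper, the example value 0.2, the value 1.0 for raw images and the uniform draw used when IS_RANDOM is True are all assumptions. Check the paper and the repository for the actual behaviour.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ACSettings:
    """Hypothetical mirror of the flags named in this README (not the repo's class)."""
    IS_NOISE: bool
    BLACK_AREA_RATIO: float
    BRIGHTNESS: float  # assumed to be a scale factor in [0, 1]
    IS_RANDOM: bool

    def __post_init__(self):
        # Reject values outside the [0, 1] range stated in the README.
        if not 0.0 <= self.BRIGHTNESS <= 1.0:
            raise ValueError("BRIGHTNESS must lie in [0, 1]")


# Raw images: no noise, no blacked-out area.
# BRIGHTNESS=1.0 is an assumption meaning "leave the image unchanged".
RAW = ACSettings(IS_NOISE=False, BLACK_AREA_RATIO=0, BRIGHTNESS=1.0, IS_RANDOM=False)

# Low-quality images: noise on, fixed brightness (0.2 is just an example value).
LOW_QUALITY = ACSettings(IS_NOISE=True, BLACK_AREA_RATIO=0, BRIGHTNESS=0.2, IS_RANDOM=False)


def pick_brightness(settings, rng=random):
    """Return the brightness to use for one training sample.

    Assumption: with IS_RANDOM=True a fresh value is drawn uniformly
    from [0, 1]; otherwise the fixed BRIGHTNESS value is used.
    """
    if settings.IS_RANDOM:
        return rng.uniform(0.0, 1.0)
    return settings.BRIGHTNESS
```

With these presets, `pick_brightness(LOW_QUALITY)` returns the fixed value, while a settings object with IS_RANDOM=True yields a different value per call.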
After training, you can run my_test.py to test the trained model. Note that my_tester.py also saves the predicted density maps, so you should set self.save_path to a path of your own.
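A saved density map can be turned into a crowd count by summing its values, which is the usual convention for density-based counting. The sketch below shows this with plain Python lists and writes the map to a CSV file. The helpers `density_to_count` and `save_density_map`, the CSV format and the file naming are assumptions for illustration only; my_tester.py may store its outputs differently.

```python
import csv
from pathlib import Path


def density_to_count(density_map):
    """Estimated number of people: the sum over all density values."""
    return float(sum(sum(row) for row in density_map))


def save_density_map(density_map, save_path, name):
    """Write a 2D density map (list of rows) to <save_path>/<name>.csv.

    save_path plays the role of self.save_path here; the CSV layout
    is a hypothetical choice, not the repository's output format.
    """
    out_dir = Path(save_path)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    out_file = out_dir / f"{name}.csv"
    with out_file.open("w", newline="") as f:
        csv.writer(f).writerows(density_map)
    return out_file
```

For example, `density_to_count([[0.0, 0.5], [0.25, 0.25]])` gives a predicted count of one person.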
The repository is derived from C-3-Framework.
@article{hu2020,
title={Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions},
author={Di Hu and Lichao Mou and Qingzhong Wang and Junyu Gao and Yuansheng Hua and Dejing Dou and Xiao Xiang Zhu},
journal={arXiv preprint},
year={2020}
}