Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

Official code for "Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation"

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation,
Yuanhong Chen*, Yuyuan Liu*, Hu Wang, Fengbei Liu, Chong Wang, Helen Frazer, Gustavo Carneiro.
CVPR 2024 (arXiv 2304.02970)

Dataset

VPO datasets are available here

VGGSound audio files are available here

Visual comparison between datasets. We show four audio-visual classes, including “female”, “cat”, “dog”, and “car”. The AVSBench (SS) (1st frame) provides pixel-level multi-class annotations to the images containing a single sounding object. The proposed VPO benchmarks (2nd frame to 4th frame) pair a subset of the segmented objects in an image with relevant audio files to produce pixel-level multi-class annotations.

Results

Please note that all tables in the original paper use conventional semantic segmentation metrics, i.e. mIoU and F-score computed over the whole dataset. We have updated the following tables to use per-image mIoU and per-video F-score, as in TPAVI.

Please also note that the current repository version uses AVSBench-Semantics to facilitate training and evaluation on the AVSBench-Objects dataset. However, label noise in AVSBench-Semantics may affect the final results on AVSBench-Objects, so we recommend using the original AVSBench-Objects dataset instead.
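To illustrate why the two aggregation schemes can give different numbers, here is a minimal pure-Python sketch. It is not the repository's evaluation code: the function names, the flattened-mask representation, and the toy data are assumptions made for illustration only, and the actual AVSBench protocol may differ in details such as how empty classes or frames are handled.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flattened label map, one class id per pixel (illustrative)


def class_stats(pred: Mask, gt: Mask, num_classes: int) -> List[Tuple[int, int]]:
    """Return (intersection, union) pixel counts for every class in one image."""
    stats = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        stats.append((inter, union))
    return stats


def per_dataset_miou(preds: Sequence[Mask], gts: Sequence[Mask], num_classes: int) -> float:
    """Accumulate intersections and unions over all images, then average per-class IoU."""
    totals = [[0, 0] for _ in range(num_classes)]
    for pred, gt in zip(preds, gts):
        for c, (inter, union) in enumerate(class_stats(pred, gt, num_classes)):
            totals[c][0] += inter
            totals[c][1] += union
    ious = [i / u for i, u in totals if u > 0]  # skip classes that never occur
    return sum(ious) / len(ious)


def per_image_miou(preds: Sequence[Mask], gts: Sequence[Mask], num_classes: int) -> float:
    """Compute mIoU for each image separately, then average over images."""
    scores = []
    for pred, gt in zip(preds, gts):
        ious = [i / u for i, u in class_stats(pred, gt, num_classes) if u > 0]
        scores.append(sum(ious) / len(ious))
    return sum(scores) / len(scores)


# Toy data: two 4-pixel images, classes 0 (background) and 1 (object).
toy_preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
toy_gts = [[1, 1, 0, 0], [0, 0, 0, 0]]

if __name__ == "__main__":
    print("per-dataset mIoU:", per_dataset_miou(toy_preds, toy_gts, 2))
    print("per-image mIoU:  ", per_image_miou(toy_preds, toy_gts, 2))
```

In the toy data, the single false-positive pixel in the second image drags that image's score down sharply under per-image averaging, while under per-dataset accumulation it is diluted by the correct pixels of the first image.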

Instance-level Evaluation (AVSBench Metrics)

RESNET-50 (IMAGENET PRETRAIN)
	AVSBench-Object (SS)			AVSBench-Object (MS)			AVSBench-Semantics
Model	J&F Mean	J Mean	F Mean	J&F Mean	J Mean	F Mean	J&F Mean	J Mean	F Mean
CATR	80.70	74.80	86.60	59.05	52.80	65.30	-	-	-
AuTR	80.10	75.00	85.20	55.30	49.40	61.20	-	-	-
AVSegFormer	80.67	76.54	84.80	56.17	49.53	62.80	27.12	24.93	29.30
AVSC	81.13	77.02	85.24	55.55	49.58	61.51	-	-	-
BAVS	81.63	77.96	85.29	56.30	50.23	62.37	27.16	24.68	29.63
TPAVI	78.80	72.79	84.80	52.84	47.88	57.80	22.69	20.18	25.20
AVSBG	79.77	74.13	85.40	50.88	44.95	56.80	-	-	-
ECMVAE	81.42	76.33	86.50	54.70	48.69	60.70	-	-	-
DiffusionAVS	81.35	75.80	86.90	55.94	49.77	62.10	-	-	-
CAVP	83.84	78.78	88.89	61.48	55.82	67.14	32.83	30.37	35.29

RESNET-50 (COCO PRETRAIN)
	AVSBench-Object (SS)			AVSBench-Object (MS)
Model	J&F Mean	J Mean	F Mean	J&F Mean	J Mean	F Mean
AQFormer	81.70	77.00	86.40	61.30	55.70	66.90
CAVP	83.75	78.72	88.77	62.34	56.42	68.25
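The J&F Mean column in the tables above is consistent with the arithmetic mean of the J Mean and F Mean columns. The short check below reproduces the CAVP rows from the ImageNet-pretrained table; the helper name `jf_mean` and the rounding tolerance are assumptions for illustration, and the reported values were presumably computed from unrounded scores.

```python
def jf_mean(j_mean: float, f_mean: float) -> float:
    """Arithmetic mean of the region score (J) and the F-score (F)."""
    return (j_mean + f_mean) / 2


# (J Mean, F Mean, reported J&F Mean) for CAVP, copied from the table above.
cavp_rows = {
    "AVSBench-Object (SS)": (78.78, 88.89, 83.84),
    "AVSBench-Object (MS)": (55.82, 67.14, 61.48),
    "AVSBench-Semantics": (30.37, 35.29, 32.83),
}

if __name__ == "__main__":
    for name, (j, f, reported) in cavp_rows.items():
        # The table is rounded to two decimals, so allow a small gap.
        print(f"{name}: computed {jf_mean(j, f):.3f}, reported {reported:.2f}")
```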

Dataset-level Evaluation (Conventional Semantic Segmentation Metrics)

RESNET-50 (IMAGENET PRETRAIN)
	AVSBench-Object (SS)		AVSBench-Object (MS)		AVSBench-Semantics
Model	mIoU	F-Score	mIoU	F-Score	mIoU	F-Score
CAVP	89.43	94.50	72.79	83.05	44.70	57.76

Demo

eK3T8tH3Y9E_134000_144000.mp4

fGuTL7evhLs_67000_77000.mp4

fiFafHWqC4A_6000_16000.mp4

Checkpoints

Checkpoints are available here: avsbench-object-ss-224, avsbench-object-ms-224, avss-224.

Usage

Requirements

git clone git@github.com:cyh-0/CAVP.git
cd CAVP
pip install -r requirements.txt

Path

ln -s /path/to/datasets ../audio_visual
ln -s /path/to/ckpts ./ckpts
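A quick way to confirm that both symbolic links resolve is sketched below. It assumes the commands above were run from inside the CAVP directory, so the dataset link sits one level above the repository and the checkpoint link inside it; the helper name `check_links` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Dict


def check_links(repo_root: Path) -> Dict[str, bool]:
    """Report whether the dataset and checkpoint links resolve to existing paths."""
    expected = {
        "datasets": repo_root.parent / "audio_visual",  # created by: ln -s ... ../audio_visual
        "checkpoints": repo_root / "ckpts",             # created by: ln -s ... ./ckpts
    }
    # Path.exists() follows symlinks, so a dangling link is reported as False.
    return {name: path.exists() for name, path in expected.items()}


if __name__ == "__main__":
    for name, ok in check_links(Path.cwd()).items():
        print(f"{name}: {'ok' if ok else 'missing or broken link'}")
```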

Training

Before training, set your own WANDB_KEY in the config file.

Training scripts for AVSBench-Semantics.

python main_avss.py --experiment_name "CAVP" --setup avss --gpus 1 --batch_size 16 --lr 1e-3 --weight_decay 1e-4 --epochs 80 --wandb_mode disabled --num_workers 16

Training scripts for VPO-MONO.

python main_vpo_mono.py --experiment_name "CAVP" --setup "vpo_ss" --gpus 2 --batch_size 8 --lr 5e-4 --weight_decay 5e-4 --epochs 80 --num_workers 16 --wandb_mode online

python main_vpo_mono.py --experiment_name "CAVP" --setup "vpo_ms" --gpus 2 --batch_size 8 --lr 5e-4 --weight_decay 5e-4 --epochs 80 --num_workers 16 --wandb_mode online

python main_vpo_mono.py --experiment_name "CAVP" --setup "vpo_msmi" --gpus 2 --batch_size 8 --lr 5e-4 --weight_decay 5e-4 --epochs 80 --num_workers 16 --wandb_mode online

Training scripts for VPO-STEREO.

python main_vpo_stereo.py --experiment_name "CAVP" --setup "vpo_ss" --gpus 2 --batch_size 8 --lr 5e-4 --weight_decay 5e-4 --epochs 80 --num_workers 16 --wandb_mode online

python main_vpo_stereo.py --experiment_name "CAVP" --setup "vpo_ms" --gpus 2 --batch_size 8 --lr 5e-4 --weight_decay 5e-4 --epochs 80 --num_workers 16 --wandb_mode online

python main_vpo_stereo.py --experiment_name "CAVP" --setup "vpo_msmi" --gpus 2 --batch_size 8 --lr 5e-4 --weight_decay 5e-4 --epochs 80 --num_workers 16 --wandb_mode online
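The six VPO commands above differ only in the entry script and the `--setup` value. As a convenience, they can be generated from a small Python helper like the sketch below. The helper names are hypothetical and not part of the repository; all flags and values are copied from the commands above.

```python
import shlex
from typing import List

VPO_SCRIPTS = ("main_vpo_mono.py", "main_vpo_stereo.py")
VPO_SETUPS = ("vpo_ss", "vpo_ms", "vpo_msmi")


def vpo_command(script: str, setup: str) -> List[str]:
    """Build the argument list for one VPO training run."""
    return [
        "python", script,
        "--experiment_name", "CAVP",
        "--setup", setup,
        "--gpus", "2",
        "--batch_size", "8",
        "--lr", "5e-4",
        "--weight_decay", "5e-4",
        "--epochs", "80",
        "--num_workers", "16",
        "--wandb_mode", "online",
    ]


def all_vpo_commands() -> List[List[str]]:
    """Return the commands for every script/setup combination listed above."""
    return [vpo_command(script, setup) for script in VPO_SCRIPTS for setup in VPO_SETUPS]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to launch them.
    for cmd in all_vpo_commands():
        print(shlex.join(cmd))
```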

Citation

@misc{chen2024unraveling,
      title={Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation}, 
      author={Yuanhong Chen and Yuyuan Liu and Hu Wang and Fengbei Liu and Chong Wang and Helen Frazer and Gustavo Carneiro},
      year={2024},
      eprint={2304.02970},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
