AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

Abstract: We propose AV-Link, a unified framework for Video-to-Audio and Audio-to-Video generation that leverages the activations of frozen video and audio diffusion models for temporally-aligned cross-modal conditioning. The key to our framework is a Fusion Block that enables bidirectional information exchange between our backbone video and audio diffusion models through a temporally-aligned self-attention operation. Unlike prior work, which derives the conditioning signal from feature extractors pretrained for other tasks, AV-Link directly leverages features obtained from the complementary modality within a single framework, i.e., video features to generate audio, or audio features to generate video. We extensively evaluate our design choices and demonstrate that our method generates synchronized, high-quality audiovisual content, showcasing its potential for applications in immersive media generation. For more details, please visit our project webpage or read our paper.
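To make the idea of a temporally-aligned self-attention over two modalities more concrete, here is a minimal NumPy sketch. It is not the authors' Fusion Block: the function names (`time_encoding`, `joint_self_attention`), the sinusoidal timestamp encoding, the single attention head, and the absence of learned projections are all assumptions made for illustration. The only point it shows is that video and audio tokens can be concatenated and attended over jointly, with positions expressed in seconds so that tokens from both modalities at the same instant receive the same positional code.

```python
import numpy as np

def time_encoding(t_seconds, dim):
    """Sinusoidal encoding of absolute timestamps in seconds.

    Hypothetical helper: returns an array of shape (len(t_seconds), dim).
    `dim` is assumed to be even.
    """
    freqs = 1.0 / (100.0 ** (np.arange(0, dim, 2) / dim))
    angles = t_seconds[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def joint_self_attention(video_feats, audio_feats, video_t, audio_t):
    """Single-head self-attention over concatenated video and audio tokens.

    Illustrative assumption, not the paper's implementation: no learned
    query/key/value projections, and positions are added to the tokens
    before computing attention scores.
    """
    dim = video_feats.shape[1]
    tokens = np.concatenate([video_feats, audio_feats], axis=0)
    times = np.concatenate([video_t, audio_t])

    # Both modalities share one time axis (seconds), so a video token and an
    # audio token at the same instant get the same positional code.
    x = tokens + time_encoding(times, dim)

    # Scaled dot-product attention with a numerically stable softmax.
    scores = x @ x.T / np.sqrt(dim)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Every token aggregates information from both modalities.
    fused = weights @ tokens
    n_video = video_feats.shape[0]
    return fused[:n_video], fused[n_video:], weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed token rates for the example: 4 video tokens/s, 10 audio tokens/s.
    video = rng.normal(size=(8, 16))
    audio = rng.normal(size=(20, 16))
    video_t = np.arange(8) / 4.0
    audio_t = np.arange(20) / 10.0
    fused_video, fused_audio, _ = joint_self_attention(video, audio, video_t, audio_t)
    print(fused_video.shape, fused_audio.shape)
```

In this sketch the two modalities run at different token rates, which is why positions are given as timestamps rather than token indices. Index-based positions would misalign the sequences as soon as their rates differ.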

Issues

If you have any questions about AV-Link, please open an issue on this GitHub page or send your questions to mh155@rice.edu.

Project Page Template

A template of our project page can be found under the docs directory.

Citation

If you find this paper useful in your research, please consider citing:

@misc{avlink,
      title={AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation}, 
      author={Moayed Haji-Ali and Willi Menapace and Aliaksandr Siarohin and Ivan Skorokhodov and Alper Canberk and Kwot Sin Lee and Vicente Ordonez and Sergey Tulyakov},
      year={2024},
      eprint={2412.15191},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.15191}, 
}
