Ego4DSounds

Ego4DSounds is a subset of Ego4D, a large-scale egocentric video dataset. Its clips exhibit high action-audio correspondence, making it a high-quality dataset for action-to-sound generation.

Explore the dataset

Action2Sound

Dataset introduced in "Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos".

Action2Sound is an ambient-aware approach that disentangles action sounds from ambient sounds. This enables successful generation after training on diverse in-the-wild data, as well as controllable conditioning on the ambient sound level.

[Action2Sound overview figure]

Explore the project

Contents

This repository contains scripts for processing the Ego4DSounds dataset. It includes functionality for loading video and audio data and extracting clips using metadata.

  • extract_ego4d_clips.py: Extracts clips from the Ego4D dataset
  • dataset.py: Defines the Ego4DSounds dataset class for loading and processing video and audio clips
  • Metadata files: train_clips_1.2m.csv, test_clips_11k.csv, ego4d.json
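Clip extraction of this kind is typically done by seeking into the full-length Ego4D video with ffmpeg. The sketch below is illustrative only: the function name and flag choices are assumptions, not the actual interface of extract_ego4d_clips.py.

```python
import shlex

def build_ffmpeg_cmd(video_path, clip_start, clip_end, out_path):
    """Build an ffmpeg command that cuts [clip_start, clip_end) out of a
    full-length video and re-encodes it as a short clip.

    Hypothetical helper: the real extract_ego4d_clips.py may use
    different flags or a library wrapper around ffmpeg.
    """
    duration = clip_end - clip_start
    cmd = [
        "ffmpeg", "-y",
        "-ss", f"{clip_start:.3f}",  # seek to clip start (seconds)
        "-i", video_path,            # source video, located via video_uid
        "-t", f"{duration:.3f}",     # clip length
        "-c:v", "libx264",           # re-encode for frame-accurate cuts
        "-c:a", "aac",               # keep the audio track
        out_path,
    ]
    return cmd

cmd = build_ffmpeg_cmd("videos/abc123.mp4", 12.5, 15.5, "clips/abc123_0.mp4")
print(shlex.join(cmd))
```

Placing `-ss` before `-i` makes ffmpeg seek on the input, which is much faster on hour-long Ego4D videos than decoding from the start.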

Each row in the CSV files has the following columns:

video_uid, video_dur, narration_source, narration_ind, narration_time, clip_start, clip_end, clip_text, tag_verb, tag_noun, positive, clip_file, speech, background_music, traffic_noise, wind_noise
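The metadata can be read with the standard csv module; each row locates a clip (`clip_file`) inside a source video (`video_uid`, `clip_start`, `clip_end`) along with its narration text. The row below is a synthetic example in this schema, not a row from the real train/test CSVs:

```python
import csv
import io

# Synthetic example row following the metadata schema above
# (values are illustrative only).
sample = (
    "video_uid,video_dur,narration_source,narration_ind,narration_time,"
    "clip_start,clip_end,clip_text,tag_verb,tag_noun,positive,clip_file,"
    "speech,background_music,traffic_noise,wind_noise\n"
    "abc123,3600.0,narration_pass_1,0,12.7,12.5,15.5,"
    "#C C chops the onion,chop,onion,1,abc123_0.mp4,0,0,0,0\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # Clip boundaries are stored in seconds relative to the source video.
    duration = float(row["clip_end"]) - float(row["clip_start"])
    print(row["clip_file"], "|", row["clip_text"], f"| {duration:.1f}s")
```

The boolean-style columns at the end (`speech`, `background_music`, `traffic_noise`, `wind_noise`) flag ambient-sound conditions per clip, which is what Action2Sound's ambient-aware conditioning relies on.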

BibTeX

@article{chen2024action2sound,
  title = {Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos},
  author = {Changan Chen and Puyuan Peng and Ami Baid and Sherry Xue and Wei-Ning Hsu and David Harwath and Kristen Grauman},
  year = {2024},
  journal = {arXiv},
}
