[Audio] Allow resampling for audio datasets in streaming mode #3369

Closed
patrickvonplaten opened this issue Dec 2, 2021 · 2 comments · Fixed by #3439
Labels: enhancement (New feature or request)

Comments

@patrickvonplaten
Contributor

Many audio datasets, such as Common Voice, always need to be resampled. In non-streaming mode, this is easy to do:

from datasets import Audio, load_dataset

ds = load_dataset("common_voice", "ab", split="test")

ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

However, in streaming mode this currently fails:

from datasets import Audio, load_dataset

ds = load_dataset("common_voice", "ab", split="test", streaming=True)

ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

with the following error:

AttributeError: 'IterableDataset' object has no attribute 'cast_column'  

It would be great if we could add this feature to streaming mode (though I'm not 100% sure how complex that would be).

@lhoestq
Member

lhoestq commented Dec 14, 2021

This requires implementing cast_column for iterable datasets. It could be a very nice addition!

It can also be useful to be able to disable the audio/image decoding for the dataset viewer (see PR #3430) cc @severo
EDIT: actually following #3145 the dataset viewer might not need it anymore
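
To illustrate what a lazy cast_column on an iterable dataset could look like, here is a minimal sketch in plain Python. It is not the datasets implementation: LazyCastIterable and resample_linear are hypothetical names, the example layout (a dict with "array" and "sampling_rate") is an assumption, and linear interpolation stands in for a proper resampler. The point is only that the cast is recorded up front and applied per example during iteration, so nothing is downloaded or decoded eagerly.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def resample_linear(samples: List[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample by linear interpolation.

    Illustrative only: a real resampler would also low-pass filter
    to avoid aliasing when downsampling.
    """
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr          # position in the input signal
        lo = min(int(pos), len(samples) - 1)   # left neighbour
        hi = min(lo + 1, len(samples) - 1)     # right neighbour (clamped at the end)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


class LazyCastIterable:
    """Hypothetical stand-in for an iterable dataset with a lazy cast_column."""

    def __init__(
        self,
        source: Iterable[Dict[str, Any]],
        transforms: Optional[Dict[str, Callable[[Any], Any]]] = None,
    ) -> None:
        self._source = source
        self._transforms = dict(transforms or {})

    def cast_column(self, column: str, target_sr: int) -> "LazyCastIterable":
        # Only record the transform here; no example is touched yet.
        def cast(audio: Dict[str, Any]) -> Dict[str, Any]:
            return {
                "array": resample_linear(audio["array"], audio["sampling_rate"], target_sr),
                "sampling_rate": target_sr,
            }

        return LazyCastIterable(self._source, {**self._transforms, column: cast})

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # Apply the recorded transforms one example at a time.
        for example in self._source:
            yield {
                key: self._transforms[key](value) if key in self._transforms else value
                for key, value in example.items()
            }


# Usage with a tiny in-memory "stream" (made-up data, not Common Voice).
stream = [{"audio": {"array": [0.0, 1.0, 2.0, 3.0], "sampling_rate": 4}, "sentence": "hi"}]
ds = LazyCastIterable(stream).cast_column("audio", target_sr=2)
first = next(iter(ds))
```

Returning a new wrapper from cast_column, rather than mutating in place, mirrors the `ds = ds.cast_column(...)` call style used in the non-streaming example above.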

@patrickvonplaten
Contributor Author

Just to clarify a bit: this feature is always needed when using the Common Voice dataset in streaming mode, so I think it's quite important.
