Make several audio datasets streamable #3290
This reverts commit c973209.
Reading FLAC (for
In [2]: ds = load_dataset("datasets/librispeech_asr/librispeech_asr.py", "clean", streaming=True, split="train.100")
In [3]: item = next(iter(ds))
In [4]: item.keys()
Out[4]: dict_keys(['file', 'audio', 'text', 'speaker_id', 'chapter_id', 'id'])
In [5]: item["file"]
Out[5]: '374-180298-0000.flac'
In [6]: item["audio"].keys()
Out[6]: dict_keys(['path', 'array', 'sampling_rate'])
In [7]: item["audio"]["sampling_rate"]
Out[7]: 16000
In [8]: item["audio"]["path"]
Out[8]: '374-180298-0000.flac'
In [9]: item["audio"]["array"].shape
Out[9]: (232480,)
Oh cool! I think this might have come from an issue with my local
I'll do
@lhoestq @albertvillanova - I think it would have been nice to add a big message at the top stating that this is a breaking change and ping
"filepath": os.path.join(abs_path_to_data, "train.tsv"),
"path_to_clips": abs_path_to_clips,
"files": dl_manager.iter_archive(archive),
"filepath": "/".join([path_to_data, "train.tsv"]),
This is breaking, no?
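To make the concern concrete, here is a minimal sketch of the archive-iteration pattern the diff switches to. It is not the code from this PR: `iter_archive` below is a hypothetical standard-library stand-in for `dl_manager.iter_archive`, and `make_demo_archive`, `generate_examples`, the `clips/` layout, and the transcript mapping are all assumptions made for illustration. The point it shows is that files are read straight from the archive, so the paths in each example are relative paths inside the archive rather than absolute local paths.

```python
import io
import tarfile


def make_demo_archive():
    """Build a tiny in-memory tar archive (illustrative data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("clips/a.mp3", b"audio-a"), ("clips/b.mp3", b"audio-b")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def iter_archive(fileobj):
    """Hypothetical stand-in: yield (path_inside_archive, file_object) pairs."""
    # "r|*" reads the tar as a forward-only stream, so nothing is extracted to disk.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member)


def generate_examples(files, path_to_clips, transcripts):
    """Match streamed audio files against metadata keyed by file name."""
    for path, f in files:
        if path.startswith(path_to_clips + "/"):
            name = path.split("/")[-1]
            if name in transcripts:
                # Each file must be read before the iterator advances to the next member.
                yield path, {"path": path, "bytes": f.read(), "sentence": transcripts[name]}


transcripts = {"a.mp3": "hello", "b.mp3": "world"}  # assumed metadata
examples = list(generate_examples(iter_archive(make_demo_archive()), "clips", transcripts))
for key, example in examples:
    print(key, example["sentence"], len(example["bytes"]))
```

Under these assumptions, any downstream code that opened `example["path"]` as a local file would stop working, since the path now only makes sense relative to the archive.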
Needs #3129 to be merged first

Make those audio datasets streamable:
- (still has some issues to read FLAC) actually it's ok
- multilingual_librispeech (yet to be converted) TODO in a separate PR