add CoVoST2 #1935

Merged: 14 commits merged on Feb 24, 2021

@patil-suraj (Contributor) commented Feb 23, 2021

This PR adds the CoVoST2 dataset for speech translation and ASR.
https://github.com/facebookresearch/covost#covost-2

The dataset requires manual download as the download page requests an email address and the URLs are temporary.
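
A minimal sketch of what the manual-download requirement means for a loading script, assuming the common pattern where the user fetches the files themselves and the script checks that folder before reading it. The helper name `resolve_manual_dir` is hypothetical and is not the actual covost2.py code.

```python
import os


def resolve_manual_dir(manual_dir: str, instructions: str) -> str:
    """Hypothetical helper: validate a folder the user downloaded manually."""
    # Expand "~" and make the path absolute so the error message is unambiguous.
    data_root = os.path.abspath(os.path.expanduser(manual_dir))
    if not os.path.isdir(data_root):
        # Fail early and tell the user how to get the data themselves,
        # since the download URLs are temporary and can't be hardcoded.
        raise FileNotFoundError(
            f"{data_root} does not exist. Manual download instructions:\n{instructions}"
        )
    return data_root
```

With the `datasets` library, the downloaded folder is passed via `data_dir`, e.g. `load_dataset("covost2", "en_de", data_dir="/path/to/covost2")`; the config name and path here are illustrative.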

The dummy data is somewhat large because it includes mp3 files and covers 36 configs.

@patrickvonplaten (Contributor) left a comment

Looks great to me! However, I think the dummy data is too heavy (>10MB in total). For now, we should probably just leave the audio files out of the dummy data, since that doesn't affect the testing anyway. cc @lhoestq

@patil-suraj (Contributor, Author) commented

@patrickvonplaten I removed the mp3 files; dummy_data is much smaller now!

@lhoestq (Member) left a comment

Nice, thanks! This is really cool.

Thanks for making the dummy data lightweight as well :)

datasets/covost2/README.md (resolved)
datasets/covost2/README.md (outdated, resolved)
datasets/covost2/covost2.py (resolved)
datasets/covost2/covost2.py (outdated, resolved)
datasets/covost2/covost2.py (outdated, resolved)
patil-suraj and others added 2 commits February 24, 2021 22:48
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@lhoestq (Member) left a comment

Thanks! :D

@lhoestq merged commit 96578ad into huggingface:master on Feb 24, 2021
@patil-suraj deleted the covost2 branch on February 24, 2021 at 18:09