add CoVoST2 #1935
Looks great to me! However, I think the dummy data is too heavy (>10 MB in total, I think). For now we should just leave the audio files out of the dummy data, since that doesn't hurt the testing anyway. cc @lhoestq
@patrickvonplaten
Nice, thanks! This is really cool.
Thanks for making the dummy data lightweight as well :)
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Thanks! :D
This PR adds the CoVoST2 dataset for speech translation and ASR.
https://github.com/facebookresearch/covost#covost-2
The dataset requires manual download as the download page requests an email address and the URLs are temporary.
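For readers unfamiliar with manually downloaded datasets, here is a minimal sketch of how loading would typically look. It assumes the dataset is registered under the name `covost2`, that a config called `en_de` exists, and that the manually downloaded files were extracted to a local folder; the helper `load_covost2` and the example path are hypothetical and not part of this PR.

```python
from pathlib import Path


def load_covost2(config_name: str, manual_dir: str):
    """Load one CoVoST2 config from a manually downloaded, extracted folder.

    `config_name` (e.g. "en_de") and the "covost2" dataset name are assumptions
    made for illustration; check the dataset card for the exact values.
    """
    data_dir = Path(manual_dir).expanduser()
    if not data_dir.is_dir():
        # Fail early with a clear message instead of a loader-specific error.
        raise FileNotFoundError(
            f"Manual download folder not found: {data_dir}. "
            "Download and extract the data first."
        )

    # Imported lazily so the path check above works without `datasets` installed.
    from datasets import load_dataset

    # `data_dir` is how `load_dataset` is pointed at manually downloaded files.
    return load_dataset("covost2", config_name, data_dir=str(data_dir))
```

Usage would be something like `load_covost2("en_de", "~/data/common_voice/en")`, with the path replaced by wherever the archive was extracted.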
The dummy data is a bit bigger than usual because of the mp3 files and the 36 configs.
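To show where the 36 configs come from: CoVoST 2 covers 21 languages translated into English and English translated into 15 languages. The sketch below enumerates them. The language lists are reproduced from the CoVoST 2 README as best understood here and should be checked against it, and the `source_target` naming scheme is an assumption for illustration rather than something confirmed in this PR.

```python
# Language codes assumed from the CoVoST 2 README; verify before relying on them.
XX_TO_EN = [
    "fr", "de", "es", "ca", "it", "ru", "zh-CN", "pt", "fa", "et", "mn",
    "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy",
]
EN_TO_XX = [
    "de", "tr", "fa", "sv-SE", "mn", "zh-CN", "cy", "ca",
    "sl", "et", "id", "ar", "ta", "lv", "ja",
]


def covost2_config_names():
    """Return hypothetical config names in the form '<source>_<target>'."""
    into_english = [f"{lang}_en" for lang in XX_TO_EN]
    from_english = [f"en_{lang}" for lang in EN_TO_XX]
    return into_english + from_english


if __name__ == "__main__":
    names = covost2_config_names()
    print(len(names), "configs")
```

Since each config needs its own dummy data archive, even a small mp3 file per config adds up across all of them.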