add CoVoST2 #1935

patil-suraj · 2021-02-23T16:28:16Z

This PR adds the CoVoST2 dataset for speech translation and ASR.
https://github.com/facebookresearch/covost#covost-2

The dataset requires manual download as the download page requests an email address and the URLs are temporary.
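
A minimal sketch of loading a manually downloaded copy, for readers who want to try it. It assumes the loader is registered as `covost2`, that configs are named `<source>_<target>` (for example `en_de`), and that the extracted archive is passed through the `data_dir` argument of `load_dataset`. `resolve_manual_dir` and `load_covost2` are hypothetical helper names, not part of this PR.

```python
from pathlib import Path


def resolve_manual_dir(data_dir: str) -> Path:
    """Return the manually downloaded data directory, or fail early if it is missing."""
    path = Path(data_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} does not exist. Download and extract the archive by hand first."
        )
    return path


def load_covost2(config_name: str, data_dir: str):
    """Load one CoVoST2 config from a manual download (assumed usage, not verified here)."""
    # Imported lazily so the path check above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("covost2", config_name, data_dir=str(resolve_manual_dir(data_dir)))
```

Checking the directory before calling `load_dataset` surfaces a missing download right away, which helps here because the temporary URLs mean the script cannot fetch the files itself.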

The dummy data is a bit bigger than usual because of the mp3 files and the 36 configs.
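
To see where 36 comes from, here is a rough sketch that enumerates the translation directions. The language codes follow the CoVoST 2 description (21 languages into English, English into 15), and the `<source>_<target>` naming is an assumption about how the script names its configs; `XX_EN`, `EN_XX`, and `config_names` are illustrative names only.

```python
# Assumed language codes: 21 source languages translated into English ...
XX_EN = [
    "fr", "de", "es", "ca", "it", "ru", "zh-CN", "pt", "fa", "et", "mn",
    "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy",
]
# ... and 15 target languages translated from English.
EN_XX = [
    "de", "tr", "fa", "sv-SE", "mn", "zh-CN", "cy", "ca",
    "sl", "et", "id", "ar", "ta", "lv", "ja",
]


def config_names() -> list[str]:
    """Build one config name per translation direction (naming scheme is assumed)."""
    into_english = [f"{source}_en" for source in XX_EN]
    from_english = [f"en_{target}" for target in EN_XX]
    return into_english + from_english


print(len(config_names()))  # 21 + 15 = 36
```

Each config needs its own dummy data folder, so the per-config cost is multiplied 36 times, which is why audio files add up quickly.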

datasets/covost2/covost2.py

patrickvonplaten

Looks great to me! However, I think the dummy data is too heavy (>10MB in total, I think). I guess for now we should just not include the audio files in the dummy data, since it doesn't hurt the testing anyway. cc @lhoestq

patil-suraj · 2021-02-24T09:19:52Z

@patrickvonplaten
I removed the mp3 files, dummy_data is much smaller now!

lhoestq

Nice, thanks! This is really cool.

Thanks for making the dummy data lightweight as well :)

datasets/covost2/README.md

datasets/covost2/covost2.py

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

lhoestq

Thanks ! :D

patil-suraj added 9 commits February 23, 2021 13:05

add covost2

b45843d

update script

bf3f221

add dummy data

3fb6ea2

add covost2

c39f1fa

update script

7e6e0be

add dummy data

541eec7

Merge branch 'covost2' of https://github.com/patil-suraj/datasets into covost2

a317073

update script, metadata

6c1e634

add dataset card

2eaa52d

patil-suraj requested review from lhoestq and patrickvonplaten February 23, 2021 16:29

patrickvonplaten reviewed Feb 24, 2021

datasets/covost2/covost2.py (outdated, resolved)

patrickvonplaten approved these changes Feb 24, 2021

patil-suraj added 3 commits February 24, 2021 09:13

fix formating

8dbd265

slim dummy data

b9b9f9d

Merge branch 'covost2' of https://github.com/patil-suraj/datasets into covost2

799c5e7

lhoestq reviewed Feb 24, 2021

patil-suraj and others added 2 commits February 24, 2021 22:48

Apply suggestions from code review

7706c81

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>

adress review comments

21996e4

patil-suraj requested a review from lhoestq February 24, 2021 17:35

lhoestq approved these changes Feb 24, 2021

lhoestq merged commit 96578ad into huggingface:master Feb 24, 2021

patil-suraj deleted the covost2 branch February 24, 2021 18:09
