Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit for the addition of TIMIT dataset #1903

Merged
merged 13 commits into from
Mar 1, 2021

Conversation

vrindaprabhu
Copy link
Contributor

Below points needs to be addressed:

  • Creation of dummy dataset is failing
  • Need to check on the data representation
  • License is not creative commons. Copyright: Portions © 1993 Trustees of the University of Pennsylvania

Also the links (except the download) point to the ami corpus! ;-)

@patrickvonplaten Requesting your comments, will be happy to address them!

@vrindaprabhu vrindaprabhu mentioned this pull request Feb 18, 2021
@vrindaprabhu
Copy link
Contributor Author

@patrickvonplaten could you please review and help me close this PR?

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for adding this one !

This is pretty good overall, good job :)
I left a few comments

datasets/timit_asr/README.md Outdated Show resolved Hide resolved
datasets/timit_asr/dummy/clean/2.0.1/.~lock.test_data.csv# Outdated Show resolved Hide resolved
datasets/timit_asr/README.md Outdated Show resolved Hide resolved
datasets/timit_asr/timit_asr.py Outdated Show resolved Hide resolved
datasets/timit_asr/timit_asr.py Outdated Show resolved Hide resolved
datasets/timit_asr/README.md Outdated Show resolved Hide resolved
@vrindaprabhu
Copy link
Contributor Author

@lhoestq Thank you so much for your comments and for patiently reviewing the code. Have hopefully included all the suggested changes. Let me know if any more changes are required.

Sorry the code had lots of silly errors from my side!:' Will be more careful from next time! :)

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes ! :)

added my last comments

Also it looks like the dummy data file is quite big (5MB), could you try to reduce its size a bit ?
To do so feel free to only keep 1 or 2 lines in the csv files (with is audio to TRUE and is_converted_audio to FALSE) and only keep the relevant wav/txt/phn files.

The idea is to have a very small set of data in order to test the script quickly and keeping the repository lightweight :)

datasets/timit_asr/README.md Show resolved Hide resolved
datasets/timit_asr/README.md Outdated Show resolved Hide resolved
datasets/timit_asr/README.md Outdated Show resolved Hide resolved
datasets/timit_asr/dataset_infos.json Outdated Show resolved Hide resolved
Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me :-)

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes ! :)

datasets/timit_asr/README.md Outdated Show resolved Hide resolved
@lhoestq lhoestq merged commit 12f3e00 into huggingface:master Mar 1, 2021
@patrickvonplaten patrickvonplaten linked an issue Mar 15, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add TIMIT
3 participants