Add task template for automatic speech recognition #2533

Merged (6 commits) on Jun 23, 2021

lewtun (Member) commented Jun 22, 2021

This PR adds a task template for automatic speech recognition. In this task, the input is a path to an audio file which the model consumes to produce a transcription.

Usage:

from datasets import load_dataset
from datasets.tasks import AutomaticSpeechRecognition

ds = load_dataset("timit_asr", split="train[:10]")
# Dataset({
#     features: ['file', 'text', 'phonetic_detail', 'word_detail', 'dialect_region', 'sentence_type', 'speaker_id', 'id'],
#     num_rows: 10
# })

task = AutomaticSpeechRecognition(audio_file_column="file", transcription_column="text")
ds.prepare_for_task(task)
# Dataset({
#     features: ['audio_file', 'transcription'],
#     num_rows: 10
# })
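
To illustrate what the usage above does conceptually, here is a minimal, library-free sketch: a template exposes a column mapping, and preparing a dataset keeps only the mapped columns under their standardized names. `AsrTemplateSketch` and `apply_template` are hypothetical names invented for this illustration and are not the `datasets` implementation; the standardized names simply follow the output shown in the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AsrTemplateSketch:
    """Hypothetical stand-in for a task template (not the datasets class)."""

    audio_file_column: str = "audio_file"
    transcription_column: str = "transcription"

    @property
    def column_mapping(self) -> Dict[str, str]:
        # Map the dataset's own column names to the standardized ones.
        return {
            self.audio_file_column: "audio_file",
            self.transcription_column: "transcription",
        }


def apply_template(columns: Dict[str, List], template: AsrTemplateSketch) -> Dict[str, List]:
    """Keep only the mapped columns and rename them (assumed behaviour)."""
    mapping = template.column_mapping
    missing = [name for name in mapping if name not in columns]
    if missing:
        raise ValueError(f"Columns not found in dataset: {missing}")
    return {new: columns[old] for old, new in mapping.items()}


# Toy data shaped like a few of the timit_asr columns; the values are made up.
toy = {
    "file": ["a.wav", "b.wav"],
    "text": ["hello", "world"],
    "speaker_id": ["s1", "s2"],
}
template = AsrTemplateSketch(audio_file_column="file", transcription_column="text")
prepared = apply_template(toy, template)
print(list(prepared))  # ['audio_file', 'transcription']
```

Freezing the dataclass mirrors the `@dataclass(frozen=True)` decorator used in this PR, so a template cannot be mutated after it is created.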

lewtun requested review from SBrandeis and lhoestq on June 22, 2021 12:56
@dataclass(frozen=True)
class AutomaticSpeechRecognition(TaskTemplate):
    task: str = "automatic-speech-recognition"
    input_schema: ClassVar[Features] = Features({"audio_file": Value("string")})

lhoestq (Member) commented Jun 22, 2021

Thanks for adding this template :)

Note that in the future we'll have an Audio feature type that will probably have additional parameters (like ClassLabel) such as the sampling rate or the audio format.

lewtun (Member, Author) replied:

good to know!

@@ -2144,6 +2144,39 @@ def test_task_question_answering(self, in_memory):
        )
        self.assertDictEqual(features_after_cast, dset.features)

    def test_task_automatic_speech_recognition(self, in_memory):

lhoestq (Member) commented:

don't hesitate to move it outside of the BaseDatasetTest class, as mentioned in #2529

lewtun (Member, Author) replied:

yep, will do this here as well!

SBrandeis (Contributor) left a comment

Thank you @lewtun !

@dataclass(frozen=True)
class AutomaticSpeechRecognition(TaskTemplate):
    task: str = "automatic-speech-recognition"
    input_schema: ClassVar[Features] = Features({"audio_file": Value("string")})

SBrandeis (Contributor) commented:

Maybe audio_file_path would be more explicit about what the column represents?

SBrandeis (Contributor) added:

Also, paths are not portable between machines a priori. This is probably good enough for now, but at some point, we'll need to replace the Value("string") with an Audio or Signal feature!
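
As a rough illustration of the direction the reviewers describe, the sketch below shows a feature type that carries audio parameters and stores the audio bytes rather than a machine-specific path. `AudioFeatureSketch`, its fields, and `encode_example` are assumptions made up for this example; they are not an API from this PR or from the `datasets` library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioFeatureSketch:
    """Hypothetical feature type; field names are assumptions, not a datasets API."""

    sampling_rate: Optional[int] = None  # e.g. 16_000 Hz; None means unspecified
    audio_format: Optional[str] = None   # e.g. "wav"; None means unspecified

    def encode_example(self, raw: bytes) -> dict:
        # Store the audio bytes themselves, so the example does not depend on
        # a file path that only exists on one machine.
        return {
            "bytes": raw,
            "sampling_rate": self.sampling_rate,
            "audio_format": self.audio_format,
        }


feature = AudioFeatureSketch(sampling_rate=16_000, audio_format="wav")
example = feature.encode_example(b"\x00\x01\x02\x03")
print(example["sampling_rate"], len(example["bytes"]))
```

Keeping the parameters on the feature, much as `ClassLabel` keeps its label names, would let a schema describe the audio without inspecting each file.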

lewtun (Member, Author) commented Jun 23, 2021

@SBrandeis @lhoestq I've integrated your suggestions, so this is ready for another review :)

lhoestq (Member) commented Jun 23, 2021

Merging if it's good for you @lewtun :)

lhoestq merged commit 0764fcd into huggingface:master on Jun 23, 2021
lewtun deleted the add-asr-template branch on June 23, 2021 16:14