Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docs for audio processing #3222

Merged
merged 6 commits into from
Nov 24, 2021
Merged

Conversation

stevhliu
Copy link
Member

@stevhliu stevhliu commented Nov 5, 2021

This PR adds documentation for the Audio feature. It describes:

  • The difference between loading path and audio, as well as use-cases/best practices for each of them.
  • Resampling audio files with cast_column, and then calling ds[0]["audio"] to automatically decode and resample to the desired sampling rate.
  • Resampling with map.

Preview here, let me know if I'm missing anything!

@stevhliu stevhliu added the documentation Improvements or additions to documentation label Nov 5, 2021
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks :)
also pinging @anton-l @patrickvonplaten @albertvillanova

docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Show resolved Hide resolved
docs/source/audio_process.rst Show resolved Hide resolved
@lhoestq
Copy link
Member

lhoestq commented Nov 9, 2021

Nice ! love it this way. I guess you can set this PR to "ready for review" ?

@stevhliu stevhliu marked this pull request as ready for review November 9, 2021 17:01
docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great document - thanks a lot for putting all of this together!

I left some tips on how the preprocessing transformers code example could be a bit simplified.

In short 99% of the use cases when Audio datasets is used for transformers is either:

a) A pretrained speech model is fine-tuned

or:

b) A fine-tuned speech model is evaluated / used in inference

For both a) and b) the feature_extractor is always defined. So we should always advocate to use AutoFeatureExtractor.from_pretrained(...) here IMO.

For a) the tokenizer is not defined and has to be created as described in the docs currently. For b) the tokenizer is also defined so that one can directly use Wav2Vec2Processor.from_pretrained(...)

Hope that helps a bit :-)

docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Outdated Show resolved Hide resolved
docs/source/audio_process.rst Show resolved Hide resolved
docs/source/audio_process.rst Outdated Show resolved Hide resolved
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good to me now :)
Let us know if you have more comments or if it's ready to merge

Copy link
Member

@anton-l anton-l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great reference the transformers examples!

@lhoestq
Copy link
Member

lhoestq commented Nov 24, 2021

I guess we can merge this one now :)

@lhoestq lhoestq merged commit a8f96b3 into huggingface:master Nov 24, 2021
@stevhliu stevhliu deleted the audio-docs branch November 24, 2021 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants