Add docs for audio processing #3222
Cool thanks :)
also pinging @anton-l @patrickvonplaten @albertvillanova
Nice! Love it this way. I guess you can set this PR to "ready for review"?
That's a great document - thanks a lot for putting all of this together!
I left some tips on how the `transformers` preprocessing code example could be simplified a bit.
In short, in 99% of the use cases where audio datasets are used with `transformers`, either:
a) a pretrained speech model is fine-tuned, or
b) a fine-tuned speech model is evaluated / used in inference.
For both a) and b) the `feature_extractor` is always defined, so we should always advocate using `AutoFeatureExtractor.from_pretrained(...)` here IMO.
For a) the `tokenizer` is not defined and has to be created as described in the docs currently. For b) the `tokenizer` is also defined, so one can directly use `Wav2Vec2Processor.from_pretrained(...)`.
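To make the two cases concrete, here is a minimal sketch (not code from the PR). The helper names `load_feature_extractor`, `load_processor`, and `extract_input_values` are hypothetical, and the sketch assumes a decoded example looks like `{"audio": {"array": ..., "sampling_rate": ...}}`. `transformers` is imported inside the loaders only so that the last helper can be tried without the library or a download.

```python
def load_feature_extractor(checkpoint: str):
    """Cases a) and b): the comment above says the feature extractor is always defined."""
    # Imported lazily so the pure helper below can run without transformers.
    from transformers import AutoFeatureExtractor

    return AutoFeatureExtractor.from_pretrained(checkpoint)


def load_processor(checkpoint: str):
    """Case b): a fine-tuned checkpoint also has a tokenizer, so load both at once."""
    from transformers import Wav2Vec2Processor

    return Wav2Vec2Processor.from_pretrained(checkpoint)


def extract_input_values(example: dict, preprocessor) -> dict:
    """Turn one decoded audio example into model inputs.

    `preprocessor` is either of the objects returned by the loaders above.
    """
    audio = example["audio"]
    inputs = preprocessor(audio["array"], sampling_rate=audio["sampling_rate"])
    # The preprocessor returns a batch of size one; keep the single item.
    return {"input_values": inputs["input_values"][0]}
```

As an illustration only, `load_feature_extractor("facebook/wav2vec2-base")` would correspond to case a) and `load_processor("facebook/wav2vec2-base-960h")` to case b); the checkpoint names are my choice, and both calls download files from the Hub.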
Hope that helps a bit :-)
Looks all good to me now :)
Let us know if you have more comments or if it's ready to merge
LGTM, great to reference the `transformers` examples!
I guess we can merge this one now :)
This PR adds documentation for the `Audio` feature. It describes:
- `path` and `audio`, as well as use-cases/best practices for each of them.
- `cast_column`, and then calling `ds[0]["audio"]` to automatically decode and resample to the desired sampling rate.
- `map`.

Preview here, let me know if I'm missing anything!
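For readers skimming the thread, the `cast_column` and `map` points might look roughly like the sketch below. It is an illustration under assumptions, not the text of the new docs: the helper names `cast_audio_column`, `add_duration`, and `resample_and_measure` are hypothetical, the 16 kHz default is just an example value, and each decoded example is assumed to expose `array` and `sampling_rate` under `"audio"`.

```python
def cast_audio_column(ds, sampling_rate: int = 16_000, column: str = "audio"):
    """Declare the target sampling rate; nothing is decoded at this point."""
    # Imported lazily so add_duration below can be tried without datasets.
    from datasets import Audio

    return ds.cast_column(column, Audio(sampling_rate=sampling_rate))


def add_duration(example: dict) -> dict:
    """Function meant for `ds.map`: reading example["audio"] decodes the file."""
    audio = example["audio"]
    return {"duration_s": len(audio["array"]) / audio["sampling_rate"]}


def resample_and_measure(ds, sampling_rate: int = 16_000):
    """Cast first, then map; needs real audio files and a decoding backend."""
    ds = cast_audio_column(ds, sampling_rate)
    # After the cast, ds[0]["audio"] is decoded and resampled on access.
    return ds.map(add_duration)
```

`add_duration` stands in for whatever per-example processing is needed; the point is only that the audio is decoded at the cast sampling rate when the function reads `example["audio"]`.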