Fine-tune Distil-Whisper for personalization #115
-
Hey @Tejaswgupta, this guide should answer both of your questions: https://github.com/huggingface/distil-whisper/tree/main/training#overview-of-training-methods

You can convert any personal dataset to the Hugging Face datasets format using this guide: https://huggingface.co/docs/datasets/audio_dataset

Once that's done, you can run the fine-tuning code by dropping in your custom dataset in HF format.
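A minimal sketch of that conversion step, assuming a local folder of WAV files plus a `metadata.csv` (the folder name and the `transcription` column are placeholders for illustration):

```python
# Package local audio + transcripts as a Hugging Face audio dataset,
# following https://huggingface.co/docs/datasets/audio_dataset.
# Assumed layout (hypothetical paths):
#   my_dataset/
#   ├── metadata.csv        # columns: file_name,transcription
#   ├── recording_1.wav
#   └── recording_2.wav
from datasets import load_dataset, Audio

# "audiofolder" pairs each metadata.csv row with its audio file.
dataset = load_dataset("audiofolder", data_dir="my_dataset")

# Whisper-family models expect 16 kHz input; resample lazily on access.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

print(dataset["train"][0])  # {'audio': {...}, 'transcription': '...'}

# Optionally push to the Hub so a training script can load it by name
# (hypothetical repo id):
# dataset.push_to_hub("your-username/my-personal-asr-dataset")
```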
-
Thanks for the quick reply @sanchit-gandhi. My concern with these methods is whether they would scale well with the number of users.
Thanks a lot!
-
What would be the most efficient, and likely the easiest, method to fine-tune distil-whisper (or whisper) on a personal dataset to adapt to a user's phonetics and vocabulary? Additionally, if anyone knows, how much data should we expect to need, at a minimum, to fine-tune the model successfully?

cc @sanchit-gandhi @Vaibhavs10
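For reference, here is a minimal sketch of what the plain `transformers` fine-tuning recipe looks like for a Whisper-family checkpoint, patterned on the standard Hugging Face Whisper fine-tuning setup. The checkpoint, data directory, column names, and hyperparameters are placeholder assumptions; the training scripts linked above remain the maintained path.

```python
# Hedged sketch: fine-tune a Distil-Whisper checkpoint on a personal
# audio dataset with the standard transformers Seq2SeqTrainer recipe.
from dataclasses import dataclass

import torch
from datasets import Audio, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Personal dataset in HF audiofolder format (see the sketch above).
dataset = load_dataset("audiofolder", data_dir="my_dataset")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(example):
    # Log-mel input features from the raw waveform.
    audio = example["audio"]
    example["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Token ids of the reference transcript ("transcription" is assumed).
    example["labels"] = processor.tokenizer(example["transcription"]).input_ids
    return example

dataset = dataset.map(prepare, remove_columns=dataset["train"].column_names)

@dataclass
class DataCollator:
    processor: WhisperProcessor

    def __call__(self, features):
        # Pad spectrogram features and token labels separately.
        batch = self.processor.feature_extractor.pad(
            [{"input_features": f["input_features"]} for f in features],
            return_tensors="pt",
        )
        labels_batch = self.processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features],
            return_tensors="pt",
        )
        # Mask padding so it is ignored by the loss.
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )
        # If tokenization prepended a BOS token, drop it here; the model
        # re-adds the decoder start token when shifting the labels.
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch

training_args = Seq2SeqTrainingArguments(
    output_dir="distil-whisper-personalized",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    warmup_steps=50,
    max_steps=500,  # small personal datasets overfit quickly; keep it short
    fp16=torch.cuda.is_available(),
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    data_collator=DataCollator(processor),
)
trainer.train()
```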