Which value was used for masked values of SpecAugment? #730

jumon · 2022-12-20T03:12:06Z


Thank you for the great work and for making it open-source!

I am currently trying to fine-tune the Whisper large-v2 model with SpecAugment, and I am wondering which value was used to fill masked regions during training. Was it 0? My concern with masking with 0 is that 0 is also used to pad audio. If the model was trained to predict the content of zero-valued masked regions, and those regions could look like (or overlap with) padding, it may treat zero padding as masked speech and "hallucinate" text at the end of shorter audio files (under 30 seconds) during recognition.
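To make the concern concrete, here is a minimal sketch of SpecAugment-style time and frequency masking on a log-mel spectrogram stored as nested Python lists. The helper names (`time_mask`, `freq_mask`, `spec_augment`), the toy values, and the configurable fill `value` are illustrative assumptions; this is not Whisper's training code, which the thread does not show. It only demonstrates that a zero-filled mask and zero padding produce identical frames.

```python
import random
from typing import List

# Toy representation: a log-mel spectrogram as n_mels rows x n_frames columns.
Mel = List[List[float]]


def time_mask(mel: Mel, start: int, width: int, value: float) -> Mel:
    """Return a copy of `mel` with frames [start, start + width) set to `value`."""
    out = [row[:] for row in mel]
    for row in out:
        for t in range(start, min(start + width, len(row))):
            row[t] = value
    return out


def freq_mask(mel: Mel, start: int, width: int, value: float) -> Mel:
    """Return a copy of `mel` with mel bins [start, start + width) set to `value`."""
    out = [row[:] for row in mel]
    for f in range(start, min(start + width, len(out))):
        out[f] = [value] * len(out[f])
    return out


def spec_augment(mel: Mel, rng: random.Random, max_t: int = 2,
                 max_f: int = 1, value: float = 0.0) -> Mel:
    """Apply one time mask and one frequency mask of random position and width.

    `value` is the fill value the question asks about; 0.0 is only a default here.
    """
    n_mels, n_frames = len(mel), len(mel[0])
    t_w = rng.randint(0, min(max_t, n_frames))
    t0 = rng.randint(0, n_frames - t_w)
    f_w = rng.randint(0, min(max_f, n_mels))
    f0 = rng.randint(0, n_mels - f_w)
    return freq_mask(time_mask(mel, t0, t_w, value), f0, f_w, value)


# Hypothetical normalized log-mel: 2 mel bins x 3 frames of "speech".
speech = [[0.8, 0.6, 0.7], [0.5, 0.4, 0.3]]
# Pad to 5 frames with 0.0, the padding value the question is worried about.
padded = [row + [0.0, 0.0] for row in speech]
# Mask speech frame 1 with the same value.
masked = time_mask(padded, start=1, width=1, value=0.0)

# The masked speech frame and a padding frame are now indistinguishable.
print([row[1] for row in masked], [row[3] for row in masked])  # [0.0, 0.0] [0.0, 0.0]
```

Filling masks with a value that padding never produces (or padding with a value masking never uses) would keep the two cases distinguishable; whether it matters in practice depends on what values inference-time padding actually produces.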

Answered by jongwook

jongwook · 2023-03-07T02:34:44Z

Maintainer

Apologies for missing this post. This is a totally valid point, and I should fix the zero-padding method in transcribe() (also answered in #838).
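For illustration only, here is a sketch contrasting two ways of padding a short clip, assuming the padding in question fills an already-normalized log-mel with literal 0.0. `normalize_log_mel` mirrors, as we read it, the log compression in openai-whisper's `log_mel_spectrogram` (log10 with a 1e-10 floor, clipping below peak minus 8, then (x + 4) / 4). The other helper names are hypothetical, and appending zero-power frames is only a stand-in for padding the raw audio. This is not the fix referred to above.

```python
import math
from typing import List

# n_mels rows x n_frames columns.
Mel = List[List[float]]


def normalize_log_mel(power: Mel, dynamic_range: float = 8.0) -> Mel:
    """Log-compress a mel *power* spectrogram: log10 with a 1e-10 floor,
    clip everything below (peak - dynamic_range), then rescale with (x + 4) / 4."""
    logs = [[math.log10(max(v, 1e-10)) for v in row] for row in power]
    floor = max(max(row) for row in logs) - dynamic_range
    return [[(max(v, floor) + 4.0) / 4.0 for v in row] for row in logs]


def pad_normalized_with_zeros(power: Mel, n_pad: int) -> Mel:
    """Hypothetical: normalize first, then append n_pad frames of literal 0.0."""
    return [row + [0.0] * n_pad for row in normalize_log_mel(power)]


def pad_with_silence_then_normalize(power: Mel, n_pad: int) -> Mel:
    """Hypothetical: append n_pad zero-power (silent) frames first, then normalize,
    so padding lands on the clip's floor value. A real STFT over zero-padded
    audio would also smear the boundary frames; this toy version ignores that."""
    return normalize_log_mel([row + [0.0] * n_pad for row in power])


# Toy mel power spectrogram: 2 bins x 2 frames (exact powers of ten).
power = [[100.0, 10.0], [1.0, 10.0]]

print(pad_normalized_with_zeros(power, 2))
# [[1.5, 1.25, 0.0, 0.0], [1.0, 1.25, 0.0, 0.0]]
print(pad_with_silence_then_normalize(power, 2))
# [[1.5, 1.25, -0.5, -0.5], [1.0, 1.25, -0.5, -0.5]]
```

In this toy example a literal 0.0 in normalized space corresponds to a log10 power of -4, which lies inside the clip's window (peak 2, floor -6). A zero-padded frame therefore looks like quiet sound, not like the silence floor at -0.5.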


jumon Mar 7, 2023
Author

Thanks!
