Medium.en model just outputting "Okay" for every second in the audio while the base.en model works well #719
Comments
I'm having the same problem:
I have the same issue... it doesn't seem to be related to a specific model, and it doesn't happen with every input file.
I've disabled the decoder fallbacks because the current implementation is very inefficient.
It turned out that in one case the section where the multiple "Okay"s were "hallucinated" contained loud rumbling / noises (no speech). I isolated this part and it was transcribed correctly. After that I took one detected noise annotation (like "(pages rustling)") and used it as the prompt parameter, and the original file was transcribed properly. This of course does not work at scale.
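For anyone who wants to try the same workaround through the C API rather than the CLI, here is a minimal sketch, assuming a reasonably recent whisper.cpp where whisper_full_params exposes initial_prompt; the model path, the prompt string, and the commented-out audio loading are placeholders:

```c
#include "whisper.h"

// Sketch: bias the decoder with an initial prompt, as in the workaround above.
// Assumes a whisper.cpp version where whisper_full_params has `initial_prompt`.
int main(void) {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-medium.en.bin");
    if (ctx == NULL) {
        return 1;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // Feed a previously detected noise annotation back in as the prompt;
    // in the case above this steered the decoder away from the "Okay" loop.
    wparams.initial_prompt = "(pages rustling)";

    // pcmf32 / n_samples would come from decoding the input WAV to 16 kHz mono float:
    // whisper_full(ctx, wparams, pcmf32, n_samples);

    whisper_free(ctx);
    return 0;
}
```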
I disabled this because there were many complaints about slow decoding. The current implementation does not allow batching the decoders when using the "best of" or "beam size" parameters, so the decoding time is proportional to the number of decoders, which is obviously not great. However, now there are even more complaints about wrong decodings and repetition. So, I'm making a compromise by re-enabling the fallbacks, but defaulting to just 2 "best of" / "beam size" decoders. The temperature step is also increased from 0.2 to 0.4, i.e. from a maximum of 5 fallbacks to a maximum of 2. Also, the stream example now has fallbacks enabled by default. close #471 #477 #508 #612 #719 #731
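For context, these knobs correspond to fields on whisper_full_params in the C API. Here is a minimal sketch that mirrors the new defaults described above (field names as in recent whisper.cpp; treat the exact names as an assumption on older versions):

```c
#include "whisper.h"

// Sketch of the fallback-related parameters described in the commit message.
// Values mirror the new defaults: 2 decoders and a temperature step of 0.4,
// i.e. at most 2 fallback attempts instead of 5.
static struct whisper_full_params make_fallback_params(void) {
    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);

    wparams.greedy.best_of        = 2;    // "best of" candidates for greedy sampling
    wparams.beam_search.beam_size = 2;    // "beam size" for beam search
    wparams.temperature_inc       = 0.4f; // temperature step per fallback (was 0.2f)

    return wparams;
}
```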
Should be resolved via f19e23f
Hello Everyone,
I have a recording that I'm trying to transcribe. I first tried the base.en model, which worked fine but not perfectly. I then tried the medium.en model, but it just outputs "Okay" for each second of the audio.
There are only 5 or 6 actual "Okay"s in the audio, yet the medium.en model keeps outputting "Okay" even for lines that the base.en model transcribes correctly.
Screenshot of the base.en model's output, which works well:
Screenshot of the medium.en model's output:
Any idea what I might be doing wrong?