Fix hallucinations during silence #2629
When the predicted tokens end with a single timestamp (a text token followed by one timestamp token, rather than a pair of consecutive timestamp tokens), the entire 30-second segment should be considered done, to avoid hallucinations in the remaining part of the segment. This behaviour matches OpenAI's Whisper. Refer to the logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py
We need this so badly. Hopefully it'll work with the Swift package?
@itsthisjustin Yes, of course. The fix is done in the core.
Gonna test this. Here is 1.7.2:
[00:01:07.360 --> 00:01:07.820] Father, for all-in-sacrifice, for all-in-sacrifice, for all-in-sacrifice, for all-in-sacrifice,
output_srt: saving output to '0155.srt'

Now let's see with the patch... (downloaded the new whisper.cpp into src)

2000 (3m chapters) = 6,000 minutes or 100 hours
Duration of audiobook: 294660 seconds
Total number of chapters: 187
Average length of chapters: 1625 chunks for ~182 secs or 00h:03m:02s splits
@mrfragger And here is the output with this fixed branch.
Command line:
Please note that the extra hallucinations are removed in this branch.
@mrfragger
That portion is a really bad audio recording of a conversation. Anyway, most of the time I eliminate all silence before compiling the audiobook to transcribe, and I also trim music intros and outros if feasible. I believe your patch addresses the silence, so if it does indeed work for that, it would be a huge boon. So far I've been running your patch for the last 6 or 7 hours with no negative effects or anything unusual.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
# By Georgi Gerganov (4) and others
# Via GitHub
* ggerganov/master:
  stream : improve consistency in README (ggerganov#2642)
  whisper : support no_speech_thold (ggerganov#2625)
  whisper : add single-timestamp logic (ggerganov#2629)
  readme : fix typo (ggerganov#2637)
  cmake : fix "amd64" processor string (ggerganov#2638)
  vulkan : fix soft_max.comp division by zero (ggerganov#2633)
  common : add cstdio header
  stream : update build instructions
  android : fix build and ci (ggerganov#2624)
  models : fix typo in download-ggml-model.sh (ggerganov#2623)
  ruby : Sync whisper.cpp and model download feature (ggerganov#2617)
  scripts : update to new build system

# Conflicts:
#   src/whisper.cpp