I know the repeating-segment 'hallucination' has been reported multiple times in the past - I've read through most of those reports and didn't find a solution.
The issue: whisper outputs the same segment text over and over, independently of the actual audio input.
This began happening after I started using prompt_tokens from a previous segment. Example output:
So we'll start with this one this one's a little bit stronger than that one
out of North Miami Beach in Miami shores heads up Miami Beach surfside the rain's
start any second now you can probably already feel a few rain drops out there hall over
sunny isle's beach could get some strong wind gusts up to 50 miles per hour as this is coming
through lots of intense cloud the ground lightning strikes with that band up here in
the east where we that flood advisory was talking about this goes until five o'clock for sunrise and plantait.
a little bit of rain in the south.
Now coming up I'm talking about this flood watch extended into the weekend let's go over
to the south.
So I'm just going to go back to the next one.
I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
My use case: real-time, segment-by-segment transcription.
The input to whisper_full is always a 3-second audio buffer (anything shorter results in poor accuracy). Each buffer also begins with the last 0.2 seconds of the previous one.
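Roughly, the buffers are assembled like this (a simplified sketch; the constants and function name are only illustrative, not my actual code):
// Sketch of the buffering described above: keep the last 0.2 s of the previous
// 3 s buffer and prepend it to the freshly captured audio (illustrative only).
#include <cstddef>
#include <vector>

constexpr int         kSampleRate     = 16000;            // whisper expects 16 kHz mono float PCM
constexpr std::size_t kOverlapSamples = kSampleRate / 5;   // 0.2 s carried over from the previous buffer

std::vector<float> buildNextBuffer(const std::vector<float>& previous,
                                   const std::vector<float>& freshAudio) {
    std::vector<float> next;
    next.reserve(kOverlapSamples + freshAudio.size());
    if (previous.size() >= kOverlapSamples) {
        // start with the last 0.2 s of the previous buffer ...
        next.insert(next.end(), previous.end() - kOverlapSamples, previous.end());
    }
    // ... followed by the newly captured audio, for roughly 3 s in total
    next.insert(next.end(), freshAudio.begin(), freshAudio.end());
    return next;
}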
I have tried models from tiny to medium with similar results.
Parameters are set like this:
whisperParams_ = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
const int max_threads = std::min(4, static_cast<int>(std::thread::hardware_concurrency())); // needs <algorithm> and <thread>
whisperParams_.print_realtime = true;
whisperParams_.print_progress = false;
whisperParams_.print_timestamps = true;
whisperParams_.print_special = false;
whisperParams_.translate = false;
whisperParams_.language = "en";
whisperParams_.n_threads = max_threads;
whisperParams_.offset_ms = 0;
whisperParams_.no_context = false;
whisperParams_.single_segment = true;
// recommended settings to solve the repeated-sentence 'hallucination'
// from https://github.com/ggerganov/whisper.cpp/issues/896
whisperParams_.temperature_inc = 0.1f;
whisperParams_.beam_search.beam_size = 5; // note: only takes effect with the WHISPER_SAMPLING_BEAM_SEARCH strategy
whisperParams_.entropy_thold = 2.8f;
whisperParams_.n_max_text_ctx = 64;
// whisperParams_.n_max_text_ctx = 0; // this solves the repeating-segments issue, but accuracy is not good
On top of that, prompt_tokens and prompt_n_tokens are set from the tokens of the previous segment.
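Concretely, after each call the segment tokens are collected and passed back in on the next call, roughly like this (a simplified sketch; everything except the whisper.cpp functions is illustrative):
// Sketch: collect the decoded tokens of the last whisper_full() call and feed
// them as prompt_tokens for the next call (illustrative only).
#include "whisper.h"
#include <vector>

std::vector<whisper_token> promptTokens_;   // kept alive between calls (illustrative name)

void collectPromptTokens(struct whisper_context * ctx) {
    promptTokens_.clear();
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        const int n_tokens = whisper_full_n_tokens(ctx, i);
        for (int j = 0; j < n_tokens; ++j) {
            promptTokens_.push_back(whisper_full_get_token_id(ctx, i, j));
        }
    }
}

// before the next whisper_full() call:
//   whisperParams_.prompt_tokens   = promptTokens_.data();
//   whisperParams_.prompt_n_tokens = static_cast<int>(promptTokens_.size());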
Sometimes it would run without any 'hallucinations' - and the output quality and speed are quite acceptable.
When it goes into repetition, the output is of course useless.
Is there a solution for this, please?
Doesn't your question already contain the answer?
E.g. using Const-me's Whisper, it worked for me to simply detect text repetition and clear the prompt in that case: Const-me/Whisper#26 (comment)
This way we still get repeated output, but not forever.
What you have under "recommended settings to solve the repeated sentence 'hallucination'" only tries to prevent the original problem; it does not salvage cases where the problem still happens. My approach works the other way around: it doesn't care why the problem happened and doesn't try to fix its root cause, but it recovers in any case. BUT it only recovers "repeated" output, not cases where we get "count up forever", "lowercase forever" or "no punctuation forever". All of these are prompt related and relatively easy to detect and salvage by just clearing the prompt.
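In whisper.cpp terms the idea looks roughly like this (a sketch only; the helper and state names are made up, and the linked Const-me/Whisper comment does the equivalent on its own API):
// Sketch: detect when the new output repeats the previous one and, if so, clear
// the prompt for the next call so the decoder can recover (illustrative only).
#include "whisper.h"
#include <string>

static std::string lastOutputText_;   // illustrative state, kept between calls

bool looksLikeRepetition(struct whisper_context * ctx) {
    std::string text;
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        text += whisper_full_get_segment_text(ctx, i);
    }
    const bool repeated = !text.empty() && text == lastOutputText_;
    lastOutputText_ = text;
    return repeated;
}

// after each whisper_full() call:
//   if (looksLikeRepetition(ctx)) {
//       whisperParams_.prompt_tokens   = nullptr;   // drop the prompt once
//       whisperParams_.prompt_n_tokens = 0;
//   }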