
Segments Repeating in a Loop when Using 'prompt_tokens'. #1017

Open
IgorKolo opened this issue Jun 15, 2023 · 2 comments

@IgorKolo

I know the repeating-segment 'hallucination' has been reported multiple times in the past - I've read through most of those reports and didn't find a solution.
 
The issue: whisper is outputting the same segment text over and over, independently of actual audio input.
This began happening after I started using prompt_tokens from a previous segment.

So we'll start with this one this one's a little bit stronger than that one
out of North Miami Beach in Miami shores heads up Miami Beach surfside the rain's
start any second now you can probably already feel a few rain drops out there hall over
sunny isle's beach could get some strong wind gusts up to 50 miles per hour as this is coming
through lots of intense cloud the ground lightning strikes with that band up here in
the east where we that flood advisory was talking about this goes until five o'clock for sunrise and plantait.
a little bit of rain in the south.
Now coming up I'm talking about this flood watch extended into the weekend let's go over
to the south.
So I'm just going to go back to the next one.
I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.
So I'm going to go back to the next one.

My use case: real-time, segment-by-segment transcription.
Inputs to whisper_full are always 3-second audio buffers (anything shorter results in poor accuracy). Each buffer also begins with the last 0.2 seconds of the previous one.
I have tried models from tiny to medium with similar results.
Parameters are set like this:

whisperParams_ = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
 
const int max_threads = min(4, static_cast<int>(std::thread::hardware_concurrency()));
 
whisperParams_.print_realtime   = true;   
whisperParams_.print_progress   = false;
whisperParams_.print_timestamps = true;
whisperParams_.print_special    = false;
whisperParams_.translate        = false;
whisperParams_.language         = "en";
whisperParams_.n_threads        = max_threads;
whisperParams_.offset_ms        = 0;
whisperParams_.no_context       = false;
whisperParams_.single_segment   = true; 
  
// recommended setting to solve the repeated sentence 'hallucination'
// from https://github.com/ggerganov/whisper.cpp/issues/896
whisperParams_.temperature_inc = 0.1f;
whisperParams_.beam_search.beam_size = 5;
whisperParams_.entropy_thold = 2.8f;
whisperParams_.n_max_text_ctx = 64;
 
//  whisperParams_.n_max_text_ctx = 0;  // this will solve the repeating segments issue; but accuracy is not good.

Plus prompt_tokens and prompt_n_tokens extracted from a previous segment.
 
Sometimes it runs without any 'hallucinations' - and the output quality and speed are quite acceptable.
When it goes into repetition, the output is of course useless.
 
Is there a solution for this, please?

@emcodem

emcodem commented Aug 2, 2023

Doesn't your question already contain the answer?
E.g. using Const-me/Whisper, it worked for me to simply detect text repetition and clear the prompt in that case:
Const-me/Whisper#26 (comment)
This way, we still get repeated output, but not forever.
What you have under "recommended setting to solve the repeated sentence hallucination" only tries to prevent the original problem; it does not salvage cases where the problem happens anyway. My solution works the other way around: it doesn't care why the problem happened and doesn't address its root cause, but it recovers in any case. BUT it only recovers "repeated" output, not cases where we get "count up forever", "lowercase forever", or "no punctuation forever". All of these are prompt-related and relatively easy to detect and salvage by just clearing the prompt.

@bocongl-looki

Same problem here - any update on this thread?
