streaming.exe + command.exe transcription much lower quality than main.exe with otherwise identical setup #1641

Closed
dgm3333 opened this issue Dec 14, 2023 · 1 comment


dgm3333 commented Dec 14, 2023

I'm trying to get live transcription with streaming.exe (and/or command.exe) to be as accurate as main.exe when processing the same audio input. Ideally I would also like the input to be transcribed and saved as a .wav file at the same time for future reprocessing, but I'm not attempting both yet since even basic streaming transcription is not working well.

If I record audio from the PC mic with a C++ SDL2 program, save it as a 16 kHz WAV file (AUDIO_FORMAT = AUDIO_S16LSB), and load it into whisper main.exe, then main.exe transcribes it slightly faster than real time with reasonable accuracy (implying processing time isn't the limiting factor).
When the same audio is played through the same microphone (or spoken normally), transcription quality is significantly worse with streaming.exe or command.exe, and even on the high-spec machine there are chunks of audio that are ignored entirely.
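For reference, the recording side described above (16 kHz, mono, signed 16-bit little-endian) can be sketched with the standard library alone. This is a minimal sketch, not the recorder used in this report: build_wav_s16_mono and write_wav_file are hypothetical helper names, mono input is an assumption, and the samples are assumed to have already been captured (for example from an SDL2 audio callback).

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Append `bytes` bytes of `value` to `out` in little-endian order.
static void put_le(std::vector<uint8_t>& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a 4-character chunk tag such as "RIFF" or "data".
static void put_tag(std::vector<uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper (not part of whisper.cpp): build a complete WAV file
// image, i.e. a 44-byte PCM header followed by mono 16-bit samples.
std::vector<uint8_t> build_wav_s16_mono(const std::vector<int16_t>& samples,
                                        uint32_t sample_rate = 16000) {
    const uint32_t channels = 1;   // assumption: mono capture
    const uint32_t bits = 16;      // matches AUDIO_S16LSB
    // The RIFF size fields are 32-bit, so the payload must fit in them.
    assert(samples.size() <= (0xFFFFFFFFu - 36u) / sizeof(int16_t));
    const uint32_t data_bytes =
        static_cast<uint32_t>(samples.size() * sizeof(int16_t));

    std::vector<uint8_t> out;
    out.reserve(44 + data_bytes);
    put_tag(out, "RIFF");
    put_le(out, 36 + data_bytes, 4);                    // remaining file size
    put_tag(out, "WAVE");
    put_tag(out, "fmt ");
    put_le(out, 16, 4);                                 // fmt chunk size
    put_le(out, 1, 2);                                  // format tag 1 = PCM
    put_le(out, channels, 2);
    put_le(out, sample_rate, 4);
    put_le(out, sample_rate * channels * bits / 8, 4);  // byte rate
    put_le(out, channels * bits / 8, 2);                // block align
    put_le(out, bits, 2);
    put_tag(out, "data");
    put_le(out, data_bytes, 4);
    for (int16_t s : samples) {
        put_le(out, static_cast<uint16_t>(s), 2);       // sample, little-endian
    }
    return out;
}

// Write the WAV image to disk; returns false if the file cannot be written.
bool write_wav_file(const std::string& path,
                    const std::vector<int16_t>& samples) {
    const std::vector<uint8_t> bytes = build_wav_s16_mono(samples);
    std::ofstream f(path, std::ios::binary);
    if (!f) return false;
    f.write(reinterpret_cast<const char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
    return f.good();
}
```

A recorder would accumulate captured samples in a std::vector<int16_t> and call write_wav_file once capture stops; the resulting file can then be passed to main.exe for offline comparison.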

I've tried this on multiple Windows 10 PCs, including top-end desktops (12 cores + 64GB + GPU) and relatively basic i5s with only 8GB and no GPU, and I see the same difference on all of them. I tested both ad-hoc speech and playing a track from a speaker into the microphone, so that both inputs are identical. I've also had the same issue with every whisper version I've tried over the past year.

I've tried enabling keep-context (-kc / --keep-context).

I've tried changing the following common-sdl.cpp settings, with no success:
changing format:
    AUDIO_F32 -> AUDIO_S16LSB
changing buffer size:
    capture_spec_requested.samples = 1024; -> 16384;
boosting SDL thread priority:
    SDL_SetHintWithPriority(SDL_HINT_AUDIO_RESAMPLING_MODE, "medium", SDL_HINT_OVERRIDE);
    SDL_SetThreadPriority(SDL_THREAD_PRIORITY_HIGH);
setting C++ thread priority:
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
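One thing to check with the AUDIO_F32 -> AUDIO_S16LSB change: whisper's inference call takes 32-bit float samples, so if the rest of the capture path still treats the buffer as floats (an assumption here; the callback in common-sdl.cpp would need to be checked), 16-bit samples have to be converted before they are handed over. A minimal sketch of that conversion, where s16_to_f32 is a hypothetical helper and not an existing whisper.cpp function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert signed 16-bit PCM samples to floats in
// [-1.0, 1.0) by dividing by 32768, the usual scaling for 16-bit audio.
std::vector<float> s16_to_f32(const int16_t* in, std::size_t n) {
    assert(in != nullptr || n == 0);  // a null buffer is only valid when empty
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<float>(in[i]) / 32768.0f;
    }
    return out;
}
```

If the capture format is changed without a step like this, the float buffer would be filled with reinterpreted integer bytes, which could by itself degrade or break transcription independently of buffer size or thread priority.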


dgm3333 commented Dec 19, 2023

I've updated some libraries on the build PC and upgraded to Windows 11, and it is now working well, so this was potentially an external issue.

Works in real time using:
cd C:\temp && C:\bin\stream.exe -m C:\bin\models\ggml-small.en.bin -c 0 -sa

dgm3333 closed this as completed Dec 19, 2023