I'm trying to get live transcription with streaming.exe (and/or command.exe) to work as accurately as main.exe when processing the same audio input. Ideally I would also like the input to be transcribed and saved as a .wav file at the same time for future reprocessing, although at this point I'm not attempting both since even basic streaming transcription is not working.
If I record audio using a C++ SDL2 program that takes input from the PC mic and saves it as a 16 kHz WAV file (AUDIO_FORMAT = AUDIO_S16LSB), then load that file into whisper main.exe, main.exe transcribes it slightly faster than real time with reasonable accuracy (implying processing time isn't the limiting factor).
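For reference, the save-to-WAV step described above can be sketched as follows. This is a minimal sketch, not the actual recording program: the helper names `write_le` and `save_wav_s16` are hypothetical, and it assumes mono 16-bit PCM at 16 kHz with the samples already collected in memory (for example, from an SDL2 capture callback).

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write the low `bytes` bytes of `value` in little-endian order.
static void write_le(std::ofstream &out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical helper: save mono signed 16-bit PCM samples as a WAV file.
// Assumes the data fits in a 32-bit RIFF size field.
bool save_wav_s16(const std::string &path,
                  const std::vector<int16_t> &samples,
                  uint32_t sample_rate = 16000) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }

    const uint16_t channels = 1;
    const uint16_t bits     = 16;
    const uint32_t data_bytes =
        static_cast<uint32_t>(samples.size() * sizeof(int16_t));

    out.write("RIFF", 4);
    write_le(out, 36 + data_bytes, 4);                   // RIFF chunk size
    out.write("WAVE", 4);
    out.write("fmt ", 4);
    write_le(out, 16, 4);                                // fmt chunk size
    write_le(out, 1, 2);                                 // format tag: PCM
    write_le(out, channels, 2);
    write_le(out, sample_rate, 4);
    write_le(out, sample_rate * channels * bits / 8, 4); // byte rate
    write_le(out, channels * bits / 8, 2);               // block align
    write_le(out, bits, 2);
    out.write("data", 4);
    write_le(out, data_bytes, 4);

    for (int16_t s : samples) {
        write_le(out, static_cast<uint16_t>(s), 2);
    }
    return out.good();
}
```

Calling `save_wav_s16("capture.wav", samples)` writes a 44-byte header followed by the sample data, which is the layout main.exe accepts for 16 kHz 16-bit input.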
When I play the same audio through the same microphone (or speak normally), the transcription quality is significantly worse with streaming.exe or command.exe, and even on the high-spec machine there are chunks of audio which are totally ignored.
I've tried this on multiple Windows 10 PCs, including top-end desktops (12 cores + 64GB + GPU) and relatively basic i5s with only 8GB and no GPU, and see the same difference on all of them. I tested both ad-hoc voice and playing a track from a speaker into the microphone, so that both programs receive identical input. I've also had the same issue with every whisper version I've tried over the past year.
I've tried enabling --keep-context.
I've tried changing the following common-sdl.cpp settings, with no success:
changing format:-
AUDIO_F32; -> AUDIO_S16LSB
changing buffer size:-
capture_spec_requested.samples = 1024; -> 16384;
boosting SDL thread priority:
SDL_SetHintWithPriority(SDL_HINT_AUDIO_RESAMPLING_MODE, "medium", SDL_HINT_OVERRIDE);
-> SDL_SetThreadPriority(SDL_THREAD_PRIORITY_HIGH);
setting C++ thread priority:
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
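To put the format and buffer-size changes above in concrete terms, here is a minimal sketch of the arithmetic involved. It assumes a 16 kHz capture rate (the rate used for the WAV file above); the helper names `buffer_ms` and `f32_to_s16` are hypothetical and are not part of common-sdl.cpp. At 16 kHz, 1024 samples cover 64 ms of audio and 16384 samples cover 1024 ms, so the larger setting makes each capture callback deliver about one second of audio at a time. The second helper shows one way float samples in [-1, 1] (AUDIO_F32) can be converted to signed 16-bit values (AUDIO_S16LSB), for example before saving them.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Duration in milliseconds of a buffer of `samples` frames at `sample_rate` Hz.
constexpr double buffer_ms(int samples, int sample_rate) {
    return 1000.0 * samples / sample_rate;
}

// Hypothetical helper: convert float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
std::vector<int16_t> f32_to_s16(const std::vector<float> &in) {
    std::vector<int16_t> out;
    out.reserve(in.size());
    for (float x : in) {
        const float clamped = std::max(-1.0f, std::min(1.0f, x));
        out.push_back(static_cast<int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

Scaling by 32767 rather than 32768 keeps the conversion symmetric around zero, at the cost of never producing the value -32768.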