Problems running the stream example - [Start speaking] frozen #747

Closed
catdumitru opened this issue Apr 12, 2023 · 5 comments

Comments

@catdumitru

I'm having problems running the stream example on a Mac. No transcript is displayed in the console; instead, the output stays frozen at the "[Start speaking]" prompt.

Below is the output of "make stream" and of running the example:
sysctl: unknown oid 'hw.optional.arm64'
I whisper.cpp build info:
I UNAME_S: Darwin
I UNAME_P: i386
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)

make: `stream' is up to date.


./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000 -c 0
init: found 2 capture devices:
init: - Capture device #0: 'Built-in Microphone'
init: - Capture device #1: 'Microsoft Teams Audio'
init: attempt to open capture device 0 : 'Built-in Microphone' ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
whisper_init_from_file_no_state: loading model from './models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem required = 218.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.60 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB

main: processing 8000 samples (step = 0.5 sec / len = 5.0 sec / keep = 0.2 sec), 8 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 9, no_context = 1

[Start speaking]
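The "format: 33056" line in the log is a raw SDL2 SDL_AudioFormat value. The minimal C sketch below decodes it using the bit layout documented in SDL2's SDL_audio.h. The struct and function names are hypothetical and not part of SDL or whisper.cpp. The value turns out to mean 32-bit, signed, little-endian float (SDL's AUDIO_F32LSB). Together with "(required: 33056)", this suggests that format negotiation itself succeeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of an SDL2 SDL_AudioFormat value:
   bits 0-7  sample size in bits
   bit 8     1 = floating point, 0 = integer
   bit 12    1 = big-endian, 0 = little-endian
   bit 15    1 = signed, 0 = unsigned */
typedef struct {
    int  bits;
    bool is_float;
    bool big_endian;
    bool is_signed;
} audio_format_info; /* hypothetical helper type */

/* Hypothetical helper: split a raw format code into its fields. */
static audio_format_info decode_sdl_audio_format(uint16_t fmt) {
    audio_format_info info;
    info.bits       = fmt & 0xFF;
    info.is_float   = (fmt & (1u << 8))  != 0;
    info.big_endian = (fmt & (1u << 12)) != 0;
    info.is_signed  = (fmt & (1u << 15)) != 0;
    return info;
}
```

For example, decode_sdl_audio_format(33056) yields bits = 32, is_float = true, big_endian = false and is_signed = true, since 33056 = 0x8000 + 0x100 + 0x20.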

@ggerganov
Owner

Try increasing --step, for example to 2000 ms:

make clean && make stream
./stream -m ./models/ggml-base.en.bin -t 8 --step 2000 --length 10000 -c 0
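The --step and --length values are given in milliseconds. At the 16 kHz capture rate reported in the log, --step 500 corresponds to the 8000 samples printed on the "main: processing ..." line. The small C sketch below reproduces that arithmetic. Both helper names are hypothetical. The line-break formula max(1, length/step - 1) is an assumption about how the example computes n_new_line, checked only against the "n_new_line = 9" printed for --step 500 --length 5000.

```c
/* Hypothetical helper: convert a duration in milliseconds to a sample
   count at the given sample rate (64-bit intermediate avoids overflow). */
static int ms_to_samples(int ms, int sample_rate) {
    return (int)((long long)ms * sample_rate / 1000);
}

/* Hypothetical helper (assumed formula): number of steps after which a
   new output line is started, max(1, length/step - 1). */
static int steps_per_line(int length_ms, int step_ms) {
    int n = length_ms / step_ms - 1;
    return n > 1 ? n : 1;
}
```

With the suggested --step 2000 --length 10000, ms_to_samples(2000, 16000) gives 32000 samples per step, and steps_per_line(10000, 2000) gives 4.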

@catdumitru
Author

Unfortunately, I get the same result even when I increase the step to 2000, 4000 or 8000 ms.

@kinory24

@catdumitru Did you fix that? I'm having the same issue right now.

@khromalabs

khromalabs commented Mar 10, 2024

Hi, I have the same issue in a Linux environment. I already verified that speech recognition via main works fine. stream freezes on my computer, and even pressing Ctrl+C doesn't shut it down. I tried the parameter that dumps the captured audio and just got a blank WAV file of around 900 KB, so I suspect something is going wrong in the audio initialization, maybe related to the SDL2 library. BTW, the SDL2 version installed on my system is 2.30; other SDL2-dependent tools like ffmpeg work fine. I'll keep digging into this.
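One way to confirm that a dumped capture is truly silent, rather than just quiet, is to scan its samples for the largest absolute value. The minimal C sketch below does this under several assumptions. The dump is assumed to be 16-bit mono PCM at 16 kHz, which the thread does not state. The samples are assumed to have already been read from the WAV file's data chunk. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest absolute sample value in a buffer of
   16-bit PCM samples. A peak of 0 means the buffer is digital silence. */
static int pcm16_peak(const int16_t *samples, size_t n) {
    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int v = samples[i];   /* widen to int first so -32768 negates safely */
        if (v < 0) v = -v;
        if (v > peak) peak = v;
    }
    return peak;
}

/* Hypothetical helper: duration in seconds of a PCM data chunk. */
static double pcm_duration_sec(size_t data_bytes, int sample_rate,
                               int channels, int bytes_per_sample) {
    return (double)data_bytes /
           ((double)sample_rate * channels * bytes_per_sample);
}
```

Under the same format assumption, a file of around 900 KB holds just under 30 seconds of audio. If the peak stays at 0 across all of it, that would point at the capture path rather than at whisper inference.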

@arosov

arosov commented May 6, 2024

I had the same issue and opened libsdl-org/SDL#9706.
It seems to be caused by SDL2 >= 2.30.0.
In the meantime, consider downgrading SDL2 to 2.28.5.
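A program could warn users ahead of time by checking which SDL2 version it is running against. The minimal C sketch below compares versions component by component. The function names are hypothetical. The "affected" threshold of 2.30.0 is only what this comment reports, not a confirmed upstream range.

```c
#include <stdbool.h>

/* Hypothetical helper: true if version (major, minor, patch) is greater
   than or equal to (req_major, req_minor, req_patch). */
static bool sdl_version_at_least(int major, int minor, int patch,
                                 int req_major, int req_minor, int req_patch) {
    if (major != req_major) return major > req_major;
    if (minor != req_minor) return minor > req_minor;
    return patch >= req_patch;
}

/* Hypothetical helper: flag versions in the range reported in this
   thread (SDL2 >= 2.30.0); 2.28.5 was reported as working. */
static bool sdl_version_possibly_affected(int major, int minor, int patch) {
    return sdl_version_at_least(major, minor, patch, 2, 30, 0);
}
```

In a program linked against SDL2, the runtime version can be obtained with SDL_GetVersion(&v), where v is an SDL_version struct with major, minor and patch fields. Those fields can then be passed to sdl_version_possibly_affected.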

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants