Running inference over a large batch of audio files #22

Hi! Firstly, thank you so much for this incredible work!

I have been running the tiny.en model on a large number of wav files stored in a folder. I am currently parallelizing the work over a multi-core machine using GNU parallel, running the following command:

    find input_data/eng_wav_data -name "*.wav" | parallel 'time ./main -m models/ggml-tiny.en.bin -nt -f {} -t 1 > {.}.txt'

I found that the model is currently loaded once per wav file to be transcribed. Is there a way I can circumvent this and load the model only once? Any help would be appreciated. Thank you, and apologies if this issue has been resolved already.

ggerganov added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Oct 5, 2022.

ggerganov added a commit that referenced this issue on Oct 5, 2022:

Just added support for providing multiple input files: all specified files will be processed with a single model load.
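
A minimal sketch of the resulting workflow, assuming the updated main accepts the -f flag repeated once per file (the exact argument handling lives in the commit above): one process, one model load, many files.

    ./main -m models/ggml-tiny.en.bin -nt -t 1 \
        -f input_data/eng_wav_data/a.wav \
        -f input_data/eng_wav_data/b.wav \
        -f input_data/eng_wav_data/c.wav

For a whole folder, the find output from the original command could be collected into a single invocation (for example via xargs) instead of spawning one process, and hence one model load, per file.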

ggerganov added a commit that referenced this issue on Oct 7, 2022.

ggerganov added a commit that referenced this issue on Nov 7, 2022:

Can be used to partially process a recording. We now have enough options to process batches of files and to split a long file into multiple jobs.
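
Assuming the options referred to here are the -ot/--offset-t and -d/--duration flags, both taking values in milliseconds (the flag names are an assumption, not stated in this thread), a long recording can be split into fixed-size jobs like this:

    # process a long file as three 10-minute (600000 ms) chunks,
    # one ./main invocation per chunk (flag names assumed as above)
    for off in 0 600000 1200000; do
        ./main -m models/ggml-tiny.en.bin -f long_recording.wav \
            -ot $off -d 600000 > part_${off}.txt
    done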

anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue on Apr 28, 2023.

anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue on Apr 28, 2023:

Allows processing of the input audio to start at some offset from the beginning. Useful for splitting a long job into multiple tasks.
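
This combines naturally with the GNU parallel setup from the original post. A sketch under the same assumed flag names as above, fanning one long file out across cores as offset-based tasks:

    # one 5-minute (300000 ms) chunk per parallel task
    parallel './main -m models/ggml-tiny.en.bin -f long_recording.wav -ot {} -d 300000 > chunk_{}.txt' \
        ::: $(seq 0 300000 1500000)

In practice the offset grid would be derived from the file's actual duration, and chunks could overlap slightly so that words falling on a chunk boundary are not cut off.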

anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue on Apr 28, 2023:

Can be used to partially process a recording.

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this issue on Oct 24, 2023:

Can be used to partially process a recording.