Implement max line width and max line count, and make word highlighting optional #1184
Conversation
Suddenly, Whisper transcription started to misbehave, probably due to something related to line length? My transcribed text used to look like this, which is good:

Now, instead, it looks like this, which is very bad:

What option do I need to add to this minimal code to bring back the old behaviour? I think the default behaviour should stay identical to the past, since the output is now all messed up. Thanks for any help.
I agree with @GianniGi — it would be great to have an option to force a line break at the end of each sentence.
This pull request is not about sentence segmentation; it is about wrapping subtitles to a maximum line width and line count. If you are interested in the discussion about sentence segmentation, it is over here: #1243
Implement max line width and max line count, and make word highlighting optional (openai#1184)
* Add highlight_words, max_line_width, max_line_count
* Refactor subtitle generator
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Do we know when this will make it into the API? I can't see it in the docs yet.
Hello, sorry in advance if this is not the right place to ask, but I'm writing a Python script that takes an mp4 file as input and outputs a WEBVTT file of the transcription. I managed to make it work, but now I'm trying to reduce the size of each subtitle line and get closer to word-level transcriptions in WEBVTT, and I'm having trouble understanding how to set the `word_timestamps` parameter to `True` when calling Whisper from a Python script.

I understand from this snippet of code (from ilanit1997@819074f):

```python
if not args["word_timestamps"]:
```

that you can set it using its command-line argument, but I can't figure out how to do it in my basic Python script (pasted below for reference):

```python
import whisper
from whisper.utils import get_writer  # get_writer needs this import

model = whisper.load_model('base.en')
whisper.DecodingOptions(language='en', fp16=False)  # fp16 expects a bool, not the string 'false'
srt_writer = get_writer("srt", output_directory)
```

Sorry again if this isn't the place to ask or if it's something I should be able to figure out myself, but I'm kind of stuck.
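Independent of Whisper's own API, the word-level idea behind the question above can be sketched in plain Python: given per-word timestamps in the shape Whisper's `word_timestamps` output uses (a list of dicts with `word`/`start`/`end` keys), emit one WebVTT cue per word. The data and helper names below are hypothetical illustrations, not Whisper's actual writer code:

```python
# Sketch: turning per-word timestamps into word-level WebVTT cues.
# The word list is hypothetical example data, not real Whisper output.

def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def words_to_vtt(words) -> str:
    """Emit a WebVTT document with one cue per word."""
    lines = ["WEBVTT", ""]
    for w in words:
        lines.append(f"{format_timestamp(w['start'])} --> {format_timestamp(w['end'])}")
        lines.append(w["word"].strip())
        lines.append("")
    return "\n".join(lines)

words = [
    {"word": " Hello", "start": 0.0, "end": 0.42},
    {"word": " world", "start": 0.42, "end": 0.9},
]
print(words_to_vtt(words))
```

A real script would obtain `words` from the `words` field of each segment returned by `model.transcribe(..., word_timestamps=True)`.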
This implementation is based on `word_timestamps` and so requires that option to be turned on. Word highlighting has also been made optional and is turned off by default.

Examples:
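The word-highlighting behaviour can be illustrated with a small sketch: for each word, emit a cue covering that word's time span in which the current word is underlined with `<u>` tags. The data and function below are hypothetical illustrations of the idea, not the PR's actual writer implementation:

```python
# Sketch of optional word highlighting (off by default): one cue per word,
# underlining the word being spoken. Hypothetical example data.

def highlight_cues(words):
    """Yield (start, end, text) cues, underlining one word at a time."""
    tokens = [w["word"].strip() for w in words]
    for i, w in enumerate(words):
        text = " ".join(
            f"<u>{tok}</u>" if i == j else tok for j, tok in enumerate(tokens)
        )
        yield (w["start"], w["end"], text)

words = [
    {"word": " Hello", "start": 0.0, "end": 0.42},
    {"word": " world", "start": 0.42, "end": 0.9},
]
for start, end, text in highlight_cues(words):
    print(f"{start:.2f}-{end:.2f}: {text}")
```

With highlighting disabled (the default), a writer would instead emit one cue per subtitle line with plain text.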
When `--max_line_count` is specified, subtitles will be segmented at the line limit, or when there is a pause in the speech. This overcomes segmentation artifacts that can occur at window boundaries mid-sentence.

This segmentation approach works better in conjunction with #1114, because that PR fixes some bugs with timestamp accuracy near segment boundaries, where boundary words are stretched to cover pauses, making it harder for the current PR to detect those pauses. In the meantime, the current PR detects pauses by measuring the distance between the start timestamps of successive words, rather than between the end timestamp of the previous word and the start timestamp of the current word.
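The pause-detection idea described above can be sketched as follows: break a cue when the gap between the *start* timestamps of successive words exceeds a threshold, or when the cue would exceed the line limit. This is a simplified stand-in (a word-count cap replaces the PR's character-width wrapping, and the threshold and data are hypothetical), not the PR's actual code:

```python
# Sketch: segment words into cues at pauses (measured start-to-start)
# or when a cue reaches its capacity. Hypothetical threshold and data.

def segment_words(words, max_words_per_line=4, max_line_count=2, pause=1.0):
    """Group words into cues, splitting at long pauses or at capacity."""
    cues, current = [], []
    max_words = max_words_per_line * max_line_count
    for w in words:
        # Compare start timestamps, not end-to-start, so that a boundary
        # word stretched to cover a pause does not hide the pause.
        long_pause = current and (w["start"] - current[-1]["start"]) > pause
        if current and (long_pause or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    return cues

words = [
    {"word": "We", "start": 0.0}, {"word": "paused", "start": 0.3},
    {"word": "here.", "start": 0.6},
    {"word": "Then", "start": 2.4},  # > 1 s after the previous start: new cue
    {"word": "we", "start": 2.7}, {"word": "resumed.", "start": 2.9},
]
print([[w["word"] for w in cue] for cue in segment_words(words)])
```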