Fix to Timing Overlap Issue #608 and #621 #816

FredBill1 · 2024-06-02T06:42:39Z

Currently, the function iterate_subtitles does not compute the start and end times correctly for each combined subtitle segment. It simply takes the start and end time of the first segment in the combined segment, without considering the following segments or segment breaks. The start and end time of each word is ignored, so using --max_line_count and --max_line_width produces segments with overlapping timing, as reported in issues #608 and #621.

whisperX/whisperx/utils.py

Line 274 in f2da2f8

times.append((segment["start"], segment["end"], segment.get("speaker")))

whisperX/whisperx/utils.py

Lines 281 to 284 in f2da2f8

    
    for subtitle, _ in iterate_subtitles():
        sstart, ssend, speaker = _[0]
        subtitle_start = self.format_timestamp(sstart)
        subtitle_end = self.format_timestamp(ssend)

whisperX/whisperx/utils.py

Line 316 in f2da2f8

yield subtitle_start, subtitle_end, prefix + subtitle_text
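To make the symptom concrete, here is a small sketch (not taken from whisperX) that checks a list of cue timings for overlap. The helper name `find_overlaps` and the sample timings are made up for illustration. The first list mimics two lines split from one segment that both inherit that segment's start and end.

```python
def find_overlaps(cues):
    """Return index pairs (i, i + 1) where cue i ends after cue i + 1 starts.

    `cues` is a list of (start, end) tuples in seconds, in display order.
    Hypothetical helper for illustration; not part of whisperX.
    """
    return [
        (i, i + 1)
        for i in range(len(cues) - 1)
        if cues[i][1] > cues[i + 1][0]
    ]


# Two lines split from one 0.0-4.0 s segment, both given the segment's times.
segment_level = [(0.0, 4.0), (0.0, 4.0)]

# The same two lines timed from their own words (made-up values).
word_level = [(0.0, 1.9), (2.1, 4.0)]

print(find_overlaps(segment_level))
print(find_overlaps(word_level))
```

With segment-level timing the two cues overlap; with per-line timing they do not.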

I added code that computes the start and end time of the combined segment from the times of the words it contains. However, I'm not sure how to make use of the times yielded by iterate_subtitles.
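The idea of word-based timing can be sketched roughly as follows. This is not the code from this pull request: the function name `span_from_words` and its fallback parameters are hypothetical, and it assumes each word is a dict whose "start" and "end" keys may be missing (for example, when a word could not be aligned).

```python
def span_from_words(words, fallback_start=None, fallback_end=None):
    """Compute (start, end) for one subtitle line from its word timings.

    Assumption: each word is a dict with optional "start" and "end" keys
    in seconds. Words without timestamps are skipped. If no word carries
    a timestamp, the fallback values (e.g. the segment's times) are used.
    """
    starts = [w["start"] for w in words if "start" in w]
    ends = [w["end"] for w in words if "end" in w]
    start = min(starts) if starts else fallback_start
    end = max(ends) if ends else fallback_end
    return start, end


# Made-up words for one subtitle line; the middle word has no timestamps.
line_words = [
    {"word": "Hello", "start": 0.5, "end": 0.9},
    {"word": "2024"},
    {"word": "world", "start": 1.2, "end": 1.6},
]

line_start, line_end = span_from_words(line_words, fallback_start=0.0, fallback_end=4.0)
print(line_start, line_end)
```

Each line produced by the line-splitting options would then carry its own span instead of the enclosing segment's start and end.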

fix timing overlap issue (m-bain#816)

issaMbarki · 2024-10-09T19:58:29Z

I edited the iterate_subtitles function, and it now gives me really good results with max_line_count and max_line_width. Once I have some time, I'll create a pull request.

fix timing overlap issue

faff50a

tylerjthomas9 added a commit to SaguaroCapital/whisperX that referenced this pull request Jul 2, 2024

Merge pull request #1 from FredBill1/fix-timing-overlap

a832332

fix timing overlap issue (m-bain#816)
