
Correct timestamps when merging fragmented WebVTT #67

Merged

@Shivelight (Contributor) commented Nov 17, 2023

Some services stream their WebVTT subtitle track in segments whose timestamps are relative to the start of each segment. Concatenating those segments as-is produces a broken subtitle file: it may have no captions at all (an empty file), or its cue timestamps may be offset incorrectly, bunched very close to one another, or identical to one another. This PR introduces a function that handles such concatenated WebVTT subtitles and fixes their timestamps.

Example 1: Segmented WebVTT [DASH]

Example (1) Input (merged by devine):

WEBVTT

00:00:01.791 --> 00:00:04.000
I want to become rich instantly.


WEBVTT

00:00:00.000 --> 00:00:01.336
I want to become rich instantly.

00:00:01.419 --> 00:00:04.000
Free from personal debts and family debts.


WEBVTT

00:00:00.000 --> 00:00:01.799
Free from personal debts and family debts.

00:00:02.466 --> 00:00:04.000
I want to get a good job.


WEBVTT

00:00:00.000 --> 00:00:01.135
I want to get a good job.

00:00:01.177 --> 00:00:04.000
Just not in a morgue.

Example (1) Output (Fixed):

WEBVTT

01:37.791 --> 01:41.336
I want to become rich instantly.

01:41.419 --> 01:45.799
Free from personal debts and family debts.

01:46.466 --> 01:49.135
I want to get a good job.

01:49.177 --> 01:52.138
Just not in a morgue.
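In the fixed output above, a line that continues across a segment boundary (for example "I want to become rich instantly.", which ends one segment and reappears at the start of the next) becomes a single cue. A minimal sketch of that merge step, assuming cues have already been shifted to absolute times in seconds; the Cue type, the merge_split_cues name and the 50 ms tolerance are hypothetical choices for illustration, not devine's actual code.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # absolute start time in seconds (hypothetical representation)
    end: float    # absolute end time in seconds
    text: str


def merge_split_cues(cues: list[Cue], tolerance: float = 0.05) -> list[Cue]:
    """Join consecutive cues that carry identical text and touch in time.

    A cue that straddles a segment boundary is emitted twice: once ending at
    the boundary and once starting at it. Such pairs are folded into one cue.
    """
    merged: list[Cue] = []
    for cue in sorted(cues, key=lambda c: c.start):
        prev = merged[-1] if merged else None
        if prev is not None and prev.text == cue.text and cue.start - prev.end <= tolerance:
            # Extend the previous cue instead of emitting a duplicate.
            prev.end = max(prev.end, cue.end)
        else:
            # Copy so the caller's Cue objects are never mutated.
            merged.append(Cue(cue.start, cue.end, cue.text))
    return merged
```

The tolerance only needs to absorb rounding at segment boundaries; cues with different text are never merged, however close they are.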

Example 2: Segmented WebVTT [HLS]

Example (2) Input with HLS X-TIMESTAMP-MAP extension (merged by devine):

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:10800000,LOCAL:00:00:00.000

00:00:00.000 --> 00:00:01.336
I want to become rich instantly.

00:00:01.419 --> 00:00:04.000
Free from personal debts and family debts.


WEBVTT
X-TIMESTAMP-MAP=MPEGTS:11160000,LOCAL:00:00:00.000

00:00:00.000 --> 00:00:01.799
Free from personal debts and family debts.

00:00:02.466 --> 00:00:04.000
I want to get a good job.


WEBVTT
X-TIMESTAMP-MAP=MPEGTS:11520000,LOCAL:00:00:00.000

00:00:00.000 --> 00:00:01.135
I want to get a good job.

00:00:01.177 --> 00:00:04.000
Just not in a morgue.

Example (2) Output (Fixed, converted to SRT):

1
00:01:37,791 --> 00:01:41,336
I want to become rich instantly.

2
00:01:41,419 --> 00:01:45,799
Free from personal debts and family debts.

3
00:01:46,466 --> 00:01:49,135
I want to get a good job.

4
00:01:49,177 --> 00:01:52,138
Just not in a morgue.
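The fixed output of Example (2) is shown as SRT: numbered cues with HH:MM:SS,mmm timestamps and a blank line between cues. A minimal sketch of rendering already-corrected cues in that layout, assuming cues are (start, end, text) tuples in seconds; srt_timestamp and to_srt are hypothetical names, and devine's real conversion path may rely on a subtitle library instead.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before millis)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render cues as SRT blocks: index line, timing line, text, blank separator."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```

Rounding to whole milliseconds before splitting into fields avoids float artifacts such as 101.336 printing as 01:41,335.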

Most of the logic is borrowed from the N_m3u8DL-RE project. Ref:

@Shivelight marked this pull request as draft on November 22, 2023 10:02
@Shivelight marked this pull request as ready for review on November 23, 2023 08:44
@rlaphoenix force-pushed the master branch 17 times, most recently from c464a22 to 910a472 on December 1, 2023 18:33
@rlaphoenix force-pushed the feature/fix-webvtt-timestamp branch 4 times, most recently from 3d8a50e to f214f7f on December 2, 2023 14:14
@rlaphoenix force-pushed the master branch 2 times, most recently from 18aa068 to f0b589c on March 10, 2024 15:13
@rlaphoenix force-pushed the master branch 4 times, most recently from 4c87e20 to 10285c3 on April 2, 2024 23:58
@rlaphoenix force-pushed the feature/fix-webvtt-timestamp branch 15 times, most recently from c828eff to 332c114 on May 6, 2024 12:26
@devine-dl deleted a comment from Shivelight on May 6, 2024
@rlaphoenix force-pushed the feature/fix-webvtt-timestamp branch 2 times, most recently from 229fa0b to d4dce0c on May 6, 2024 17:17
This applies the X-TIMESTAMP-MAP data to the cue timestamps while reading through a concatenated (merged) WebVTT file, correcting the timestamps of segmented WebVTT streams. It then removes the X-TIMESTAMP-MAP headers.
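The single pass described above can be sketched as follows, assuming the merged file is plain text in which every segment starts with its own WEBVTT header, optionally followed by an X-TIMESTAMP-MAP line. Per RFC 8216, MPEGTS is a 90 kHz MPEG-2 timestamp that corresponds to the LOCAL cue time, so each cue in that segment shifts by MPEGTS / 90000 - LOCAL seconds. The function names and the choice to keep only the first WEBVTT header are illustrative assumptions, not devine's implementation; MPEGTS rollover is ignored, and the resulting times sit on the MPEG-2 timeline, so subtracting the stream's initial timestamp (if a player needs it) is left out.

```python
import re

MPEGTS_CLOCK = 90_000  # MPEG-2 timestamps tick at 90 kHz

# WebVTT timestamps: hours are optional ("mm:ss.ttt" or "hh:mm:ss.ttt").
TS = r"(?:\d+:)?\d{2}:\d{2}\.\d{3}"
# A cue timing line; any cue settings are captured in the third group.
TIMING_RE = re.compile(rf"^({TS})\s+-->\s+({TS})(.*)$")


def to_seconds(ts: str) -> float:
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def to_timestamp(seconds: float) -> str:
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02}.{millis:03}"


def map_offset(line: str) -> float:
    """Shift in seconds implied by 'X-TIMESTAMP-MAP=MPEGTS:<n>,LOCAL:<ts>'."""
    fields = {}
    for item in line.partition("=")[2].split(","):
        key, _, value = item.partition(":")  # LOCAL's value keeps its colons
        fields[key.strip()] = value.strip()
    mpegts = int(fields.get("MPEGTS", "0"))
    return mpegts / MPEGTS_CLOCK - to_seconds(fields.get("LOCAL", "00:00.000"))


def apply_timestamp_maps(merged: str) -> str:
    """Rewrite a concatenated WebVTT file so every cue uses absolute times."""
    out: list[str] = []
    offset = 0.0
    seen_header = False
    for line in merged.splitlines():
        stripped = line.strip()
        if stripped == "WEBVTT" or stripped.startswith(("WEBVTT ", "WEBVTT\t")):
            # A new segment starts: reset the shift and keep only the first header.
            offset = 0.0
            if not seen_header:
                out.append(line)
                seen_header = True
            continue
        if stripped.startswith("X-TIMESTAMP-MAP="):
            offset = map_offset(stripped)
            continue  # consumed here; not copied to the output
        match = TIMING_RE.match(stripped)
        if match:
            start, end, settings = match.groups()
            new_start = to_timestamp(to_seconds(start) + offset)
            new_end = to_timestamp(to_seconds(end) + offset)
            line = f"{new_start} --> {new_end}{settings}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Extra blank lines left where segment headers were dropped are harmless, since WebVTT parsers treat any run of blank lines as a single cue separator.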

The timescale and segment duration information is saved in the Subtitle's data dictionary under the "hls" or "dash" key, as timescale (DASH only) and segment_durations. Note that this information is only available after the download completes.
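For DASH, where segments carry no X-TIMESTAMP-MAP, the saved durations give each segment's start time as the running sum of the durations before it, divided by the timescale. A minimal sketch, assuming segment_durations is a list of integers in timescale units and that the layout is {"dash": {"timescale": ..., "segment_durations": [...]}}; both that layout and the dash_segment_offsets name are assumptions for illustration.

```python
from itertools import accumulate


def dash_segment_offsets(data: dict) -> list[float]:
    """Return the start time, in seconds, of every DASH subtitle segment.

    Assumes data["dash"]["segment_durations"] holds per-segment durations in
    units of data["dash"]["timescale"] ticks per second (hypothetical layout).
    """
    dash = data["dash"]
    timescale = dash["timescale"]
    # Running totals 0, d0, d0+d1, ...; drop the last (end of the final segment).
    starts = list(accumulate(dash["segment_durations"], initial=0))[:-1]
    return [start / timescale for start in starts]
```

Each cue in segment i is then shifted by offsets[i]. If every segment is 4 s long, as the 00:00:04.000 end times in Example 1 suggest, the segment with index 24 would start at 96 s, placing its cue at 00:00:01.791 at 01:37.791.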

This is done regardless of whether you are converting to another subtitle format, since the downloader automatically and forcibly concatenates the segmented subtitle data. We do not support the use of segmented subtitles for downloading or otherwise, nor do we plan to.
@rlaphoenix force-pushed the feature/fix-webvtt-timestamp branch from d4dce0c to 0ba45de on May 6, 2024 17:18
@rlaphoenix changed the title from "Add function to fix segmented WebVTT timestamp" to "Correct timestamps when merging fragmented WebVTT" on May 7, 2024
@rlaphoenix merged commit 7aa797a into devine-dl:master on May 7, 2024
4 of 5 checks passed