Process with a VTT or SRT in realtime or not #140

ROBERT-MCDOWELL · 2024-10-04T15:38:56Z

It would be fantastic if RealtimeTTS could read a VTT or SRT file (or another subtitle format) and respect the start time of each segment. That would give a direct audio translation, either played back in realtime or recorded to an audio file (AAC, WAV or MP3, for example).
Unless it's already possible?

KoljaB · 2024-10-04T16:53:14Z

https://github.com/KoljaB/TurnVoice/blob/main/turnvoice%2Fcore%2Fsynthesis.py#L272

This does something very similar.
I think the idea of processing VTT and SRT is great, but it's hard to do in real time. It might be better as an add-on project.

ROBERT-MCDOWELL · 2024-10-04T20:06:21Z

Well, even if it's not realtime it would already help a lot ;). I'm working on it now, but my biggest issue is getting a dummy audio device to work, as my computer has no sound card...
How would you use synthesis.py in the VTT/SRT context?

KoljaB · 2024-10-04T20:54:09Z

I'd parse the file for the segment timings to get each duration and pass it as the desired_duration parameter to the synthesize_duration method, so the text is spoken in the correct time. Fill the parts where nothing is spoken with silence and you're good, I guess.
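A rough sketch of the parsing half of that idea, using only the standard library. The names `Cue`, `parse_cues` and `build_timeline` are made up for illustration and are not part of RealtimeTTS or TurnVoice. It assumes timestamps in the full `hh:mm:ss,mmm` (or `hh:mm:ss.mmm`) form; the short `mm:ss.mmm` form that VTT also allows is not handled.

```python
import re
from dataclasses import dataclass

# Matches a timing line such as "00:00:01,000 --> 00:00:03,500".
# VTT writes "." before the milliseconds, SRT writes ",", so both are accepted.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(content):
    """Return the cues found in SRT/VTT text, in file order."""
    cues = []
    # Cues are separated by blank lines; blocks without a timing line
    # (for example the "WEBVTT" header) are skipped.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if text:
                    cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def build_timeline(cues):
    """Return (kind, duration, text) tuples: 'silence' gaps and 'speech' cues."""
    timeline = []
    cursor = 0.0  # end time of the audio planned so far
    for cue in cues:
        gap = cue.start - cursor
        if gap > 0:
            timeline.append(("silence", gap, ""))
        # The duration of a 'speech' entry is the value one would hand to the
        # synthesis step as the desired duration.
        timeline.append(("speech", cue.duration, cue.text))
        cursor = max(cursor, cue.end)
    return timeline
```

Each `speech` entry then carries the text and the time window it has to fit into, and each `silence` entry says how much padding to write before the next cue.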

KoljaB · 2024-10-04T20:58:16Z

It's hard to make this realtime. Because the final duration of the synthesized audio is unknown beforehand (especially with neural TTS engines, whose output is nondeterministic), we do a test synthesis here, measure the duration of the result and apply a speed correction factor afterwards. So we stretch the audio in place. But we need the full audio generated to do this, which is far from realtime.
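The measure-then-correct step can be sketched roughly like this. This is not the TurnVoice code: `speed_factor` and `fit_to_duration` are hypothetical names, and `synthesize` and `stretch` are placeholder callables standing in for whatever TTS engine and time-stretching routine are actually used.

```python
def speed_factor(measured, desired):
    """Factor by which playback must be sped up (>1) or slowed down (<1)."""
    if measured <= 0 or desired <= 0:
        raise ValueError("durations must be positive")
    return measured / desired


def fit_to_duration(text, desired, synthesize, stretch, sample_rate):
    """Test-synthesize `text`, then stretch the result towards `desired` seconds.

    synthesize(text) -> sequence of samples (hypothetical callable)
    stretch(samples, factor) -> sequence of samples (hypothetical callable)
    """
    # The whole utterance has to exist before its length can be measured,
    # which is why this approach cannot stream audio as it is generated.
    samples = synthesize(text)
    measured = len(samples) / sample_rate
    factor = speed_factor(measured, desired)
    return stretch(samples, factor), factor
```

For example, a test synthesis that comes out at 3 seconds for a 2-second subtitle window gives a factor of 1.5, so the audio has to be played 1.5 times faster to fit.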

ROBERT-MCDOWELL · 2024-10-04T21:37:26Z

Oh my! Sorry, I just realized the link you sent is to another repo. TurnVoice is already a very good start indeed!
About realtime: indeed, only pre-generated chunks can do the trick. It won't be realtime, but something like 1 to 3 seconds of latency. Anyhow, even in an in-person meeting with a human translator there is always some latency ;).

ROBERT-MCDOWELL · 2024-10-05T00:00:43Z

@KoljaB I opened a new discussion on the TurnVoice repo about VTT/SRT import, as I think it's the better repo for it: an option to import SRT/VTT instead of video/audio, which bypasses STT and translation and keeps TTS as the only step.
