Understanding of generation #31

somehowchris · 2024-03-11T16:23:41Z


Hihi,

I'm trying to understand your implementation. These architectures are already quite advanced, but there is one point I don't get: the voice prompt, as demonstrated in the original paper.

Given your .generate method, the text would be converted to semantic tokens by Spear-TTS, but how does the model know about the speakers' profiles? seq gets created inside the method, so I don't see any way to apply "conditioning" for the speakers.
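To make the question concrete, here is roughly what I would have expected to be possible. This is only a minimal sketch of my understanding of prompting (prefix the speaker's tokens, keep them fixed, and fill in only the masked remainder). All names here (`MASK_ID`, `build_prompted_sequence`, `fill_masked`, `generate_with_prompt`, `predict`) are hypothetical and not taken from this repo, and `predict` just stands in for whatever the model does at a masked position.

```python
from typing import Callable, List, Tuple

MASK_ID = -1  # hypothetical sentinel meaning "not generated yet"


def build_prompted_sequence(
    prompt_tokens: List[int], target_len: int
) -> Tuple[List[int], List[bool]]:
    """Put the speaker prompt in front and mask everything after it."""
    seq = list(prompt_tokens) + [MASK_ID] * target_len
    is_masked = [False] * len(prompt_tokens) + [True] * target_len
    return seq, is_masked


def fill_masked(
    seq: List[int],
    is_masked: List[bool],
    predict: Callable[[List[int], int], int],
) -> List[int]:
    """Replace only the masked positions; prompt positions stay untouched."""
    out = list(seq)
    for i, masked in enumerate(is_masked):
        if masked:
            # predict sees the whole sequence, including the fixed prompt
            out[i] = predict(out, i)
    return out


def generate_with_prompt(
    prompt_tokens: List[int],
    target_len: int,
    predict: Callable[[List[int], int], int],
) -> List[int]:
    """Generate target_len tokens conditioned on a fixed prompt prefix."""
    seq, is_masked = build_prompted_sequence(prompt_tokens, target_len)
    filled = fill_masked(seq, is_masked, predict)
    return filled[len(prompt_tokens):]  # strip the prompt from the result
```

With something like this, the caller would pass the speaker's tokens in as `prompt_tokens`. In the current .generate I don't see where an equivalent prefix could go, since seq is built internally.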
