Understanding of generation #31
somehowchris
started this conversation in
General
Hihi,
I'm trying to understand your implementation. While all of these architectures are already advanced, there is one point I don't get: the voice prompt as demonstrated in the original paper.
Given your `.generate` method, the text would be converted to semantic tokens by SPEAR-TTS, but how does the model know about the speaker profiles? `seq` gets created within the method, so I have no way to apply any "conditioning" for the speakers?