This is very probably because CPU inference is too slow for realtime XTTS synthesis. Even a lot of GPUs are too slow for that. I don't think BufferStream has anything to do with it.
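One way to check whether synthesis keeps up with playback is to measure the real-time factor (time spent synthesizing divided by the duration of the audio produced). Below is a minimal sketch, not code from this repository: `synthesize` is a hypothetical callable standing in for whatever engine call returns the audio samples, and `sample_rate` is assumed to be known for the model. A value above 1.0 means audio is produced slower than it is played back, which would be consistent with choppy output.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical engine wrapper
    text: str,
    sample_rate: int,
) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")

    # > 1.0: slower than realtime; < 1.0: faster than realtime
    return elapsed / audio_seconds
```

Running this once with and once without the RVC step in the wrapped callable should show which stage pushes the factor above 1.0.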
But I can do realtime inference using the Coqui engine, and there is no stuttering like this. When I use the Coqui engine on its own, it runs completely fine.
I am not using XTTS either; the model I am using is much smaller than that.
It's tts_models/en/vctk/vits
Isn't there a way to fix this?
Maybe the issue is that we are applying RVC to each audio chunk rather than to the full audio of the whole sentence.
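To make the difference concrete, here is a rough sketch of the two approaches. It is only an illustration under assumptions: `convert` is a hypothetical stand-in for the RVC step (bytes in, bytes out) and is not the function used in the example script.

```python
from typing import Callable, Iterable


def convert_per_chunk(
    chunks: Iterable[bytes], convert: Callable[[bytes], bytes]
) -> bytes:
    # Conversion runs once per chunk: lower latency, but each call only
    # sees a short fragment of the sentence.
    return b"".join(convert(chunk) for chunk in chunks)


def convert_per_sentence(
    chunks: Iterable[bytes], convert: Callable[[bytes], bytes]
) -> bytes:
    # Buffer every chunk of the sentence first, then convert once.
    return convert(b"".join(chunks))
```

The per-sentence variant trades latency for a single conversion pass, since nothing can be played until the whole sentence has been synthesized.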
Can you please update example_rvc/xtts_rvc_synthesizer.py to use TextToAudioStream rather than BufferStream?
For CPU users, the audio plays back very oddly.
Recording.2024-10-06.172841.mp4
You can see this in the video above.