Eren Golge edited this page Feb 12, 2019 · 7 revisions

TTS is a deep-learning-based text-to-speech solution. It favors simplicity over large, complex models to keep costs low, while still aiming for state-of-the-art results.

Our initial model was based on Tacotron. Since then, we have applied various updates to improve it. Training TTS for a day on a laptop GPU is enough to achieve acceptable results.

TTS performs on par with or better than the other open-source text-to-speech solutions we have experimented with. It can also learn different languages, including Chinese, with very few changes.

Below is a blueprint of our architecture.

(thanks to @yweweler)
