How Neural TTS works

szhaomsft edited this page Dec 28, 2019 · 7 revisions


Neural TTS is the latest breakthrough in text-to-speech technology. The Azure neural TTS team released its first Neural TTS-based production service in September 2018.

Our text-to-speech capability uses deep neural networks to overcome the limits of traditional text-to-speech systems in matching the patterns of stress and intonation in spoken language, called prosody, and in synthesizing the units of speech into a computer voice.

By using the computational power of Azure, we can deliver real-time streaming, which is useful for situations such as interacting with a chatbot or virtual assistant. The capability is served from Azure Kubernetes Service, which provides high scalability and availability and lets customers use both neural text-to-speech and traditional text-to-speech from a single endpoint.


Neural TTS research has been active over the last three years, and the team keeps pushing the state of the art on the neural TTS research front.
A few selected papers: