How to choose a different audio output format
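The TTS service supports a range of audio formats. The full list is given in the "Audio format" section of the documentation.
As that documentation describes, a REST request selects the format by sending the format string in the X-Microsoft-OutputFormat HTTP header.
With the Speech SDK, the format is set on the config object instead. Sample code: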
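using Microsoft.CognitiveServices.Speech;

// Placeholder key and region; replace them with your own values.
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
// Sets the synthesis output format.
// The full list of supported formats can be found here:
// https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3);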
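For callers that use the REST endpoint rather than the SDK, the format string goes in the X-Microsoft-OutputFormat request header. The following is a minimal sketch that only builds the request and does not send it. The class name `TtsRequestSketch`, the voice `en-US-JennyNeural`, the User-Agent value, and the region and key arguments are illustrative assumptions, not values taken from this page.

```csharp
using System;
using System.Net.Http;
using System.Security;
using System.Text;

public static class TtsRequestSketch
{
    // Builds (but does not send) a REST text-to-speech request.
    // region, subscriptionKey, and the voice name are placeholders.
    public static HttpRequestMessage Build(
        string region, string subscriptionKey, string outputFormat, string text)
    {
        // Escape the text so characters such as '&' and '<' stay valid inside SSML.
        var ssml =
            "<speak version='1.0' xml:lang='en-US'>" +
            "<voice name='en-US-JennyNeural'>" +
            SecurityElement.Escape(text) +
            "</voice></speak>";

        var request = new HttpRequestMessage(
            HttpMethod.Post,
            $"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1");

        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        // The format string, for example "audio-16khz-32kbitrate-mono-mp3".
        request.Headers.Add("X-Microsoft-OutputFormat", outputFormat);
        request.Headers.Add("User-Agent", "tts-format-sketch");
        request.Content = new StringContent(ssml, Encoding.UTF8, "application/ssml+xml");
        return request;
    }
}
```

Sending the result with `HttpClient.SendAsync` would return the audio in the requested format in the response body, assuming the key and region are valid.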
Audio formats differ in sample rate and bitrate, and in whether they are streaming or non-streaming.
In streaming mode, the service sends the first audio bytes as soon as possible, and the client can start rendering once it has buffered enough data. This can reduce user-perceived latency significantly. In non-streaming mode, the service sends the audio data only after it has finished synthesizing.
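The client side of streaming mode can be sketched as a read loop that holds audio back until a small prebuffer has filled and then passes each chunk on as it arrives. This is a sketch under assumptions: `StreamingPlaybackSketch`, `Pump`, the `render` callback, and the 4096-byte chunk size are hypothetical names and values, and `render` stands in for whatever audio sink the application uses.

```csharp
using System;
using System.IO;

public static class StreamingPlaybackSketch
{
    // Reads audio from `source` in chunks. Data is held back until at least
    // `prebufferBytes` have arrived; after that, every chunk is passed to
    // `render` as soon as it is read. Returns the total bytes rendered.
    // `render` must consume the buffer before returning, because it is reused.
    public static long Pump(Stream source, int prebufferBytes, Action<byte[], int> render)
    {
        var pending = new MemoryStream();
        var chunk = new byte[4096];
        bool started = false;
        long total = 0;
        int read;

        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            if (started)
            {
                render(chunk, read);
                total += read;
                continue;
            }

            pending.Write(chunk, 0, read);
            if (pending.Length >= prebufferBytes)
            {
                // Enough data is cached: flush the prebuffer and start rendering.
                var buffered = pending.ToArray();
                render(buffered, buffered.Length);
                total += buffered.Length;
                started = true;
            }
        }

        // The stream ended before the prebuffer filled: render what arrived.
        if (!started && pending.Length > 0)
        {
            var buffered = pending.ToArray();
            render(buffered, buffered.Length);
            total += buffered.Length;
        }
        return total;
    }
}
```

With `HttpClient`, a response stream suited to this loop can be obtained by passing `HttpCompletionOption.ResponseHeadersRead` to `SendAsync` and then calling `ReadAsStreamAsync` on the content, so that reading starts before the whole body has arrived.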
Streaming formats:
- raw-16khz-16bit-mono-pcm
- raw-8khz-8bit-mono-mulaw
- raw-16khz-16bit-mono-truesilk
- audio-16khz-128kbitrate-mono-mp3
- audio-16khz-64kbitrate-mono-mp3
- audio-16khz-32kbitrate-mono-mp3
- audio-24khz-160kbitrate-mono-mp3
- audio-24khz-96kbitrate-mono-mp3
- audio-24khz-48kbitrate-mono-mp3
- ogg-24khz-16bit-mono-opus
The truesilk format needs to be decoded with the SILK codec. We will publish samples later.
Non-streaming formats:
- riff-16khz-16bit-mono-pcm
- riff-8khz-8bit-mono-mulaw
The full format list is here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech
If you need more formats to be supported, feel free to open an issue to track the request!
- Azure TTS: Empower every person and every organization on the planet to have a delightful digital voice!
- Azure Custom Voice: Build your one-of-a-kind Custom Voice and close-to-human Neural TTS in the cloud and at the edge!