How to choose a different audio output format

szhaomsft edited this page Feb 7, 2020 · 10 revisions

The TTS service supports various audio formats. The full list is described in Audio format.

Set the audio format using REST

As described in the doc, you can use the `X-Microsoft-OutputFormat` HTTP header to specify the format string.
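As a minimal sketch, here is how the REST request could be assembled in Python. The region, key, and voice name are placeholders; only the endpoint pattern and the `X-Microsoft-OutputFormat` header come from the REST docs.

```python
def build_tts_request(region, subscription_key, output_format, ssml):
    """Build the URL, headers, and body for a TTS REST call.

    The X-Microsoft-OutputFormat header selects the audio output format.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        # This header picks the audio output format:
        "X-Microsoft-OutputFormat": output_format,
    }
    return url, headers, ssml.encode("utf-8")

# Placeholder region/key/voice; POST `body` to `url` with `headers` to synthesize.
url, headers, body = build_tts_request(
    "westus", "YOUR_SUBSCRIPTION_KEY", "audio-16khz-32kbitrate-mono-mp3",
    "<speak version='1.0' xml:lang='en-US'>"
    "<voice name='en-US-GuyNeural'>Hello</voice></speak>")
```

The actual POST (e.g. with `requests.post(url, headers=headers, data=body)`) is omitted here since it needs a live subscription key.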

Set the audio format using the SDK

Sample code (C#):

    // Sets the synthesis output format.
    // The full list of supported formats can be found here:
    // https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech#audio-outputs
    config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3);

Streaming vs. non-streaming

Audio formats vary in bitrate, and also in whether they support streaming.

In streaming mode, the service sends the first audio bytes as soon as they are available, and the client can start rendering once it has buffered enough data. This significantly reduces user-perceived latency. In non-streaming mode, the service sends the audio data only after synthesis has finished.
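A toy illustration of the latency difference (not the service protocol; the chunk timings are made up): in streaming mode playback can begin after a small buffer fills, while in non-streaming mode it must wait for the final byte.

```python
def first_playback_time(chunk_arrival_times, buffer_chunks, streaming):
    """Return the time at which playback can begin.

    streaming: playback starts once `buffer_chunks` chunks have arrived.
    non-streaming: playback starts only after the last chunk has arrived.
    """
    if streaming:
        return chunk_arrival_times[buffer_chunks - 1]
    return chunk_arrival_times[-1]

# Suppose 10 chunks arrive 100 ms apart and the client buffers 2 chunks
# before rendering: streaming playback starts at 0.2 s, non-streaming at 1.0 s.
arrivals = [0.1 * (i + 1) for i in range(10)]
streaming_start = first_playback_time(arrivals, 2, streaming=True)
non_streaming_start = first_playback_time(arrivals, 2, streaming=False)
```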

Streaming formats:

  • raw-16khz-16bit-mono-pcm
  • raw-8khz-8bit-mono-mulaw
  • raw-16khz-16bit-mono-truesilk
  • audio-16khz-128kbitrate-mono-mp3
  • audio-16khz-64kbitrate-mono-mp3
  • audio-16khz-32kbitrate-mono-mp3
  • audio-24khz-160kbitrate-mono-mp3
  • audio-24khz-96kbitrate-mono-mp3
  • audio-24khz-48kbitrate-mono-mp3
  • ogg-24khz-16bit-mono-opus

The truesilk format needs to be decoded with the SILK codec. We will publish samples later.

Non-streaming formats:

  • riff-16khz-16bit-mono-pcm
  • riff-8khz-8bit-mono-mulaw
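The split above can be captured as a small lookup helper; note that the sets below are just the lists on this page, not an exhaustive catalog of every format the service supports.

```python
# Streaming/non-streaming split as listed on this page.
STREAMING_FORMATS = {
    "raw-16khz-16bit-mono-pcm",
    "raw-8khz-8bit-mono-mulaw",
    "raw-16khz-16bit-mono-truesilk",
    "audio-16khz-128kbitrate-mono-mp3",
    "audio-16khz-64kbitrate-mono-mp3",
    "audio-16khz-32kbitrate-mono-mp3",
    "audio-24khz-160kbitrate-mono-mp3",
    "audio-24khz-96kbitrate-mono-mp3",
    "audio-24khz-48kbitrate-mono-mp3",
    "ogg-24khz-16bit-mono-opus",
}
NON_STREAMING_FORMATS = {
    "riff-16khz-16bit-mono-pcm",
    "riff-8khz-8bit-mono-mulaw",
}

def supports_streaming(fmt):
    """Return True if `fmt` is a streaming format, False if non-streaming."""
    if fmt in STREAMING_FORMATS:
        return True
    if fmt in NON_STREAMING_FORMATS:
        return False
    raise ValueError(f"format not in this page's lists: {fmt}")
```

This can be handy, for example, to reject a non-streaming format when low latency is required.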

The full format list is here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech

If you want more formats to be supported, feel free to open an issue to track it!
