How to call multiple voices in SSML

Customers may want to use multiple voices in one SSML document to deliver engaging experiences such as role-play storytelling. Azure TTS supports combining multiple voices with SSML.

Multiple Prebuilt Voices

To use multiple prebuilt voices, compose the SSML so that each passage is wrapped in a voice element that names the voice to use.

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        This is the text that is spoken.
    </voice>
    <voice name="en-US-GuyNeural">
        This is the text that is spoken.
    </voice>
</speak>

Everything else is the same as for SSML with a single voice.
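As an illustration, SSML of this shape can be generated from a list of (voice name, text) pairs. The following is a minimal sketch, not part of any Azure SDK: the helper name build_ssml is hypothetical, and the namespace used is the W3C SSML namespace URI.

```python
from xml.sax.saxutils import escape, quoteattr

# W3C SSML namespace URI.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(segments, lang="en-US"):
    """Build an SSML string with one <voice> element per (voice_name, text) pair.

    build_ssml is a hypothetical helper for illustration only.
    """
    voices = "".join(
        # quoteattr/escape protect against characters such as & and < in the input.
        f"<voice name={quoteattr(name)}>{escape(text)}</voice>"
        for name, text in segments
    )
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f"xml:lang={quoteattr(lang)}>{voices}</speak>"
    )


ssml = build_ssml(
    [
        ("en-US-AriaNeural", "Once upon a time, Aria & Guy told a story."),
        ("en-US-GuyNeural", "And this is my part of it."),
    ]
)
```

The resulting string is sent to the service in the same way as single-voice SSML.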

Multiple Custom Voices

For custom voices, the custom endpoint URL currently needs to include the custom voice deployment ID. Refer to:

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions#custom-voices

To use multiple custom voices in one SSML document as above, each voice needs to be deployed to its own endpoint. Then repeat the deploymentId parameter in the endpoint URL, once for each voice needed.

For example:

Suppose three custom voices are deployed to three endpoints:

Voice Name	Endpoint URL
VoiceA	https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=44aa21a9-56cb-4959-b4c8-91a14a68b0b2
VoiceB	https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=da90e300-3f79-462e-85cc-dac44b44ad33
VoiceC	https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=3655bafe-073a-4291-aae7-7d2e7160b0f6

Combine their deploymentId values into one URL to access all three voices through one endpoint: https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=44aa21a9-56cb-4959-b4c8-91a14a68b0b2&deploymentId=da90e300-3f79-462e-85cc-dac44b44ad33&deploymentId=3655bafe-073a-4291-aae7-7d2e7160b0f6

With the following SSML:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="VoiceA">
        This is VoiceA.
    </voice>
    <voice name="VoiceB">
        Then VoiceB.
    </voice>
    <voice name="VoiceC">
        And this is VoiceC.
    </voice>
</speak>
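The combined URL can be assembled from the individual deployment IDs rather than written by hand. A minimal sketch, assuming the base URL and IDs from the table above; the function name combined_endpoint is hypothetical:

```python
from urllib.parse import urlencode

BASE_URL = "https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1"

# Deployment IDs of VoiceA, VoiceB and VoiceC from the table above.
DEPLOYMENT_IDS = [
    "44aa21a9-56cb-4959-b4c8-91a14a68b0b2",
    "da90e300-3f79-462e-85cc-dac44b44ad33",
    "3655bafe-073a-4291-aae7-7d2e7160b0f6",
]


def combined_endpoint(base_url, deployment_ids):
    """Return base_url with one deploymentId query parameter per ID.

    combined_endpoint is a hypothetical helper for illustration only.
    """
    # A list of pairs lets urlencode repeat the same parameter name.
    query = urlencode([("deploymentId", d) for d in deployment_ids])
    return f"{base_url}?{query}"


url = combined_endpoint(BASE_URL, DEPLOYMENT_IDS)
```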

All endpoints need to be in the same subscription. The voices can be in different languages.

Every deploymentId must be valid. Any invalid ID will cause the request to fail.

If there are too many voices (more than 10) to put into the URL, it is recommended to construct the URL in code, dynamically, based on the voices that the SSML content actually uses.
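One way to construct the URL dynamically is to parse the SSML, collect the voice names it uses, and look up their deployment IDs. The sketch below assumes a lookup table from voice name to deployment ID that you maintain yourself; the names DEPLOYMENTS and endpoint_for_ssml are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1"

# Hypothetical lookup table: custom voice name -> deployment ID.
DEPLOYMENTS = {
    "VoiceA": "44aa21a9-56cb-4959-b4c8-91a14a68b0b2",
    "VoiceB": "da90e300-3f79-462e-85cc-dac44b44ad33",
    "VoiceC": "3655bafe-073a-4291-aae7-7d2e7160b0f6",
}


def endpoint_for_ssml(ssml, base_url, deployments):
    """Return an endpoint URL listing only the deployments the SSML uses."""
    ids = []
    for element in ET.fromstring(ssml).iter():
        # ElementTree prefixes tags with "{namespace}"; keep the local name.
        if element.tag.rsplit("}", 1)[-1] != "voice":
            continue
        name = element.get("name")
        if name not in deployments:
            raise KeyError(f"no deployment ID known for voice {name!r}")
        # Keep first-use order and skip voices that appear more than once.
        if deployments[name] not in ids:
            ids.append(deployments[name])
    query = urlencode([("deploymentId", i) for i in ids])
    return f"{base_url}?{query}"


sample = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="VoiceA">This is VoiceA.</voice>'
    '<voice name="VoiceC">And this is VoiceC.</voice>'
    '<voice name="VoiceA">VoiceA again.</voice>'
    "</speak>"
)
url = endpoint_for_ssml(sample, BASE_URL, DEPLOYMENTS)
```

Raising an error for an unknown voice name surfaces the problem before the request is sent, since a request with a missing or invalid deployment would fail anyway.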

Other Notes

Mixing custom voices and prebuilt voices in one SSML document is currently not supported.
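A client-side check can catch mixed SSML before it is sent. This is a minimal sketch under one stated assumption: any voice name that is not in your own set of custom voice names is treated as a prebuilt voice. The function names voice_kinds and is_mixed are hypothetical.

```python
import xml.etree.ElementTree as ET


def voice_kinds(ssml, custom_voice_names):
    """Return the kinds of voice ("custom", "prebuilt") used in the SSML.

    Assumption: a name not listed in custom_voice_names is a prebuilt voice.
    """
    kinds = set()
    for element in ET.fromstring(ssml).iter():
        # ElementTree prefixes tags with "{namespace}"; keep the local name.
        if element.tag.rsplit("}", 1)[-1] == "voice":
            name = element.get("name")
            kinds.add("custom" if name in custom_voice_names else "prebuilt")
    return kinds


def is_mixed(ssml, custom_voice_names):
    """True if the SSML uses both custom and prebuilt voices."""
    return len(voice_kinds(ssml, custom_voice_names)) > 1


mixed_ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="VoiceA">This is VoiceA.</voice>'
    '<voice name="en-US-AriaNeural">This is a prebuilt voice.</voice>'
    "</speak>"
)
mixed = is_mixed(mixed_ssml, {"VoiceA", "VoiceB", "VoiceC"})
```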
