Access all the best text-to-speech AI voices from Google, Amazon, IBM and Microsoft using Play.ht's text-to-speech API. Our AI voice generator provides a single interface to convert text to audio using voices across different providers.
Using a single text-to-speech API in your projects saves you time and offers many benefits:
- You instantly get access to all the voices from Google, Amazon, IBM and Microsoft.
- You maintain only one API integration.
- You don't have to worry about API upgrades or changes made by Google, Amazon, IBM and Microsoft.
- Any new voices added on these platforms are instantly available to you.
Take a look at the Voices reference file to see a list of the available voices and languages. The file also contains audio samples to help you pick.
Note: You need to have a Play.ht account with word credit to be able to access the API.
There are two endpoints on the API that you will use to convert text to speech:
- /convert: Performs the text-to-speech conversion.
- /articleStatus: Lets you know if the conversion is done.
Since the text-to-speech conversion is an asynchronous process, you will first make a POST request to the /convert endpoint with the text and voice, and then make GET requests to the /articleStatus endpoint to check if the conversion is done and to get the audio file.
The two endpoints have been described in detail below.
But first, we need authentication!
All endpoints require authentication. Authentication consists of two required HTTP headers, sent over HTTPS:
- Authorization: This is where your secret key goes.
- X-User-ID: This is where your Play.ht user ID goes.
To access your credentials, make sure you're logged in to your Play.ht account, then visit your API Access page. If you're having issues, you can reach out to us at support [at] play.ht.
Make sure to store your secret key privately and do not share it. Never use your secret key in the front-end part of your app or in the browser.
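One way to keep the secret key out of source code is to read both credentials from the environment on the server. The following is a minimal Python sketch under that assumption; the variable names PLAYHT_SECRET_KEY and PLAYHT_USER_ID and the helper auth_headers are illustrative and not part of the API, and the sketch assumes the secret key is sent as the raw Authorization value.

```python
import os
from typing import Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the two authentication headers from environment-style settings.

    PLAYHT_SECRET_KEY and PLAYHT_USER_ID are hypothetical variable names
    chosen for this sketch; use whatever your secrets store provides.
    """
    try:
        secret_key = env["PLAYHT_SECRET_KEY"]
        user_id = env["PLAYHT_USER_ID"]
    except KeyError as missing:
        raise RuntimeError(f"Missing credential: {missing.args[0]}") from None
    return {
        # Assumption: the secret key is sent as-is, with no scheme prefix.
        "Authorization": secret_key,
        "X-User-ID": user_id,
    }
```

Failing early when a credential is missing avoids sending unauthenticated requests by accident.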
- Base URL: https://play.ht/api/v1/
Notes:
- All endpoints are relative to the base URL.
- Requests should always be in JSON format, with a Content-Type: application/json header.
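Putting the base URL, the JSON content type, and the authentication headers together, a request could be prepared roughly as follows. This is a sketch using only Python's standard library; build_request and BASE_URL are names made up for the example, and choosing POST when a body is present and GET otherwise simply mirrors the two endpoints described in this document.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://play.ht/api/v1/"


def build_request(path: str, secret_key: str, user_id: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request for a path such as "/convert"."""
    url = BASE_URL + path.lstrip("/")  # endpoints are relative to the base URL
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": secret_key,
            "X-User-ID": user_id,
            "Content-Type": "application/json",
        },
        # POST when there is a JSON body to send, GET otherwise.
        method="POST" if body is not None else "GET",
    )
```

The prepared request can then be sent with urllib.request.urlopen(request) and the response decoded with json.load. Note that urllib normalizes header capitalization internally, which is harmless because HTTP header names are case-insensitive.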
- Endpoint: /convert
Use this endpoint to start converting an article from text to audio.
- Method: POST
- Body (JSON):

    {
      "voice": string,
      "content": string[],
      "ssml": string[],
      "title": string, // Optional
      "narrationStyle": string, // Optional
      "globalSpeed": string, // Optional
      "pronunciations": { key: string, value: string }[], // Optional
      "trimSilence": boolean, // Optional
    }

- voice is the ID of the voice used to synthesize the text. Refer to the Voices reference file for more details.
- Only one of content or ssml can be passed:
  - content is an array of strings, where each string represents a paragraph in plain text format.
  - ssml is an array of strings, where each string represents a paragraph in SSML format. Learn more about SSML. Not all SSML features are supported with all voices.
- title is a field to name your file. You can use this name to find the audio in your Play.ht dashboard.
- narrationStyle is a string representing the tone and accent of the voice to read the text. Make sure the value for narrationStyle is supported by the voice in your request. Refer to the Voices reference file for more details.
- globalSpeed is a string in the format <number>%, where <number> is in the closed interval [20, 200]. Use this to speed up or slow down the speaking rate of the speech.
- pronunciations is an array of key-value pair objects, where key is the source string (e.g. "Play.ht"), and value is the target pronunciation (e.g. "Play dot H T"). Use this when you want to customize the pronunciation of a certain word or phrase (e.g. your brand name).
- trimSilence is a boolean value. When enabled, the audio will be trimmed to remove any silence from the end of the file.
- Response (JSON):

    {
      "status": "transcriping" | "error",
      "transcriptionId": string,
      "error": string // Optional
    }

Use the transcriptionId in the response to check the conversion status in the Article status endpoint.
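The body rules for /convert can be checked before a request is sent. The sketch below is one possible client-side helper in Python, not part of the API: build_convert_body and transcription_id are hypothetical names, the sketch assumes that exactly one of content or ssml must be present, and it assumes globalSpeed may be a whole or decimal number followed by %.

```python
import re
from typing import Optional, Sequence


def build_convert_body(voice: str,
                       content: Optional[Sequence[str]] = None,
                       ssml: Optional[Sequence[str]] = None,
                       **optional) -> dict:
    """Assemble a /convert body, enforcing the documented field rules."""
    # Assumption: one of the two is required, and never both.
    if (content is None) == (ssml is None):
        raise ValueError("Pass exactly one of content or ssml")
    speed = optional.get("globalSpeed")
    if speed is not None:
        match = re.fullmatch(r"(\d+(?:\.\d+)?)%", speed)
        if not match or not 20 <= float(match.group(1)) <= 200:
            raise ValueError("globalSpeed must be '<number>%' with number in [20, 200]")
    body = {"voice": voice}
    if content is not None:
        body["content"] = list(content)  # one string per plain-text paragraph
    else:
        body["ssml"] = list(ssml)        # one string per SSML paragraph
    # title, narrationStyle, pronunciations, trimSilence are passed through.
    body.update(optional)
    return body


def transcription_id(response: dict) -> str:
    """Return the ID from a /convert response, or raise if it reports an error."""
    if response.get("status") == "error":
        raise RuntimeError(response.get("error", "conversion request failed"))
    return response["transcriptionId"]
```

For example, build_convert_body(voice_id, content=["First paragraph."], globalSpeed="120%") yields a dictionary ready to be serialized as the JSON body, where voice_id stands for an ID taken from the Voices reference file.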
- Endpoint: /articleStatus?transcriptionId={transcriptionId}
Use this endpoint to check the conversion status of your text using its transcription ID.
If the article (text) has been converted to audio, the response will contain the audio file URL along with certain metadata such as voice and narration style. A true value for the error field indicates a conversion failure.
Here, {transcriptionId} is the ID provided in a successful response from the Convert endpoint.
- Method: GET
- Response (JSON):

    {
      "converted": boolean,
      "error": boolean, // Optional
      "errorMessage": string, // Optional
      "audioUrl": string, // Optional
      "audioDuration": number, // Optional
      "voice": string, // Optional
      "narrationStyle": string, // Optional
      "globalSpeed": string, // Optional
    }

Optional fields are only provided when applicable.
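Because conversion is asynchronous, a client typically polls the status endpoint until converted is true or error is true. The following Python sketch shows one way to do that. The helper name wait_for_audio, the 2-second interval, and the 30-attempt limit are arbitrary choices for illustration, since this document does not specify a polling cadence; fetch_status stands for any function that performs the GET request and returns the decoded JSON.

```python
import time
from typing import Callable


def wait_for_audio(fetch_status: Callable[[str], dict],
                   transcription_id: str,
                   interval_seconds: float = 2.0,
                   max_attempts: int = 30,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the conversion finishes and return the audio URL.

    fetch_status is any callable that performs the GET on
    /articleStatus?transcriptionId=... and returns the decoded JSON.
    """
    for attempt in range(max_attempts):
        status = fetch_status(transcription_id)
        if status.get("error"):
            # A true "error" field indicates a conversion failure.
            raise RuntimeError(status.get("errorMessage", "conversion failed"))
        if status.get("converted"):
            return status["audioUrl"]
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(f"Not converted after {max_attempts} checks")
```

Passing the fetch and sleep functions in as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.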