This API is a wrapper around OpenAI's text-to-speech API.
Use it to expose an endpoint that dynamically generates audio clips and saves them in your own S3 bucket.
The project spins up an API Gateway with Lambda proxy integration and exposes an endpoint to which you can send POST
requests containing text you want returned as a text-to-speech audio file.
- Serverless API and functions
- Local development possible with AWS SAM
- Reproducible infrastructure through Terraform
- All AWS resources used are available in the AWS free tier
- A single audio file costs about 1 cent for a 20-40 word input (an OpenAI API key is required)
- S3 bucket objects are secure by default; objects are not accessible without a signed URL
- Signed URLs are generated on demand, immediately after each HTTP request
- (Dec 2023) The API itself is not secured (no authentication token is required at the API Gateway level). It is your responsibility to add security layers to this infrastructure
- An AWS account
- An openai API key
- AWS CLI
- The AWS SAM CLI (local development)
- For local invocation, Docker is also required
- Note: This project was developed on Apple silicon, so the relevant resources are designated arm64 architecture. Change this as required; you may have to build an x86_64 openai layer if you wish to switch everything over to x86_64.
- Go to Secrets Manager
- Select 'Store a new secret'
- Populate your secret with the key `OPENAI_API_KEY`, and put your OpenAI API key in the value field
- Save it with a unique secret name. You will use this name to populate your Terraform variables.
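At runtime, the Lambda can read this secret back. The sketch below is illustrative, not the project's actual code: the secret name is the example value used elsewhere in this README, and the function names are assumptions. It assumes the secret is stored as a JSON key/value pair.

```python
# Sketch: reading the OpenAI key from Secrets Manager at runtime.
# Assumptions: the secret is JSON ({"OPENAI_API_KEY": "..."}) and
# SECRET_NAME matches secrets_manager_openai_api_secret_name in your tfvars.
import json

SECRET_NAME = "dev/openai/openai_secret"

def extract_api_key(response: dict) -> str:
    """Pull OPENAI_API_KEY out of a get_secret_value response."""
    return json.loads(response["SecretString"])["OPENAI_API_KEY"]

def get_openai_key(secret_name: str = SECRET_NAME) -> str:
    import boto3  # imported lazily so the sketch parses without boto3 installed
    client = boto3.client("secretsmanager")
    return extract_api_key(client.get_secret_value(SecretId=secret_name))
```

The Lambda's execution role needs `secretsmanager:GetSecretValue` on this secret for the call to succeed.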
- In `tf-resources`, you will see a `variables.tf` file.
- Initialize these variables by creating a `terraform.tfvars` file and populating it like so:
```hcl
aws_region = "us-west-2"
profile = "default"
openai_layer_abs_path = "/Users/dev/src/openai-layer.zip"
s3_bucket_name = "dev-bucket"
source_code_path = "/Users/dev/src/text_to_voice"
secrets_manager_openai_api_secret_name = "dev/openai/openai_secret"
```
```shell
terraform init
terraform plan
terraform apply
```

`terraform apply` outputs a URL similar to `https://ljdsfsd7.execute-api.us-west-2.amazonaws.com/dev/`.
Example request:
```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "output_name":"my_filename",
    "text_to_read":"hello world",
    "voice_type":"echo"
  }' \
  https://ljdsfsd7.execute-api.us-west-2.amazonaws.com/dev/tts | \
  jq . | \
  pygmentize -l json
```
Example response:

```json
{
  "statusCode": 200,
  "body": {
    "message": "Upload file succeeded",
    "file_url": "https://dev-bucket.s3.amazonaws.com/hello_echo.mp3?AWSAccessKeyId=AKIATVKDXKJZ6&Signature=bazFe6RVL4VcWBASzREzrUBZovk%3D&Expires=1701593603"
  }
}
```
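If you consume the API programmatically, the signed URL can be pulled out of that response. A minimal sketch, assuming the response shape shown above (the helper name is illustrative):

```python
# Sketch: extracting the presigned S3 URL from the response shown above.
import json

def file_url_from_response(payload: str) -> str:
    """Return the presigned file_url from the API's JSON response."""
    return json.loads(payload)["body"]["file_url"]

# Usage (against a deployed endpoint): fetch the clip before the URL
# expires, e.g. urllib.request.urlopen(file_url_from_response(resp_text)).
```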
The deployed API Gateway accepts a POST request with 3 parameters:

```
{
  "text_to_read": <text_to_read>,
  "voice_type": <one of ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']>,
  "output_name": <file_name>
}
```
Lambda routes `<text_to_read>` to OpenAI, retrieves an mp3 of the text spoken in `<voice_type>`, stores it in an S3 bucket as `<output_name>.mp3`, and finally generates a signed URL to the audio file, which you can access for 1 hour.
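The flow above can be sketched as follows. This is an illustration of the described behavior, not the deployed handler: the names are assumptions, and the OpenAI and S3 calls are indicated as comments rather than live calls.

```python
# Sketch of the Lambda flow described above. Illustrative only: the real
# handler's structure, names, and error handling may differ.
import json

VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

def parse_request(raw_body: str) -> dict:
    """Validate the three POST parameters the API accepts."""
    body = json.loads(raw_body)
    if body["voice_type"] not in VOICES:
        raise ValueError(f"unsupported voice: {body['voice_type']}")
    return {
        "text": body["text_to_read"],
        "voice": body["voice_type"],
        "key": body["output_name"] + ".mp3",
    }

def handler(event, context):
    req = parse_request(event["body"])
    # 1. Call OpenAI text-to-speech with req["text"] and req["voice"].
    # 2. Upload the returned mp3 to S3 under req["key"].
    # 3. Sign it: s3.generate_presigned_url("get_object", ..., ExpiresIn=3600)
    # 4. Return {"statusCode": 200, "body": ...} with the signed URL.
```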
- When developing, run `sam build` to build your latest changes
CLI - write the response to `output.txt`:

```shell
aws lambda invoke \
  --function-name my-function \
  --cli-binary-format raw-in-base64-out \
  --payload '{"key1": "value1", "key2": "value2", "key3": "value3"}' output.txt
```
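Note that with Lambda proxy integration the handler receives the request body as a string under `event["body"]`, so a direct-invoke payload for this function would nest the JSON. A hedged sketch of building such a payload (verify against your actual event shape):

```python
# Sketch: building a direct-invoke payload for a Lambda behind API Gateway
# proxy integration. The JSON body arrives as a *string* in event["body"].
import json

request_body = {
    "output_name": "my_filename",
    "text_to_read": "hello world",
    "voice_type": "echo",
}
event = json.dumps({"body": json.dumps(request_body)})
print(event)  # pass this string as --payload to `aws lambda invoke`
```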
https://aws.amazon.com/blogs/compute/better-together-aws-sam-cli-and-hashicorp-terraform/