TTS API

RESTful API and web interface to serve Matcha-TTS models.

Installation

The requirements are tested with Python 3.10. Matcha-TTS needs some system dependencies to be installed before it will work.

Update your system's package list and install the required packages for building eSpeak and general utilities:

sudo apt update && sudo apt install -y \
    build-essential \
    autoconf \
    automake \
    libtool \
    pkg-config \
    git \
    wget \
    cmake

Clone the eSpeak-ng repository and build it:

git clone https://github.com/espeak-ng/espeak-ng
cd espeak-ng && \
 sudo ./autogen.sh && \
 sudo ./configure --prefix=/usr && \
 sudo make && \
 sudo make install

Then upgrade pip:

python -m pip install --upgrade pip

Note

The model file best_model.onnx is required and must be downloaded manually.

Download the model from Hugging Face: https://huggingface.co/projecte-aina/matxa-tts-cat-multiaccent/resolve/main/matcha_multispeaker_cat_all_opset_15_10_steps.onnx

Note: You will need a Hugging Face account because access to the model is gated.

Rename the ONNX model to best_model.onnx and move it to the ./models/matxa_onnx folder.

Alternatively, download it with wget:

wget --header="Authorization: Bearer REPLACE_WITH_YOUR_HF_TOKEN" https://huggingface.co/projecte-aina/matxa-tts-cat-multiaccent/resolve/main/matxa_multiaccent_wavenext_e2e.onnx -O ./models/matxa_onnx/best_model.onnx
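For the manual download route, the rename-and-move step can also be scripted. The following is a minimal sketch, not part of tts-api: install_model is a hypothetical helper, and it only assumes the target location named above (./models/matxa_onnx/best_model.onnx).

```python
from pathlib import Path
import shutil


def install_model(downloaded: str, models_dir: str = "models/matxa_onnx") -> Path:
    """Move a downloaded .onnx file to <models_dir>/best_model.onnx.

    Hypothetical helper: the folder and file name come from the README,
    everything else is an assumption.
    """
    src = Path(downloaded)
    if not src.is_file():
        raise FileNotFoundError(f"model file not found: {src}")
    if src.stat().st_size == 0:
        # A zero-byte file usually means the download failed (e.g. no access).
        raise ValueError(f"model file is empty: {src}")

    target_dir = Path(models_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = target_dir / "best_model.onnx"
    shutil.move(str(src), str(target))
    return target
```

For example, install_model("matcha_multispeaker_cat_all_opset_15_10_steps.onnx") would place the file where the launch command below expects it.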

Launch

tts-api uses FastAPI and uvicorn under the hood. To launch the server:

python server/server.py --model_path models/matxa_onnx/best_model.onnx --port 8001

This listens for requests on 0.0.0.0:8001. Alternatively, run:

python server/server.py

which listens on 0.0.0.0:8000 by default.
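When scripting around the server, it can help to wait until it answers before sending requests. The following is a minimal sketch, assuming the default address and assuming that /docs returns HTTP 200 once startup has finished; wait_for_server is a hypothetical helper, not something tts-api provides.

```python
import http.client
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/docs",
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or an HTTP error: not ready yet.
            pass
        time.sleep(interval)
    return False
```

If the server was started with --port 8001, pass "http://localhost:8001/docs" instead.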

Usage

tts-api has three inference endpoints, two of them OpenAPI endpoints (as can be seen via /docs).

/api/tts: main inference endpoint

An example for /api/tts can be found in /docs. The call looks like this:

curl --location --request POST 'http://localhost:8000/api/tts' --header 'Content-Type: application/json' --data-raw '{
    "voice": "quim",
    "type": "text",
    "text": "El Consell s’ha reunit avui per darrera vegada abans de les eleccions. Divendres vinent, tant el president com els consellers ja estaran en funcions. A l’ordre del dia d’avui tampoc no hi havia l’aprovació del requisit lingüístic, és a dir la normativa que ha de regular la capacitació lingüística dels aspirants a accedir a un lloc en la Funció Pública Valenciana.",
    "language": "ca-es" }' --output tts.wav
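The same call can be made from Python. This is a minimal sketch using only the standard library; it mirrors the JSON fields of the curl example, while the function names build_tts_request and synthesize are hypothetical and the default base URL assumes a local server on port 8000.

```python
import json
import shutil
import urllib.request


def build_tts_request(text, voice="quim", language="ca-es",
                      base_url="http://localhost:8000"):
    """Build the POST request for /api/tts with the fields used by the curl call."""
    payload = {"voice": voice, "type": "text", "text": text, "language": language}
    return urllib.request.Request(
        f"{base_url}/api/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="tts.wav", **kwargs):
    """Send the request and copy the audio response to out_path."""
    request = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies the body chunk by chunk
    return out_path
```

With the server running, synthesize("Bon dia!") writes the audio to tts.wav in the current directory.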

Docker launch from the hub

To launch the latest version available on Docker Hub:

docker run -p 8000:8000 projecteaina/tts-api:latest

Check out the documentation available on Docker Hub.

Docker build and launch

To build:

docker build -t tts-api .

To launch:

docker run -p 8000:8000 tts-api

The default entrypoint serves the web interface at http://0.0.0.0:8000/.

Develop in docker

You can run this API in Docker with reload mode, which picks up your local changes to the API as you make them.

To run in dev mode, run the following command:

make dev

REST API Endpoints

Method	Endpoint	Description
POST	`/api/tts`	Generate speech audio from text using TTS.

Request Parameters:

Parameter	Type	Description
language	string	ISO language code (e.g., "ca-es", "ca-ba", "ca-nw", "ca-va")
voice	string	Name of the voice to use
type	string	Type of input text ("text" or "ssml")
text	string	Text to be synthesized (if type is "ssml", enclose in tags)

NOTES:

The SSML format is not available yet.

Successful Response:

The endpoint returns a streaming response that contains the synthesized speech audio in WAV format.
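To check what came back, the saved audio can be inspected with Python's standard wave module. This is a sketch under the assumption that the response is a standard PCM WAV file with a complete header; describe_wav is a hypothetical helper and raises wave.Error for anything the wave module cannot parse.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of WAV audio held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

For a file saved as tts.wav, call describe_wav(open("tts.wav", "rb").read()).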

Sample Request:

POST /api/tts

{
  "voice": "speaker_id",
  "text": "Bon dia!",
  "type": "text"
}

Command line deployment arguments

Argument	Type	Default	Description
speech_speed	float	1.0	Change the speech speed.

The "speech_speed" argument sets the speaking rate of the audio output: higher values produce faster speech and lower values produce slower speech.

Deployment

Environment Variables

To deploy this project, add the following environment variables to your .env file:

SPEECH_SPEED

USE_CUDA

HF_TOKEN # Required if you build the Docker image from this repository: a Hugging Face token is needed to download the TTS model.

Example .env file:

SPEECH_SPEED=1.0
USE_CUDA=False
HF_TOKEN=REPLACE_WITH_YOUR_HUGGINGFACE_TOKEN
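As an illustration of how these values might be read on the Python side, here is a minimal sketch. It is not the server's actual configuration code: Settings and load_settings are hypothetical names, and the accepted spellings for USE_CUDA are an assumption. Only the variable names and the 1.0 default for the speech speed come from this README.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    speech_speed: float = 1.0
    use_cuda: bool = False
    hf_token: str = ""


def load_settings(env=None) -> Settings:
    """Build typed settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return Settings(
        speech_speed=float(env.get("SPEECH_SPEED", "1.0")),
        # Assumption: "True", "true", "1" and "yes" all enable CUDA.
        use_cuda=env.get("USE_CUDA", "False").strip().lower() in {"1", "true", "yes"},
        hf_token=env.get("HF_TOKEN", ""),
    )
```

Passing a plain dictionary, as in load_settings({"SPEECH_SPEED": "1.6"}), makes the parsing easy to try without touching the real environment.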

Deployment via docker compose

Prerequisites

To deploy this app:

make deploy

To deploy this app using a GPU:

make deploy-gpu

To stop the deployment, run:

make stop

To delete the deployment, run:

make undeploy

Deployment via Helm

The chart is not yet available in any repository, so you need to run this command from the repository folder. Keep in mind that if you are deploying this chart to a cloud K8s instance, you first need to push the Docker image to an image registry.

Create namespace

kubectl create namespace apps

Deploy chart

# Run the following command from the $BASE_REPO_PATH/charts/aina-tts-api path
helm upgrade --install aina-tts-api --create-namespace .

You can either change the values in values.yaml or override them on the command line:

helm upgrade --install aina-tts-api --create-namespace \
  --set global.namespace=apps \
  --set api.image=tts-api \
  --set api.tag=latest .

Deploy the Helm chart with a different speech speed value

helm upgrade --install aina-tts-api --create-namespace \
  --set api.speech_speed=1.6 .

Authors and acknowledgment

Developed by the Language Technologies Unit at the Barcelona Supercomputing Center. The code is based on the Coqui TTS server.py, which is licensed under the Mozilla Public License 2.0.

License

Mozilla Public License 2.0

Project status

Funding

This work is funded by the Generalitat de Catalunya within the framework of Projecte AINA.

