RestFUL api and web interface to serve matcha TTS models
The requirements are tested for python 3.10. In order for matcha TTS to work, some dependencies should be installed.
- Update your system's package list and install the required packages for building eSpeak and general utilities:
sudo apt update && sudo apt install -y \
build-essential \
autoconf \
automake \
libtool \
pkg-config \
git \
wget \
cmake
- Clone the eSpeak-ng repository and build it:
git clone https://github.com/espeak-ng/espeak-ng
cd espeak-ng && \
sudo ./autogen.sh && \
sudo ./configure --prefix=/usr && \
sudo make && \
sudo make install
Later simply:
python -m pip install --upgrade pip
Note
The model best_model.onnx is requiered, you have to download by yourself.
Download the model from HuggingFace https://huggingface.co/projecte-aina/matxa-tts-cat-multiaccent/resolve/main/matcha_multispeaker_cat_all_opset_15_10_steps.onnx
Note: You will need a Huggingface account because the model privacity is setted to gated.
Rename the onnx model to best_model.onnx and move it to ./models/matxa_onnx folder
or download using wget
wget --header="Authorization: Bearer REPLACE_WITH_YOUR_HF_TOKEN" https://huggingface.co/projecte-aina/matxa-tts-cat-multiaccent/resolve/main/matxa_multiaccent_wavenext_e2e.onnx -O ./models/matxa_onnx/best_model.onnx
tts-api uses FastAPI
and uvicorn
under the hood. For now, in order to launch:
python server/server.py --model_path models/matxa_onnx/best_model.onnx --port 8001
that receives the calls from 0.0.0.0:8001
, or simply
python server/server.py
which gets the calls from 0.0.0.0:8000
by default
tts-api has three inference endpoints, two openapi ones (as can be seen via /docs
)
/api/tts
: main inference endpoint
The example for /api/tts
can be found in /docs
. For the api/tts
the call is as the following:
curl --location --request POST 'http://localhost:8000/api/tts' --header 'Content-Type: application/json' --data-raw '{
"voice": "quim",
"type": "text",
"text": "El Consell s’ha reunit avui per darrera vegada abans de les eleccions. Divendres vinent, tant el president com els consellers ja estaran en funcions. A l’ordre del dia d’avui tampoc no hi havia l’aprovació del requisit lingüístic, és a dir la normativa que ha de regular la capacitació lingüística dels aspirants a accedir a un lloc en la Funció Pública Valenciana.",
"language": "ca-es" }' --output tts.wav
To launch using lastest version available on the Dockerhub:
docker run -p 8000:8000 projecteaina/tts-api:latest
Check out the documentation available on the Dockerhub
To build:
docker build -t tts-api .
To launch:
docker run -p 8000:8000 tts-api
The default entrypoint puts the web interface to http://0.0.0.0:8000/
.
You can run this api with docker with reload mode that will let you watch you local changes on api.
To run in dev mode run the following command.
make dev
Method | Endpoint | Description |
---|---|---|
POST | /api/tts |
Generate speech audio from text using TTS. |
Request Parameters:
Parameter | Type | Description |
---|---|---|
language | string | ISO language code (e.g., "ca-es", "ca-ba", "ca-nw", "ca-va") |
voice | string | Name of the voice to use |
type | string | Type of input text ("text" or "ssml") |
text | string | Text to be synthesized (if type is "ssml", enclose in tags) |
NOTES:
- ssml format is not available yet.
Successful Response:
The endpoint returns a streaming response that contains the synthesized speech audio in WAV format.
Sample Request:
POST /api/tts
{
"voice": "speaker_id",
"text": "Bon dia!",
"type": "text"
}
Argument | Type | Default | Description |
---|---|---|---|
speech_speed | float | 1.0 | Change the speech speed. |
- The "speech_speed" argument refers to a parameter that adjusts the rate at which speech sounds in an audio output, with higher values resulting in faster speech, and lower values leading to slower speech.
To deploy this project, you will need to add the following environment variables to your .env file
SPEECH_SPEED
USE_CUDA
HF_TOKEN
#Required if you build the docker image from this repository, you need a Huggingface token to download the tts model.
Example of .env file
SPEECH_SPEED=1.0
USE_CUDA=False
HF_TOKEN=REPLACE_WITH_YOUR_HUGGINGFACE_TOKEN
-
Make
To deploy this app
make deploy
To deploy this app using GPU
make deploy-gpu
To stop deployment run
make stop
To delete deployment run
make undeploy
The chart is still not available on any repository so you need to run this command from the repository folder. Please, keep in mind that if you are deploying this chart to a cloud K8s instance you need to push the Docker image first to an image registry.
Create namespace
kubectl create namespace apps
Deploy chart
#Run the following command from $BASE_REPO_PATH/charts/aina-tts-api path
helm upgrade --install aina-tts-api --create-namespace .
You can either change the values on values.yaml
or override them.
helm upgrade --install aina-tts-api --create-namespace \
--set global.namespace=apps \
--set api.image=tts-api \
--set api.tag=latest .
Deploy helm chart with a different speech speed value
helm upgrade --install aina-tts-api --create-namespace \
--set api.speech_speed=1.6 .
Developed by the Language Technologies Unit in Barcelona Supercomputing Center. The code is based on Coqui TTS server.py that has a Mozilla Public License 2.0.
Mozilla Public License 2.0
- Conteinerized
- Improved endpoints
- Improved models
- Speed control
- Caching
This work is funded by the Generalitat de Catalunya within the framework of Projecte AINA.