Adding TTS Tutorials #1584

Merged
merged 7 commits into from
Jun 2, 2022
272 changes: 272 additions & 0 deletions notebooks/Tutorial_1_use-pretrained-TTS.ipynb
{
"cells": [
{
"cell_type": "markdown",
"id": "45ea3ef5",
"metadata": {
"tags": []
},
"source": [
"# Easy Inferencing with 🐸 TTS ⚡\n",
"\n",
"#### Do you want to quickly synthesize speech using a Coqui 🐸 TTS model?\n",
"\n",
"💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡\n",
"\n",
"🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server that you can open in your favorite web browser and 🗣️ .\n",
"\n",
"In this notebook, we will: \n",
"```\n",
"1. List available pre-trained 🐸 TTS models\n",
"2. Run a 🐸 TTS model\n",
"3. Listen to the synthesized wave 📣\n",
"4. Run multispeaker 🐸 TTS model \n",
"```\n",
"So, let's jump right in!\n"
]
},
{
"cell_type": "markdown",
"id": "a1e5c2a5-46eb-42fd-b550-2a052546857e",
"metadata": {},
"source": [
"## Install 🐸 TTS ⬇️"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa2aec77",
"metadata": {},
"outputs": [],
"source": [
"! pip install -U pip\n",
"! pip install TTS"
]
},
{
"cell_type": "markdown",
"id": "8c07a273",
"metadata": {},
"source": [
"## ✅ List available pre-trained 🐸 TTS models\n",
"\n",
"Coqui 🐸TTS comes with a list of pretrained models for different model types (e.g. TTS, vocoder), languages, training datasets, and architectures. \n",
"\n",
"You can either use your own model or the release models under 🐸TTS.\n",
"\n",
"Use `tts --list_models` to find out the available models.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "608d203f",
"metadata": {},
"outputs": [],
"source": [
"! tts --list_models"
]
},
{
"cell_type": "markdown",
"id": "ed9dd7ab",
"metadata": {},
"source": [
"## ✅ Run a 🐸 TTS model\n",
"\n",
"#### **First things first**: Using a release model and default vocoder:\n",
"\n",
"You can simply copy the full model name from the list above and use it:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc9e4608-16ec-4dcd-bd6b-bd10d62286f8",
"metadata": {},
"outputs": [],
"source": [
"!tts --text \"hello world\" \\\n",
"--model_name \"tts_models/en/ljspeech/glow-tts\" \\\n",
"--out_path output.wav\n"
]
},
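{
"cell_type": "markdown",
"id": "a1b2c3d4",
"metadata": {},
"source": [
"💡 Note: recent 🐸 TTS releases also expose a Python API (`TTS.api`); whether it is available depends on the TTS version you installed. Assuming a release that ships it, the CLI call above can be sketched in pure Python like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2c3d4e5",
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the same synthesis via the Python API (assumes a TTS release\n",
"# that provides TTS.api; older versions may not include it).\n",
"from TTS.api import TTS\n",
"\n",
"# Load the same release model as the CLI example above.\n",
"tts = TTS(model_name=\"tts_models/en/ljspeech/glow-tts\")\n",
"\n",
"# Synthesize \"hello world\" straight to a wav file.\n",
"tts.tts_to_file(text=\"hello world\", file_path=\"output.wav\")"
]
},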
{
"cell_type": "markdown",
"id": "0ca2cb14-1aba-400e-a219-8ce44d9410be",
"metadata": {},
"source": [
"## 📣 Listen to the synthesized wave 📣"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fe63ef4-9284-4461-9dda-1ca7483a8f9b",
"metadata": {},
"outputs": [],
"source": [
"import IPython\n",
"IPython.display.Audio(\"output.wav\")"
]
},
{
"cell_type": "markdown",
"id": "5e67d178-1ebe-49c7-9a47-0593251bdb96",
"metadata": {},
"source": [
"### **Second things second**:\n",
"\n",
"🔶 A TTS model can be trained either on a single speaker's voice or on multispeaker voices. This training choice is directly reflected in the inference ability and the available speaker voices that can be used to synthesize speech. \n",
"\n",
"🔶 If you want to run a multispeaker model from the released models list, you can first check the speaker IDs using the `--list_speaker_idxs` flag and then use one of these speaker voices to synthesize speech."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87b18839-f750-4a61-bbb0-c964acaecab2",
"metadata": {},
"outputs": [],
"source": [
"# list the possible speaker IDs.\n",
"!tts --model_name \"tts_models/en/vctk/vits\" \\\n",
"--list_speaker_idxs \n"
]
},
{
"cell_type": "markdown",
"id": "c4365a9d-f922-4b14-88b0-d2b22a245b2e",
"metadata": {},
"source": [
"## 💬 Synthesize speech using speaker ID 💬"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "52be0403-d13e-4d9b-99c2-c10b85154063",
"metadata": {},
"outputs": [],
"source": [
"!tts --text \"Trying out specific speaker voice\" \\\n",
"--out_path spkr-out.wav --model_name \"tts_models/en/vctk/vits\" \\\n",
"--speaker_idx \"p341\""
]
},
{
"cell_type": "markdown",
"id": "894a560a-f9c8-48ce-aaa6-afdf516c01f6",
"metadata": {},
"source": [
"## 📣 Listen to the synthesized speaker specific wave 📣"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed485b0a-dfd5-4a7e-a571-ebf74bdfc41d",
"metadata": {},
"outputs": [],
"source": [
"import IPython\n",
"IPython.display.Audio(\"spkr-out.wav\")"
]
},
{
"cell_type": "markdown",
"id": "84636a38-097e-4dad-933b-0aeaee650e92",
"metadata": {},
"source": [
"🔶 If you want to use an external speaker to synthesize speech, you need to supply the `--speaker_wav` flag along with an external speaker encoder path and config file, as follows:"
]
},
{
"cell_type": "markdown",
"id": "cbdb15fa-123a-4282-a127-87b50dc70365",
"metadata": {},
"source": [
"First, we need to get the speaker encoder model, its config, and a reference `speaker_wav`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e54f1b13-560c-4fed-bafd-e38ec9712359",
"metadata": {},
"outputs": [],
"source": [
"!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json\n",
"!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar\n",
"!wget https://github.com/coqui-ai/TTS/raw/speaker_encoder_model/tests/data/ljspeech/wavs/LJ001-0001.wav"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dac1912-5054-4a68-8357-6d20fd99cb10",
"metadata": {},
"outputs": [],
"source": [
"!tts --model_name tts_models/multilingual/multi-dataset/your_tts \\\n",
"--encoder_path model_se.pth.tar \\\n",
"--encoder_config config_se.json \\\n",
"--speaker_wav LJ001-0001.wav \\\n",
"--text \"Are we not allowed to dim the lights so people can see that a bit better?\" \\\n",
"--out_path spkr-out.wav \\\n",
"--language_idx \"en\""
]
},
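{
"cell_type": "markdown",
"id": "c3d4e5f6",
"metadata": {},
"source": [
"💡 Note: if your installed 🐸 TTS version ships the Python API (`TTS.api`), the same voice-cloning call can be made without the CLI. The snippet below is a sketch under that assumption:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e5f6a7",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: your_tts voice cloning via the Python API (assumes a recent\n",
"# TTS release that provides TTS.api; argument names may differ by version).\n",
"from TTS.api import TTS\n",
"\n",
"tts = TTS(model_name=\"tts_models/multilingual/multi-dataset/your_tts\")\n",
"\n",
"# Clone the voice from the downloaded reference wav and synthesize in English.\n",
"tts.tts_to_file(\n",
"    text=\"Are we not allowed to dim the lights so people can see that a bit better?\",\n",
"    speaker_wav=\"LJ001-0001.wav\",\n",
"    language=\"en\",\n",
"    file_path=\"spkr-out.wav\",\n",
")"
]
},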
{
"cell_type": "markdown",
"id": "92ddce58-8aca-4f69-84c3-645ae1b12e7d",
"metadata": {},
"source": [
"## 📣 Listen to the synthesized speaker specific wave 📣"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc889adc-9c71-4232-8e85-bfc8f76476f4",
"metadata": {},
"outputs": [],
"source": [
"import IPython\n",
"IPython.display.Audio(\"spkr-out.wav\")"
]
},
{
"cell_type": "markdown",
"id": "29101d01-0b01-4153-a216-5dae415a5dd6",
"metadata": {},
"source": [
"## 🎉 Congratulations! 🎉 You now know how to use a TTS model to synthesize speech! \n",
"Follow up with the next tutorials to learn more advanced material."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}