coqui-ai · erogol · Jun 2, 2022 · May 13, 2022 · May 20, 2022 · May 24, 2022
diff --git a/notebooks/use-pretrained-TTS.ipynb b/notebooks/use-pretrained-TTS.ipynb
@@ -0,0 +1,198 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "45ea3ef5",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "# Easy inference with 🐸 TTS ⚡\n",
+    "\n",
+    "Want to quickly synthesize speech using a Coqui (🐸) TTS model?\n",
+    "\n",
+    "💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡\n",
+    "\n",
+    "🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server, open it in your favorite web browser, and 🗣️.\n",
+    "\n",
+    "\n",
+    "In this notebook, we will:\n",
+    "\n",
+    "1. Download a pre-trained English TTS model and synthesize speech with it.\n",
+    "\n",
+    "\n",
+    "So, let's jump right in!\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fa2aec77",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "## Install Coqui TTS\n",
+    "! pip install -U pip\n",
+    "! pip install TTS"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8c07a273",
+   "metadata": {},
+   "source": [
+    "## ✅ List available pre-trained 🐸 TTS models\n",
+    "\n",
+    "Coqui 🐸TTS comes with a list of pretrained models covering different model types (e.g., TTS, vocoder), languages, training datasets, and architectures.\n",
+    "\n",
+    "You can use either your own model or one of the released models under 🐸TTS.\n",
+    "\n",
+    "Use `tts --list_models` to list the available models.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "608d203f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "! tts --list_models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed9dd7ab",
+   "metadata": {},
+   "source": [
+    "## ✅ Run a TTS model\n",
+    "\n",
+    "### **First things first**: Using a released model and its default vocoder:\n",
+    "\n",
+    "#### You can simply copy the full model name from the list above and pass it to `--model_name`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cc9e4608-16ec-4dcd-bd6b-bd10d62286f8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tts --text \"hello world\" \\\n",
+    "--model_name \"tts_models/en/ljspeech/glow-tts\" \\\n",
+    "--out_path output.wav\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ca2cb14-1aba-400e-a219-8ce44d9410be",
+   "metadata": {},
+   "source": [
+    "## 📣 Listen to the synthesized wave 📣"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5fe63ef4-9284-4461-9dda-1ca7483a8f9b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import IPython\n",
+    "IPython.display.Audio(\"output.wav\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5e67d178-1ebe-49c7-9a47-0593251bdb96",
+   "metadata": {},
+   "source": [
+    "### **Second things second**:\n",
+    "\n",
+    "#### If you want to run a multi-speaker model from the released models list, first list its speaker IDs with the `--list_speaker_idxs` flag, then pass one of those IDs to synthesize speech in that speaker's voice."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "87b18839-f750-4a61-bbb0-c964acaecab2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# list the possible speaker IDs.\n",
+    "!tts --model_name \"tts_models/en/vctk/vits\" \\\n",
+    "--list_speaker_idxs \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4365a9d-f922-4b14-88b0-d2b22a245b2e",
+   "metadata": {},
+   "source": [
+    "## 💬 Synthesize speech using speaker ID 💬"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "52be0403-d13e-4d9b-99c2-c10b85154063",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tts --text \"Trying out specific speaker voice\" \\\n",
+    "--out_path spkr-out.wav --model_name \"tts_models/en/vctk/vits\" \\\n",
+    "--speaker_idx \"p341\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "894a560a-f9c8-48ce-aaa6-afdf516c01f6",
+   "metadata": {},
+   "source": [
+    "## 📣 Listen to the synthesized speaker-specific wave 📣"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ed485b0a-dfd5-4a7e-a571-ebf74bdfc41d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import IPython\n",
+    "IPython.display.Audio(\"spkr-out.wav\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9116400-aff7-4a04-810f-7f89e66d2950",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}