
Telegram Bot and Coqui Improvements #144

Closed · wants to merge 11 commits

Conversation

Contributor

@zaptrem zaptrem commented May 24, 2023

Note: This PR description was generated in part (with lots of re-prompting/editing) with Bing-GPT4! It would be neat to automate this.

This pull request introduces a voice-to-voice Telegram bot that shows off Coqui TTS's prompt-to-voice and (soon) audio-to-voice models. The pull request consists of two main parts:

1. Coqui Synthesizer Changes

The CoquiSynthesizer class now supports asynchronous parallel synthesis of large audio segments via a new async_synthesize method, which is also added to the BaseSynthesizer class. The change adds one new dependency (aiohttp) to enable non-blocking HTTP calls, plus one (SpeechRecognition) that the library already uses but was missing from the poetry dependencies.

The async_synthesize method works as follows:

  • It splits a long text into smaller chunks of less than 250 characters, which is the maximum length that Coqui TTS can handle at once.
  • It creates a task for each chunk using asyncio.create_task(), which schedules the chunk's coroutine and returns a Task object that can be awaited later.
  • It waits for all tasks to complete using asyncio.gather(), which returns a list of results in the same order as the tasks.
  • It concatenates and returns the results as an AudioSegment object.

An example of using the async_synthesize method is:

from vocode.turn_based.synthesizer.coqui_synthesizer import CoquiSynthesizer
from pydub import AudioSegment
import asyncio

synth = CoquiSynthesizer()
text = "Insert a long message that needs to be synthesized in chunks of fewer than 250 characters."
audio = asyncio.run(synth.async_synthesize(text)) # type: AudioSegment
audio.export("output.wav", format="wav")
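The split-then-gather pattern described above can be sketched in miniature. This is not the PR's implementation: synthesize_chunk below stands in for the real aiohttp POST to Coqui, and bytes stand in for pydub AudioSegments, but the task creation and order-preserving gather have the same shape.

```python
import asyncio
from typing import List

MAX_TEXT_LENGTH = 250  # Coqui's per-request limit, per the PR description


def split_text(text: str, max_len: int = MAX_TEXT_LENGTH) -> List[str]:
    """Greedily pack whole words into chunks of at most max_len characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        proposed = f"{current} {word}".strip()
        if len(proposed) > max_len:
            chunks.append(current)
            current = word  # note: a single word longer than max_len still overflows
        else:
            current = proposed
    if current:
        chunks.append(current)
    return chunks


async def synthesize_chunk(chunk: str) -> bytes:
    # Stand-in for the real non-blocking HTTP call to the TTS API.
    await asyncio.sleep(0)
    return chunk.encode()


async def async_synthesize(text: str) -> bytes:
    chunks = split_text(text)
    # One task per chunk so the requests run concurrently.
    tasks = [asyncio.create_task(synthesize_chunk(c)) for c in chunks]
    results = await asyncio.gather(*tasks)  # results keep task order
    return b" ".join(results)  # the real code concatenates AudioSegments instead
```

Because asyncio.gather preserves task order, the concatenated output matches the original text order even though the per-chunk requests overlap in time.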

2. Prompt-To-Voice Telegram Bot

A demonstration of how Coqui TTS can be integrated with Vocode and Telegram to create engaging voice applications.

The bot (based on albertwujj's work) uses the python-telegram-bot library to handle user messages and commands, the WhisperTranscriber class to transcribe users' voice messages, and the ChatGPTAgent class to generate text responses from a system prompt and the user input. The system prompt is customized with the name and description of the current voice. The bot also lets the user select or create different voices using Coqui TTS's voice creation APIs.

The bot supports the following commands for the user to interact with it:

  • /start: Initializes the user data and sends a welcome message.
  • /voice <voice_id>: Changes the current voice to the one with the given id and resets the conversation. The voice id must be an integer corresponding to one of the available voices.
  • /create <voice_description>: Creates a new Coqui TTS voice from a text prompt and switches to it. The voice description must be a string that describes how the voice should sound.
  • /list: Lists all the available voices with their ids, names, and descriptions (if any).
  • /who: Shows the name and description (if any) of the current voice.
  • /help: Shows a help message with all the available commands.
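As an illustration of the /voice validation described above, here is a hypothetical helper (handle_voice_command is an invented name, not code from this PR) that checks the integer id against the voice list before switching:

```python
from typing import List, Optional, Tuple

# Tuples of (synthesizer voice id, nickname, description), as in the PR.
Voice = Tuple[str, str, Optional[str]]

DEFAULT_VOICES: List[Voice] = [
    ("d2bd7ccb-1b65-4005-9578-32c4e02d8ddf", "Coqui Default", None)
]


def handle_voice_command(args: List[str], voices: List[Voice]) -> Tuple[bool, str]:
    """Validate '/voice <voice_id>' arguments and report the selected voice."""
    if len(args) != 1 or not args[0].isdigit():
        return False, "Usage: /voice <voice_id>, where <voice_id> is an integer."
    idx = int(args[0])
    if idx >= len(voices):
        return False, f"No voice with id {idx}. Use /list to see available voices."
    _, name, _ = voices[idx]
    return True, f"Switched to voice {idx} ({name}). Conversation reset."
```

In the actual bot these checks would run inside the python-telegram-bot command handler, with the reply text sent back to the chat.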

TODO:

  • Get feedback on code cleanliness and readability.
  • Test the changes to CoquiSynthesizer more thoroughly and handle possible errors or edge cases.
  • Make the bot work on Replit including use of ReplitDB for conversation/voice persistence between instances.
  • Evaluate if the InMemoryDB wrapper is the best way to handle non-existent users or if there is a better alternative.
  • Add voice cloning from audio clip feature using Coqui TTS's clone endpoint.
  • Implement Coqui improvements/fix in streaming synthesizer or create separate issue.
  • Investigate issue on Coqui's side relating to dropped sentences and/or switch to their new (but slower) model.

@zaptrem zaptrem requested a review from ajar98 May 24, 2023 07:00
@zaptrem zaptrem removed the request for review from ajar98 May 24, 2023 07:06
@Kian1354 (Collaborator) left a comment

overall looks really good! left minor nits/ideas

@@ -0,0 +1,34 @@
# client_backend
Collaborator:

nit, there are some references to client backend/vocode react sdk

Contributor Author:

fixed


WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt
Collaborator:

let's use poetry for everything

Contributor Author:

done

from vocode.turn_based.synthesizer.stream_elements_synthesizer import (
StreamElementsSynthesizer,
)
from vocode.turn_based.synthesizer.eleven_labs_synthesizer import ElevenLabsSynthesizer
Collaborator:

nit but i think you can actually import all of these together since we have an init file now

Contributor Author:

done

SYNTH = CoquiSynthesizer()

# Array of tuples (synthesizer's voice id, nickname, description if text to voice)
DEFAULT_VOICES = [("d2bd7ccb-1b65-4005-9578-32c4e02d8ddf", "Coqui Default", None)]
Collaborator:

what's the first value here? wondering if it should be hard coded

Contributor Author:

done



class VocodeBotResponder:
def __init__(self, transcriber, system_prompt, synthesizer, db=None):
Collaborator:

nit: can we type these (and the rest of the pr)? should run mypy

Contributor Author:

done

self, update: Update, context: ContextTypes.DEFAULT_TYPE
):
chat_id = update.effective_chat.id
_, name, description = self.get_current_voice(chat_id)
Collaborator:

based on above, should just be name

- Use /help to see this help message again.
"""
if type(self.synthesizer) is CoquiSynthesizer:
help_text += "\n- Use /create <voice_description> to create a new Coqui TTS voice from a text prompt and switch to it."
Collaborator:

nit +=

can use f""


COQUI_BASE_URL = "https://app.coqui.ai/api/v2/"
DEFAULT_SPEAKER_ID = "d2bd7ccb-1b65-4005-9578-32c4e02d8ddf"
MAX_TEXT_LENGTH = 250 # The maximum length of text that can be synthesized at once
Collaborator:

let's include this as a parameter? and default to 250

response = requests.post(url, headers=headers, json=body)
assert response.ok, response.text
sample = response.json()
response = requests.get(sample["audio_url"])
return AudioSegment.from_wav(io.BytesIO(response.content)) # type: ignore

def split_text(self, text: str) -> List[str]:
Collaborator:

idea: this implementation (via GPT-4) may be easier to read:

def split_text(self, text: str) -> List[str]:
    sentence_enders = re.compile("[.!?]")
    sentences = sentence_enders.split(text)
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue

        # Join with a space so consecutive sentences don't run together.
        proposed_chunk = f"{current_chunk} {sentence}".strip()
        if len(proposed_chunk) > 250:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + "."
        else:
            current_chunk = proposed_chunk + "."

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

Contributor Author:

done

CoquiTTSSynthesizer: "speaker",
RimeSynthesizer: "speaker",
}
assert set(voice_attr_of.keys()) == set(
Collaborator:

we can remove these asserts

Contributor Author:

If I don't have these I get type errors.

api_key: Optional[str] = None,
):
self.voice_id = voice_id or DEFAULT_SPEAKER_ID
self.voice_prompt = voice_prompt
self.xtts = xtts
Collaborator:

nit: maybe something more descriptive like enable_xtts

Contributor Author:

done

@zaptrem zaptrem closed this May 30, 2023
@zaptrem zaptrem reopened this May 30, 2023

zaptrem commented May 30, 2023

@Kian1354 replaced by #172 and #173

@zaptrem zaptrem closed this May 30, 2023