Telegram Bot and Coqui Improvments #144
overall looks really good! left minor nits/ideas
@@ -0,0 +1,34 @@
# client_backend
nit, there are some references to client backend/vocode react sdk
fixed
apps/telegram_bot/Dockerfile
Outdated
WORKDIR /code
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r requirements.txt
let's use poetry for everything
done
apps/telegram_bot/main.py
Outdated
from vocode.turn_based.synthesizer.stream_elements_synthesizer import (
    StreamElementsSynthesizer,
)
from vocode.turn_based.synthesizer.eleven_labs_synthesizer import ElevenLabsSynthesizer
nit but i think you can actually import all of these together since we have an init file now
done
apps/telegram_bot/main.py
Outdated
SYNTH = CoquiSynthesizer()

# Array of tuples (synthesizer's voice id, nickname, description if text to voice)
DEFAULT_VOICES = [("d2bd7ccb-1b65-4005-9578-32c4e02d8ddf", "Coqui Default", None)]
what's the first value here? wondering if it should be hard coded
done
apps/telegram_bot/main.py
Outdated
class VocodeBotResponder:
    def __init__(self, transcriber, system_prompt, synthesizer, db=None):
nit: can we type these (and the rest of the pr)? should run mypy
done
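For reference, a typed version of that constructor might look roughly like the sketch below. The `Transcriber` and `Synthesizer` protocols and their method signatures are hypothetical stand-ins so the snippet runs on its own; the actual PR would presumably annotate with vocode's own base classes, and the type of `db` is an assumption.

```python
from typing import Dict, Optional, Protocol


class Transcriber(Protocol):
    """Hypothetical stand-in for the transcriber interface."""

    def transcribe(self, audio: bytes) -> str: ...


class Synthesizer(Protocol):
    """Hypothetical stand-in for the synthesizer interface."""

    def synthesize(self, text: str) -> bytes: ...


class VocodeBotResponder:
    def __init__(
        self,
        transcriber: Transcriber,
        system_prompt: str,
        synthesizer: Synthesizer,
        db: Optional[Dict[int, dict]] = None,
    ) -> None:
        self.transcriber = transcriber
        self.system_prompt = system_prompt
        self.synthesizer = synthesizer
        # Assumed behavior: fall back to an empty in-memory store when no db is given.
        self.db: Dict[int, dict] = db if db is not None else {}
```

With annotations like these in place, running mypy over the app directory should flag callers that pass the arguments in the wrong order or with the wrong types.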
apps/telegram_bot/main.py
Outdated
        self, update: Update, context: ContextTypes.DEFAULT_TYPE
    ):
        chat_id = update.effective_chat.id
        _, name, description = self.get_current_voice(chat_id)
based on above, should just be name
- Use /help to see this help message again.
"""
        if type(self.synthesizer) is CoquiSynthesizer:
            help_text += "\n- Use /create <voice_description> to create a new Coqui TTS voice from a text prompt and switch to it."
nit +=
can use f""
COQUI_BASE_URL = "https://app.coqui.ai/api/v2/"
DEFAULT_SPEAKER_ID = "d2bd7ccb-1b65-4005-9578-32c4e02d8ddf"
MAX_TEXT_LENGTH = 250  # The maximum length of text that can be synthesized at once
let's include this as a parameter? and default to 250
        response = requests.post(url, headers=headers, json=body)
        assert response.ok, response.text
        sample = response.json()
        response = requests.get(sample["audio_url"])
        return AudioSegment.from_wav(io.BytesIO(response.content))  # type: ignore

    def split_text(self, text: str) -> List[str]:
idea: this implementation may be easier to read? via gpt4
def split_text(self, text: str) -> List[str]:
    sentence_enders = re.compile('[.!?]')
    sentences = sentence_enders.split(text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        proposed_chunk = current_chunk + sentence
        if len(proposed_chunk) > 250:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + "."
        else:
            current_chunk = proposed_chunk + "."
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
done
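One possible variant, combining this with the earlier suggestion to make the length limit a parameter: the sketch below keeps each sentence's original punctuation instead of always appending ".", joins sentences with a space, and never emits an empty chunk. It is written as a free function with a hypothetical `max_length` argument (default 250) rather than as the method that was actually merged.

```python
import re
from typing import List


def split_text(text: str, max_length: int = 250) -> List[str]:
    """Group sentences into chunks of at most max_length characters.

    A single sentence longer than max_length is kept whole, not hard-split.
    """
    # Each match is a run of text plus its trailing '.', '!' or '?' (if any).
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text)]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_length:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Note that this simple pattern also splits on periods inside numbers or abbreviations (for example "3.14"), which may or may not matter for TTS input.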
    CoquiTTSSynthesizer: "speaker",
    RimeSynthesizer: "speaker",
}
assert set(voice_attr_of.keys()) == set(
we can remove these asserts
If I don't have these I get type errors.
        api_key: Optional[str] = None,
    ):
        self.voice_id = voice_id or DEFAULT_SPEAKER_ID
        self.voice_prompt = voice_prompt
        self.xtts = xtts
nit: maybe something more descriptive like enable_xtts
done
Note: This PR description was generated in part (with lots of re-prompting/editing) with Bing-GPT4! It would be neat to automate this.
This pull request introduces a voice-to-voice Telegram bot that shows off Coqui TTS's prompt-to-voice and (soon) audio-to-voice models. The pull request consists of two main parts:
1. Coqui Synthesizer Changes
The CoquiSynthesizer class now supports asynchronous parallel synthesis of large audio segments using the async_synthesize method, which is added to the BaseSynthesizer class as well. It adds one new dependency (aiohttp) to enable non-blocking HTTP calls, and one dependency (SpeechRecognition) that is already in use by the library but wasn't present in poetry.

The async_synthesize method works as follows:
[...] asyncio.create_task(), which returns a Task object that can be awaited later.
[...] asyncio.gather(), which returns a list of results in the same order as the tasks.

An example of using the async_synthesize method is:

2. Prompt-To-Voice Telegram Bot
The bot (based on albertwujj's work) uses the python-telegram-bot library to handle user messages and commands, the WhisperTranscriber class to transcribe voice messages from users, and the ChatGPTAgent class to generate text responses based on a system prompt and the user input. The system prompt is customized based on the name and description of the current voice. The bot also allows the user to select or create different voices using Coqui TTS's voice creation APIs.

The bot supports the following commands:
/start: Initializes the user data and sends a welcome message.
/voice <voice_id>: Changes the current voice to the one with the given id and resets the conversation. The voice id must be an integer corresponding to one of the available voices.
/create <voice_description>: Creates a new Coqui TTS voice from a text prompt and switches to it. The voice description must be a string that describes how the voice should sound.
/list: Lists all the available voices with their ids, names, and descriptions (if any).
/who: Shows the name and description (if any) of the current voice.
/help: Shows a help message with all the available commands.

TODO:
[...] InMemoryDB wrapper is the best way to handle non-existent users or if there is a better alternative.
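Returning to the async_synthesize flow from part 1, the create_task/gather pattern can be illustrated with a minimal, self-contained sketch. This is not the PR's implementation: `synthesize_chunk` is a hypothetical stand-in that sleeps instead of making an aiohttp request to the Coqui API, and it returns plain bytes rather than an audio segment.

```python
import asyncio
from typing import List


async def synthesize_chunk(chunk: str) -> bytes:
    """Hypothetical stand-in for one non-blocking HTTP synthesis request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return chunk.encode("utf-8")  # placeholder for the returned audio bytes


async def async_synthesize(chunks: List[str]) -> bytes:
    # Schedule every chunk immediately so the requests overlap in time.
    tasks = [asyncio.create_task(synthesize_chunk(c)) for c in chunks]
    # gather() returns results in the same order as the tasks passed in,
    # so the pieces can be concatenated directly.
    segments = await asyncio.gather(*tasks)
    return b"".join(segments)


if __name__ == "__main__":
    audio = asyncio.run(async_synthesize(["Hello. ", "How are you?"]))
    print(len(audio), "bytes synthesized")
```

Because all tasks are started before any is awaited, total latency is roughly that of the slowest chunk rather than the sum of all chunks, which is the point of splitting long text before synthesis.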