Coqui Telegram Bot #173

Merged
merged 19 commits into from
Jun 15, 2023

Contributor

@zaptrem zaptrem commented May 30, 2023

Prompt-To-Voice Telegram Bot

A demonstration of how Coqui TTS can be integrated with Vocode and Telegram to create engaging voice applications.

The bot (based on albertwujj's work) uses the python-telegram-bot library to handle user messages and commands, the WhisperTranscriber class to transcribe voice messages from users, and the ChatGPTAgent class to generate text responses based on a system prompt and the user input. The system prompt is even customized based on the voice name and description of the current voice. The bot also allows the user to select or create different voices using Coqui TTS's voice creation APIs.

The bot supports the following commands for the user to interact with it:

  • /start: Initializes the user data and sends a welcome message.
  • /voice <voice_id>: Changes the current voice to the one with the given id and resets the conversation. The voice id must be an integer corresponding to one of the available voices.
  • /create <voice_description>: Creates a new Coqui TTS voice from a text prompt and switches to it. The voice description must be a string that describes how the voice should sound.
  • /list: Lists all the available voices with their ids, names, and descriptions (if any).
  • /who: Shows the name and description (if any) of the current voice.
  • /help: Shows a help message with all the available commands.
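
As a rough illustration of the command grammar above, here is a minimal sketch of splitting a message into a command name and its argument. The helper names `parse_command` and `parse_voice_id` are hypothetical and not part of this PR; the actual bot relies on python-telegram-bot for dispatch rather than on hand-written parsing.

```python
from typing import Optional, Tuple

# Command names taken from the list above.
KNOWN_COMMANDS = {"start", "voice", "create", "list", "who", "help"}


def parse_command(message: str) -> Tuple[Optional[str], str]:
    """Split a message like "/voice 3" into ("voice", "3").

    Returns (None, message) when the text is not a known command.
    """
    text = message.strip()
    if not text.startswith("/"):
        return None, text
    name, _, argument = text[1:].partition(" ")
    if name not in KNOWN_COMMANDS:
        return None, text
    return name, argument.strip()


def parse_voice_id(argument: str) -> Optional[int]:
    """Return the voice id as an int, or None if it is not a valid integer."""
    try:
        return int(argument)
    except ValueError:
        return None
```

Validating the `/voice` argument up front lets the handler reply with a usage hint instead of failing on a non-integer id.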

TODO:

  • [🚧] Add a feature for cloning a voice from an audio clip using Coqui TTS's clone endpoint.
  • Make the bot work on Replit including use of ReplitDB for conversation/voice persistence between instances.
  • Get feedback on code cleanliness and readability.
  • Test the changes to CoquiSynthesizer more thoroughly and handle possible errors or edge cases.
  • Evaluate if the InMemoryDB wrapper is the best way to handle non-existent users or if there is a better alternative.
  • Implement Coqui improvements/fix in streaming synthesizer or create separate issue.
  • Investigate issue on Coqui's side relating to dropped sentences and/or switch to their new (but slower) model.

@zaptrem zaptrem mentioned this pull request May 30, 2023
7 tasks
@zaptrem zaptrem changed the title Zaptrem/coqui telegram bot Coqui Telegram Bot May 30, 2023
@@ -0,0 +1,36 @@
# client_backend
Contributor

note for later - and let's add this to linear, this will need to be in docs/

    # Return an AudioSegment object from the audio data
    return AudioSegment.from_wav(io.BytesIO(audio_data))  # type: ignore

def get_request(self, text: str) -> tuple[str, dict[str, str], dict[str, str]]:
Contributor

i'd guess this line is what is breaking mypy, you'll need to use typing.Tuple instead of tuple and typing.Dict instead of dict

Contributor

for compatibility with python 3.8

Contributor Author

fixed in other PR
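
For reference, a minimal sketch of the Python 3.8-compatible annotation suggested in this thread. Subscripting the built-in `tuple` and `dict` in annotations only works at runtime from Python 3.9 onward, so 3.8 needs the `typing` aliases. The function is shown without `self`, and the URL, header, and body values are placeholders, not the real Coqui request.

```python
from typing import Dict, Tuple


def get_request(text: str) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, body) for a synthesis request.

    All values below are placeholders for illustration only.
    """
    url = "https://example.invalid/synthesize"  # placeholder, not a real endpoint
    headers = {"Authorization": "Bearer <api-key>"}  # placeholder header
    body = {"text": text}
    return url, headers, body
```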

continue
# Concatenate the current chunk and the sentence, and add a period to the end
proposed_chunk = current_chunk + sentence
if len(proposed_chunk) > 250:
Contributor

nit: magic number

Contributor Author

fixed in other PR

proposed_chunk = current_chunk + sentence
if len(proposed_chunk) > 250:
    chunks.append(current_chunk.strip())
    current_chunk = sentence + "."
Contributor

as kian said before, this will need to preserve the correct sentence ending that it was split on

Contributor Author

fixed in other PR
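
One way the two chunking comments above (the magic number and the lost sentence endings) could be addressed together is sketched below. This is not the fix from the other PR, which is not shown here. `MAX_CHUNK_CHARS` and `split_into_chunks` are hypothetical names, and the regex-based sentence split is an assumption.

```python
import re
from typing import List

# Named constant instead of the hard-coded 250 in the diff.
MAX_CHUNK_CHARS = 250


def split_into_chunks(text: str, max_chars: int = MAX_CHUNK_CHARS) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    Each sentence keeps the punctuation it ended with. A single sentence
    longer than max_chars is emitted as its own oversized chunk.
    """
    # Split on whitespace that follows '.', '!' or '?'. The lookbehind
    # leaves the punctuation attached to the preceding sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        proposed = f"{current} {sentence}" if current else sentence
        if len(proposed) > max_chars and current:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = proposed
    if current:
        chunks.append(current)
    return chunks
```

Because the split never consumes the terminator, a question stays a question and an exclamation stays an exclamation when the chunk is sent to the synthesizer.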

# Create an aiohttp session and post the request asynchronously using await
async with aiohttp.ClientSession() as session:
    async with session.post(url, headers=headers, json=body) as response:
        assert response.status == 201, (
Contributor

ok for now, but this sort of assert that is "expected" (not something that should never happen) - should probably be some other error class

Contributor

added linear followup
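
A minimal sketch of what the follow-up might look like: a dedicated exception in place of the `assert`. `SynthesisRequestError` and `ensure_created` are hypothetical names, not part of this PR. One practical reason for the change is that `assert` statements are removed when Python runs with `-O`, so they should not guard expected failure modes such as a non-201 response.

```python
class SynthesisRequestError(RuntimeError):
    """Hypothetical error for an unexpected HTTP status from the TTS API."""

    def __init__(self, status: int, detail: str = "") -> None:
        self.status = status
        self.detail = detail
        super().__init__(f"TTS request failed with status {status}: {detail}")


def ensure_created(status: int, detail: str = "") -> None:
    """Raise SynthesisRequestError unless the status is 201 (Created)."""
    if status != 201:
        raise SynthesisRequestError(status, detail)
```

Callers can then catch `SynthesisRequestError` and send the user a friendly message, which is awkward to do cleanly with `AssertionError`.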

) -> None:
    assert update.effective_chat, "Chat must be defined!"
    chat_id = update.effective_chat.id
    if type(self.synthesizer) is not CoquiSynthesizer:
Contributor

nit: isinstance instead of the type() comparison - in case in the future we subclass CoquiSynthesizer, for example

Contributor Author

done
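
To make the difference concrete, a small self-contained sketch follows. The classes here are empty stand-ins defined only for this illustration, and `CustomCoquiSynthesizer` is a hypothetical subclass.

```python
class CoquiSynthesizer:
    """Stand-in for the real class, defined only for this illustration."""


class CustomCoquiSynthesizer(CoquiSynthesizer):
    """Hypothetical subclass, e.g. one that adds caching."""


synth = CustomCoquiSynthesizer()

# The exact-type check rejects subclasses.
exact_match = type(synth) is CoquiSynthesizer

# isinstance accepts the class and any of its subclasses.
subclass_aware = isinstance(synth, CoquiSynthesizer)
```

Here `exact_match` is `False` while `subclass_aware` is `True`, which is why the exact-type check would wrongly report that voice creation is unsupported for a subclass.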

if type(self.synthesizer) is not CoquiSynthesizer:
    await context.bot.send_message(
        chat_id=chat_id,
        text="Sorry, voice creation is only supported for Coqui TTS.",
Contributor

Suggested change
- text="Sorry, voice creation is only supported for Coqui TTS.",
+ text="Sorry, voice creation is only supported for Coqui.",

since we have a "Coqui TTS" synthesizer which is their OSS thing

Contributor Author

fixed

Contributor

don't see the change

def get_agent(self, chat_id: int) -> ChatGPTAgent:
    # Get current voice name and description from DB
    _, voice_name, voice_description = self.db[chat_id].get(
        "current_voice", (None, None, None)
Contributor

nit: we can turn this tuple into a pydantic class:

class Voice(pydantic.BaseModel):
  voice_id...

Contributor Author

done
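
A minimal sketch of the pydantic model suggested above, replacing the `(id, name, description)` tuple. The field names follow that tuple, but the field types and the `describe` helper are assumptions for illustration, not the code merged in this PR.

```python
from typing import Optional

from pydantic import BaseModel


class Voice(BaseModel):
    # Field names follow the (id, name, description) tuple; types are assumed.
    id: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None


def describe(voice: Voice) -> str:
    """Build a short label for the voice, tolerating missing fields."""
    name = voice.name or "unnamed voice"
    if voice.description:
        return f"{name}: {voice.description}"
    return name
```

Named fields remove the need for positional unpacking such as `_, voice_name, voice_description = ...`, and the all-`None` defaults take over the role of the `(None, None, None)` fallback.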

chat_id = update.effective_chat.id
user_voices = self.db[chat_id]["voices"] # array (id, name, description)
# Make string table of id, name, description
voices = "\n".join(
Contributor

nit

Suggested change
- voices = "\n".join(
+ voices_formatted = "\n".join(

Contributor Author

done

- Use /help to see this help message again.
"""
assert update.effective_chat, "Chat must be defined!"
if type(self.synthesizer) is CoquiSynthesizer:
Contributor

nit isinstance

Contributor Author

done

@zaptrem zaptrem requested a review from ajar98 June 11, 2023 15:38
Contributor

@ajar98 ajar98 left a comment

looks great! nice work

  ) -> None:
      self.transcriber = transcriber
      self.system_prompt = system_prompt
      self.synthesizer = synthesizer
-     self.db = ChatsDB(db if db else {})
+     self.db: Dict[int, Chat] = defaultdict(Chat)
Contributor

nice, very clean
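
A small sketch of why `defaultdict(Chat)` handles unseen users without a wrapper class. `Chat` here is a plain dataclass stand-in with assumed fields; the PR's own `Chat` appears to be a pydantic model whose definition is not shown in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chat:
    """Dataclass stand-in for the PR's Chat model; fields are assumed."""

    voices: List[str] = field(default_factory=list)
    current_voice: Optional[str] = None


db: Dict[int, Chat] = defaultdict(Chat)

# The first access for an unseen chat id creates a fresh Chat automatically.
db[42].voices.append("narrator")
```

One caveat: indexing a missing key with `db[chat_id]` inserts it, so a read-only lookup that should not create an entry needs `db.get(chat_id)` or an `in` check instead.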

# Initialize an empty dictionary to store user data
self.db = db
# Define a Voice model with id, name and description fields
class Voice(BaseModel):
Contributor

👍

@ajar98 ajar98 assigned zaptrem and unassigned ajar98 Jun 13, 2023
Contributor Author

zaptrem commented Jun 15, 2023

@ajar98 Fixed the branding issue and merged again. Good to go from my end.

@zaptrem zaptrem merged commit 258876b into main Jun 15, 2023
m5a0r7 pushed a commit to m5a0r7/vocode-python that referenced this pull request Oct 19, 2023
* Add async synthesize, xtts, and prompt to coqui TB

* add speechrecognition and aiohttp dependencies

* add optional memory arg to turn-based ChatGPTAgent

* add coqui telegram bot

* fix mypy issue

* pr feedback

* Rename defaultdict

* fix py3.8 typing issue

* another py3.8 fix

* [broken] pydantic progress

* use pydantic and defaultdict

* more nit fixes

* Fix Coqui branding

* fix type error