backmerge master to develop #287

Merged
merged 8 commits into from
Jun 25, 2024
4 changes: 3 additions & 1 deletion bolna/agent_manager/task_manager.py
@@ -1617,7 +1617,9 @@ async def run(self):

if self.background_check_task is not None:
    self.background_check_task.cancel()


if self.first_message_task is not None:
    self.first_message_task.cancel()

if self.should_record:
    output['recording_url'] = await save_audio_file_to_s3(self.conversation_recording, self.sampling_rate, self.assistant_id, self.run_id)
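
The hunk above cancels optional background tasks during shutdown. A minimal sketch of that pattern follows, assuming the tasks are plain `asyncio.Task` objects. The helper name `cancel_if_running` is hypothetical, and awaiting the task after `cancel()` is an extra step this sketch adds; the hunk itself only calls `cancel()`.

```python
import asyncio
from typing import Optional


async def cancel_if_running(task: Optional[asyncio.Task]) -> bool:
    """Cancel `task` if it exists and wait until the cancellation has landed.

    Hypothetical helper for illustration; not part of the PR.
    """
    if task is None:
        return False
    task.cancel()
    try:
        # cancel() only requests cancellation; awaiting lets the task unwind.
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


async def demo() -> tuple:
    # Stand-in for a long-running task such as first_message_task.
    first_message_task = asyncio.create_task(asyncio.sleep(60))
    return (
        await cancel_if_running(first_message_task),
        await cancel_if_running(None),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Guarding on `None` keeps the shutdown path safe when a task was never started.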
6 changes: 4 additions & 2 deletions bolna/models.py
@@ -1,14 +1,15 @@
import json
from typing import Optional, List, Union, Dict
from pydantic import BaseModel, Field, validator, ValidationError, Json
from pydantic_core import PydanticCustomError
from .providers import *

AGENT_WELCOME_MESSAGE = "This call is being recorded for quality assurance and training. Please speak now."


 def validate_attribute(value, allowed_values):
     if value not in allowed_values:
-        raise ValidationError(f"Invalid provider. Supported values: {', '.join(allowed_values)}")
+        raise ValidationError(f"Invalid provider {value}. Supported values: {allowed_values}")
     return value


@@ -80,7 +81,8 @@ class Transcriber(BaseModel):

     @validator("provider")
     def validate_model(cls, value):
-        return validate_attribute(value, list(SUPPORTED_TRANSCRIBER_MODELS.keys()))
+        print(f"value {value}, PROVIDERS {list(SUPPORTED_TRANSCRIBER_PROVIDERS.keys())}")
+        return validate_attribute(value, list(SUPPORTED_TRANSCRIBER_PROVIDERS.keys()))

     @validator("language")
     def validate_language(cls, value):
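
pydantic's documentation asks validators to signal failure by raising `ValueError` (or `AssertionError`), which pydantic then wraps in its own `ValidationError`. Building a `ValidationError` directly from a message string, as `validate_attribute` does, is not a documented usage. A hedged, stdlib-only sketch of the helper under that assumption (the message wording is illustrative):

```python
def validate_attribute(value, allowed_values):
    """Return `value` if it is allowed; otherwise raise ValueError.

    Sketch only: raising ValueError is the documented way for a pydantic
    validator to reject a value.
    """
    if value not in allowed_values:
        supported = ", ".join(str(v) for v in allowed_values)
        raise ValueError(f"Invalid provider {value!r}. Supported values: {supported}")
    return value


if __name__ == "__main__":
    print(validate_attribute("bodhi", ["deepgram", "whisper", "bodhi"]))
```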
4 changes: 3 additions & 1 deletion bolna/providers.py
@@ -1,3 +1,4 @@
+from bolna.transcriber.bodhi_transcriber import BodhiTranscriber
 from .synthesizer import PollySynthesizer, XTTSSynthesizer, ElevenlabsSynthesizer, OPENAISynthesizer, FourieSynthesizer, DeepgramSynthesizer, MeloSynthesizer, StylettsSynthesizer
 from .transcriber import DeepgramTranscriber, WhisperTranscriber
 from .input_handlers import DefaultInputHandler, TwilioInputHandler, ExotelInputHandler, PlivoInputHandler, DailyInputHandler
@@ -17,7 +18,8 @@

 SUPPORTED_TRANSCRIBER_PROVIDERS = {
     'deepgram': DeepgramTranscriber,
-    'whisper': WhisperTranscriber
+    'whisper': WhisperTranscriber,
+    'bodhi': BodhiTranscriber
 }

 #Backwards compatibility
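
The provider table above is a name-to-class registry. A minimal sketch of how such a registry can be looked up safely follows. The three classes are empty placeholders standing in for the real transcribers, and `get_transcriber_class` is a hypothetical helper, not a function from bolna.

```python
# Placeholder classes; the real ones live in bolna.transcriber.
class DeepgramTranscriber: ...
class WhisperTranscriber: ...
class BodhiTranscriber: ...


SUPPORTED_TRANSCRIBER_PROVIDERS = {
    "deepgram": DeepgramTranscriber,
    "whisper": WhisperTranscriber,
    "bodhi": BodhiTranscriber,
}


def get_transcriber_class(provider: str) -> type:
    """Resolve a provider name to its class, with a readable error on a miss."""
    try:
        return SUPPORTED_TRANSCRIBER_PROVIDERS[provider]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_TRANSCRIBER_PROVIDERS))
        raise ValueError(
            f"Unknown transcriber provider {provider!r}. Supported: {supported}"
        ) from None


if __name__ == "__main__":
    print(get_transcriber_class("bodhi").__name__)
```

Adding a provider then only requires a new import and one new dictionary entry, which is what this hunk does for `bodhi`.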
317 changes: 317 additions & 0 deletions bolna/transcriber/bodhi_transcriber.py

Large diffs are not rendered by default.

10 changes: 6 additions & 4 deletions bolna/transcriber/whisper_transcriber.py
@@ -136,10 +136,12 @@ async def sender_stream(self, ws=None):
 self.num_frames += 1

 audio_chunk:bytes = ws_data_packet.get('data')
-# ulaw is encoding method , this is the inverse function
-audio_chunk = ulaw2lin(audio_chunk, 2)
-# convert from 8000 to 16000 HZ
-audio_chunk = ratecv(audio_chunk, 2, 1, 8000, 16000, None)[0]
+
+if self.provider in ["twilio", "exotel", "plivo"]:
+    logger.info(f"It is a telephony provider")
+    audio_chunk = ulaw2lin(audio_chunk, 2)
+    audio_chunk = ratecv(audio_chunk, 2, 1, 8000, 16000, None)[0]
+
 audio_chunk = self.bytes_to_float_array(audio_chunk).tobytes()
 # save the audio cursor here
 self.audio_cursor = self.num_frames * self.audio_frame_duration
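
The change above applies μ-law decoding and 8 kHz to 16 kHz resampling only for telephony providers. The sketch below illustrates that branch in pure Python as a stand-in for `audioop.ulaw2lin` and `audioop.ratecv` (the `audioop` module was removed from the standard library in Python 3.13). Several things here are assumptions for illustration: non-telephony input is taken to be 16-bit little-endian PCM at 16 kHz, the final division by 32768 stands in for `bytes_to_float_array`, and the linear-interpolation upsampler is cruder than `ratecv`.

```python
import struct

TELEPHONY_PROVIDERS = {"twilio", "exotel", "plivo"}  # names taken from the hunk


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def upsample_2x(samples: list) -> list:
    """Double the sample rate by linear interpolation (simplified resampler)."""
    out = []
    for i, sample in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + nxt) // 2)
    return out


def prepare_chunk(chunk: bytes, provider: str) -> list:
    """Return float samples in [-1, 1) at an assumed 16 kHz."""
    if provider in TELEPHONY_PROVIDERS:
        # Telephony audio: 8 kHz mu-law, one byte per sample.
        samples = upsample_2x([ulaw_byte_to_pcm16(b) for b in chunk])
    else:
        # Assumption: already 16-bit little-endian PCM at 16 kHz.
        samples = [s for (s,) in struct.iter_unpack("<h", chunk)]
    return [s / 32768.0 for s in samples]


if __name__ == "__main__":
    print(len(prepare_chunk(bytes([0xFF] * 160), "twilio")))
```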
21 changes: 21 additions & 0 deletions examples/whisper-melo-llama3/.env-sample
@@ -0,0 +1,21 @@
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
TWILIO_PHONE_NUMBER=

DEEPGRAM_AUTH_TOKEN=
DEEPGRAM_API_KEY=

ELEVENLABS_API_KEY=

OPENAI_API_KEY=
OPENAI_MODEL=gpt-3.5-turbo

ENVIRONMENT=local
WEBSOCKET_URL=
APP_CALLBACK_URL=

REDIS_URL=redis://redis:6379

WHISPER_URL=ws://whisper-app:9000

MELO_TTS=http://melo-app:8000/connection
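
Most keys in the sample are left blank for the user to fill in. A small sketch of a pre-flight check that lists the keys still empty in a copied `.env` follows. The parser is deliberately simplified (no quoting or `export` handling), and both function names are hypothetical.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the keys whose value is still empty, sorted by name."""
    return sorted(key for key, value in values.items() if not value)


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        print(missing_keys(parse_env(handle.read())))
```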
156 changes: 156 additions & 0 deletions examples/whisper-melo-llama3/Readme.md
@@ -0,0 +1,156 @@
# Bolna With MeloTTS and WhisperASR
This example is a Dockerized setup that runs [Bolna](https://github.com/bolna-ai/bolna) together with [Whisper ASR](https://github.com/bolna-ai/streaming-whisper-server) and [Melo TTS](https://github.com/anshjoseph/MiloTTS-Server). Twilio is the telephony provider and ngrok handles tunneling. The Docker Compose file lets you host the Bolna server, Whisper ASR, and Melo TTS together in the cloud: clone this repo and follow the steps below. Before you start, make sure [docker](https://docs.docker.com/engine/install/) and [docker compose](https://docs.docker.com/compose/install/) are installed, create a `.env` file based on `.env-sample`, and put your ngrok auth token in the `ngrok-config.yml` file.


### Start Services
```shell
docker compose up -d
```
The output should look something like this:
![alt text](./img/docker_up.png "docker compose up -d")

Note: make sure all of your services are running.
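
One way to confirm the services are up is to probe the host ports published in `docker-compose.yml`. The sketch below is a minimal TCP check under that assumption. The port numbers are the host-side mappings from the compose file, and the helper names are hypothetical.

```python
import socket

# Host-side ports as published in docker-compose.yml.
SERVICE_PORTS = {
    "bolna-app": 5001,
    "twilio-app": 8001,
    "whisper-app": 9002,
    "melo-app": 8002,
    "redis": 6379,
    "ngrok": 4040,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str) -> dict:
    """Map each service name to whether its published port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, is_up in check_services("192.168.1.10").items():
        print(f"{name}: {'up' if is_up else 'DOWN'}")
```

An open port only shows that the container is listening, not that the model inside has finished loading.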

The commands below assume your server IP is `192.168.1.10`.

### Creating Agent
To create an agent, run the following command:
```shell
curl --location 'http://192.168.1.10:5001/agent' \
--header 'Content-Type: application/json' \
--data '{
"agent_config": {
"agent_name": "Alfred",
"agent_type": "other",
"tasks": [
{
"task_type": "conversation",
"tools_config": {
"llm_agent": {
"model": "deepinfra/meta-llama/Meta-Llama-3-70B-Instruct",
"max_tokens": 123,
"agent_flow_type": "streaming",
"use_fallback": true,
"family": "llama",
"temperature": 0.1,
"request_json": true,
"provider":"deepinfra"
},
"synthesizer": {
"provider": "melotts",
"provider_config": {
"voice": "Casey",
"sample_rate": 8000,
"sdp_ratio" : 0.2,
"noise_scale" : 0.6,
"noise_scale_w" : 0.8,
"speed" : 1.0
},
"stream": true,
"buffer_size": 123,
"audio_format": "wav"
},
"transcriber": {
"encoding": "linear16",
"language": "en",
"model": "whisper",
"stream": true,
"task": "transcribe"
},
"input": {
"provider": "twilio",
"format": "wav"
},
"output": {
"provider": "twilio",
"format": "wav"
}
},
"toolchain": {
"execution": "parallel",
"pipelines": [
[
"transcriber",
"llm",
"synthesizer"
]
]
}
}
]
},
"agent_prompts": {
"task_1": {
"system_prompt": "What is the Ultimate Question of Life, the Universe, and Everything?"
}
}
}'

```
The response looks like this:
![alt text](./img/agent_res.png "agent response")
Copy the `agent_id`; you will need it in the next step.

To use a different voice, see [Change voice](#change-voice).

### Make call
```shell
curl --location 'http://192.168.1.10:8001/call' \
--header 'Content-Type: application/json' \
--data '{
"agent_id": "bf2a9e9c-6038-4104-85c4-b71a0d1478c9",
"recipient_phone_number": "+91XXXXXXXXXX"
}'
```
On success, the output is `Done`.

Note: if you are using a trial account, use your registered phone number.
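
The curl commands above can also be scripted. The following is a minimal sketch using only the Python standard library. The endpoint paths and ports are the ones used in the curl examples, while the helper names are hypothetical and the response is returned as raw text because its exact shape is not specified here.

```python
import json
import urllib.request


def build_post(url: str, payload: dict) -> urllib.request.Request:
    """Build the same JSON POST request that the curl commands send."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url: str, payload: dict) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_post(url, payload)) as response:
        return response.read().decode("utf-8")


def make_call(host: str, agent_id: str, phone_number: str) -> str:
    """Trigger a call through the telephony server on port 8001."""
    payload = {"agent_id": agent_id, "recipient_phone_number": phone_number}
    return post_json(f"http://{host}:8001/call", payload)


if __name__ == "__main__":
    print(make_call("192.168.1.10", "<your agent_id>", "+91XXXXXXXXXX"))
```

The same `post_json` helper works for agent creation by posting the agent JSON to `http://<host>:5001/agent`.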

### Stop Services
```shell
docker compose down
```
![alt text](./img/docker_dw.png "docker compose down")


### Changing the MeloTTS voice
<a id="change-voice"></a>
By default we restrict Melo to English (EN), which offers the five voice options listed below:
- ['EN-US'](./audio/audio_sample/EN_US.wav)
- ['EN-BR'](./audio/audio_sample/EN-BR.wav)
- ['EN-AU'](./audio/audio_sample/EN-AU.wav)
- ['EN-Default'](./audio/audio_sample/EN-Default.wav)
- ['EN_INDIA'](./audio/audio_sample/EN_INDIA.wav)

To switch voices, change only the following section:
```JSON
"synthesizer": {
"provider": "melo",
"provider_config": {
"voice": "<put your selected voice here>",
"sample_rate": 8000,
"sdp_ratio" : 0.2,
"noise_scale" : 0.6,
"noise_scale_w" : 0.8,
"speed" : 1.0
},
"stream": true,
"buffer_size": 123,
"audio_format": "pcm"
}
```
The rest of the config stays the same as shown above.
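
If you build the agent payload in code, the voice swap can be done programmatically. The sketch below assumes the payload has the same shape as the agent creation request above and that the voice strings are exactly the five names listed in this section. The helper name `with_voice` is hypothetical.

```python
import copy

# Voice names as listed in this README.
MELO_VOICES = ("EN-US", "EN-BR", "EN-AU", "EN-Default", "EN_INDIA")


def with_voice(agent_payload: dict, voice: str) -> dict:
    """Return a copy of the payload with every synthesizer voice replaced."""
    if voice not in MELO_VOICES:
        raise ValueError(f"Unknown voice {voice!r}. Choose one of: {', '.join(MELO_VOICES)}")
    updated = copy.deepcopy(agent_payload)
    for task in updated["agent_config"]["tasks"]:
        synthesizer = task.get("tools_config", {}).get("synthesizer")
        if synthesizer is not None:
            synthesizer["provider_config"]["voice"] = voice
    return updated


if __name__ == "__main__":
    payload = {
        "agent_config": {
            "tasks": [
                {"tools_config": {"synthesizer": {"provider_config": {"voice": "EN-US"}}}}
            ]
        }
    }
    print(with_voice(payload, "EN-AU"))
```

Working on a deep copy leaves the original payload untouched, so the same base config can be reused for several voices.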

### Conversation Demo
This demo uses the following prompt for the LLM:
```json
"task_1": {
"system_prompt": "You are assistant at Dr. Sharma clinic you have to book an appointment"
}
```



[ChatGPT 3.5 Turbo 16k demo](./audio/demo_audio.mp3)

You can adapt the prompt to your own use case.
Binary file not shown.
83 changes: 83 additions & 0 deletions examples/whisper-melo-llama3/docker-compose.yml
@@ -0,0 +1,83 @@
services:

  # main bolna service
  bolna-app:
    image: bolna-app:latest
    build:
      context: .
      dockerfile: dockerfiles/bolna_server.Dockerfile
    ports:
      - "5001:5001"
    depends_on:
      - redis
    env_file:
      - .env
    volumes:
      - ../agent_data:/app/agent_data
      - $HOME/.aws/credentials:/root/.aws/credentials:ro
      - $HOME/.aws/config:/root/.aws/config:ro

  # redis service used as a persistent storage
  redis:
    image: redis:latest
    ports:
      - "6379:6379"

  # ngrok for local tunneling
  ngrok:
    image: ngrok/ngrok:latest
    restart: unless-stopped
    command:
      - "start"
      - "--all"
      - "--config"
      - "/etc/ngrok.yml"
    volumes:
      - ./ngrok-config.yml:/etc/ngrok.yml
    ports:
      - 4040:4040

  ### Telephony servers ###
  twilio-app:
    image: twilio-app:latest
    build:
      context: .
      dockerfile: dockerfiles/twilio_server.Dockerfile
    ports:
      - "8001:8001"
    depends_on:
      - redis
    env_file:
      - .env

  ### whisper servers ###
  whisper-app:
    image: whisper-app:latest
    build:
      context: .
      dockerfile: dockerfiles/whisper_server.Dockerfile
    ports:
      - "9002:9000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  ### Melo TTS ###
  melo-app:
    image: melo-app:latest
    build:
      context: .
      dockerfile: dockerfiles/melo_server.Dockerfile
    ports:
      - "8002:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

19 changes: 19 additions & 0 deletions examples/whisper-melo-llama3/dockerfiles/bolna_server.Dockerfile
@@ -0,0 +1,19 @@
FROM python:3.10.13-slim

WORKDIR /app
COPY ./requirements.txt /app
COPY ./quickstart_server.py /app

RUN apt-get update && apt-get install libgomp1 git -y
RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
RUN pip install -r requirements.txt
RUN pip install --force-reinstall git+https://github.com/bolna-ai/bolna@MeloTTS
RUN pip install scipy==1.11.0
RUN pip install torch==2.0.1
RUN pip install torchaudio==2.0.1
RUN pip install pydub==0.25.1
RUN pip install ffprobe
RUN pip install aiofiles

EXPOSE 5001
CMD ["uvicorn", "quickstart_server:app", "--host", "0.0.0.0", "--port", "5001"]
13 changes: 13 additions & 0 deletions examples/whisper-melo-llama3/dockerfiles/melo_server.Dockerfile
@@ -0,0 +1,13 @@
FROM python:3.10.13-slim
WORKDIR /app

RUN apt-get update && apt-get install libgomp1 git -y
RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
RUN git clone https://github.com/bolna-ai/MeloTTS
RUN pip install fastapi uvicorn torchaudio
RUN cp -a MeloTTS/. .
RUN python -m pip cache purge
RUN pip install --no-cache-dir txtsplit torch torchaudio cached_path transformers==4.27.4 mecab-python3==1.0.5 num2words==0.5.12 unidic_lite unidic mecab-python3==1.0.5 pykakasi==2.2.1 fugashi==1.3.0 g2p_en==2.1.0 anyascii==0.3.2 jamo==0.4.1 gruut[de,es,fr]==2.2.3 g2pkk>=0.1.1 librosa==0.9.1 pydub==0.25.1 eng_to_ipa==0.0.2 inflect==7.0.0 unidecode==1.3.7 pypinyin==0.50.0 cn2an==0.5.22 jieba==0.42.1 langid==1.1.6 tqdm tensorboard==2.16.2 loguru==0.7.2
RUN python -m unidic download
EXPOSE 8000
CMD ["python3", "Server.py"]
11 changes: 11 additions & 0 deletions examples/whisper-melo-llama3/dockerfiles/twilio_server.Dockerfile
@@ -0,0 +1,11 @@
FROM python:3.10.13-slim

WORKDIR /app
COPY ./requirements.txt /app
COPY ./telephony_server/twilio_api_server.py /app

RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 8001

CMD ["uvicorn", "twilio_api_server:app", "--host", "0.0.0.0", "--port", "8001"]
16 changes: 16 additions & 0 deletions examples/whisper-melo-llama3/dockerfiles/whisper_server.Dockerfile
@@ -0,0 +1,16 @@
FROM python:3.10.13-slim

RUN apt-get update && apt-get install libgomp1 git -y
RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
RUN apt-get -y install build-essential
RUN apt-get -y install portaudio19-dev
RUN git clone https://github.com/bolna-ai/streaming-whisper-server.git
WORKDIR streaming-whisper-server
RUN pip install -e .
RUN pip install git+https://github.com/SYSTRAN/faster-whisper.git
RUN pip install transformers

RUN ct2-transformers-converter --model openai/whisper-small --copy_files preprocessor_config.json --output_dir ./Server/ASR/whisper_small --quantization float16
WORKDIR Server
EXPOSE 9000
CMD ["python3", "Server.py", "-p", "9000"]
Binary file added examples/whisper-melo-llama3/img/agent_res.png
Binary file added examples/whisper-melo-llama3/img/docker_dw.png
Binary file added examples/whisper-melo-llama3/img/docker_up.png