Merge pull request #285 from bolna-ai/feat/boodhi-integration

add an example to use fully open source stack
voxos-ai · Jun 24, 2024 · 594e989
2 parents 72b0356 + 3e91c50

Showing 20 changed files with 524 additions and 0 deletions.
diff --git a/examples/whisper-melo-llama3/.env-sample b/examples/whisper-melo-llama3/.env-sample
@@ -0,0 +1,21 @@
+TWILIO_ACCOUNT_SID=
+TWILIO_AUTH_TOKEN=
+TWILIO_PHONE_NUMBER=
+
+DEEPGRAM_AUTH_TOKEN=
+DEEPGRAM_API_KEY=
+
+ELEVENLABS_API_KEY=
+
+OPENAI_API_KEY=
+OPENAI_MODEL=gpt-3.5-turbo
+
+ENVIRONMENT=local
+WEBSOCKET_URL=
+APP_CALLBACK_URL=
+
+REDIS_URL=redis://redis:6379
+
+WHISPER_URL=ws://whisper-app:9000
+
+MELO_TTS=http://melo-app:8000/connection
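
Several keys in the sample above ship empty and have to be filled in when it is copied to `.env`. As a minimal sketch (the helper name `empty_keys` is hypothetical and not part of this PR), a short Python check can list which keys in `.env`-style text still have no value. It does not know which keys this example actually requires, so treat the result as a reminder rather than a validation.

```python
def empty_keys(env_text: str) -> list[str]:
    """Return the keys in .env-style text whose value is empty."""
    missing = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if not value.strip():
            missing.append(key.strip())
    return missing


# A few lines taken from the sample above.
sample = (
    "TWILIO_ACCOUNT_SID=\n"
    "ENVIRONMENT=local\n"
    "WEBSOCKET_URL=\n"
    "REDIS_URL=redis://redis:6379\n"
)
print(empty_keys(sample))  # ['TWILIO_ACCOUNT_SID', 'WEBSOCKET_URL']
```

To check a real file, pass `open(".env").read()` instead of the inline sample.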
diff --git a/examples/whisper-melo-llama3/Readme.md b/examples/whisper-melo-llama3/Readme.md
@@ -0,0 +1,156 @@
+# Bolna With MeloTTS and WhisperASR
+This example is a Dockerized setup that combines [Bolna](https://github.com/bolna-ai/bolna) with [Whisper ASR](https://github.com/bolna-ai/streaming-whisper-server) and [Melo TTS](https://github.com/anshjoseph/MiloTTS-Server). It uses Twilio as the telephony provider and ngrok for tunneling. The Docker Compose file lets you host the Bolna server, Whisper ASR, and Melo TTS together in the cloud: clone this repo and follow the steps below. Before you start, make sure you have [docker](https://docs.docker.com/engine/install/) and [docker compose](https://docs.docker.com/compose/install/) installed, create a `.env` file based on `.env-sample`, and put your ngrok auth token in `ngrok-config.yml`.
+
+
+### Start Services
+```shell
+docker compose up -d
+```
+The output should look something like this:
+![alt text](./img/docker_up.png "docker compose up -d")
+
+Note: make sure that all services are running.
+
+The commands below assume your server IP is `192.168.1.10`.
+
+### Creating Agent
+To create an agent, run the following command:
+```shell
+curl --location 'http://192.168.1.10:5001/agent' \
+--header 'Content-Type: application/json' \
+--data '{
+  "agent_config": {
+    "agent_name": "Alfred",
+    "agent_type": "other",
+    "tasks": [
+      {
+        "task_type": "conversation",
+        "tools_config": {
+          "llm_agent": {
+            "model": "deepinfra/meta-llama/Meta-Llama-3-70B-Instruct",
+            "max_tokens": 123,
+            "agent_flow_type": "streaming",
+            "use_fallback": true,
+            "family": "llama",
+            "temperature": 0.1,
+            "request_json": true,
+            "provider":"deepinfra"
+          },
+          "synthesizer": {
+            "provider": "melotts",
+            "provider_config": {
+              "voice": "Casey",
+              "sample_rate": 8000,
+              "sdp_ratio" : 0.2,
+              "noise_scale" : 0.6,
+              "noise_scale_w" :  0.8,
+              "speed" : 1.0
+            },
+            "stream": true,
+            "buffer_size": 123,
+            "audio_format": "wav"
+          },
+          "transcriber": {
+            "encoding": "linear16",
+            "language": "en",
+            "model": "whisper",
+            "stream": true,
+            "task": "transcribe"
+          },
+          "input": {
+            "provider": "twilio",
+            "format": "wav"
+          },
+          "output": {
+            "provider": "twilio",
+            "format": "wav"
+          }
+        },
+        "toolchain": {
+          "execution": "parallel",
+          "pipelines": [
+            [
+              "transcriber",
+              "llm",
+              "synthesizer"
+            ]
+          ]
+        }
+      }
+    ]
+  },
+  "agent_prompts": {
+    "task_1": {
+      "system_prompt": "What is the Ultimate Question of Life, the Universe, and Everything?"
+    }
+  }
+}'
+
+```
+The response looks like this:
+![alt text](./img/agent_res.png "agent response")
+Copy the `agent_id`; it is needed in the next step.
+
+To use a different voice, see [Change voice](#change-voice).
+
+### Make call
+```shell
+curl --location 'http://192.168.1.10:8001/call' \
+--header 'Content-Type: application/json' \
+--data '{
+    "agent_id": "bf2a9e9c-6038-4104-85c4-b71a0d1478c9",
+    "recipient_phone_number": "+91XXXXXXXXXX"
+}'
+```
+On success, the output is `Done`.
+
+Note: if you are using a trial account, use your registered phone number as the recipient.
+
+### Stop Services
+```shell
+docker compose down
+```
+![alt text](./img/docker_dw.png "docker compose down")
+
+
+### Changing the MeloTTS voice
+<a id="change-voice"></a>
+By default we restrict Melo to EN, which offers 5 voice options:
+- ['EN-US'](./audio/audio_sample/EN_US.wav) 
+- ['EN-BR'](./audio/audio_sample/EN-BR.wav) 
+- ['EN-AU'](./audio/audio_sample/EN-AU.wav) 
+- ['EN-Default'](./audio/audio_sample/EN-Default.wav) 
+- ['EN_INDIA'](./audio/audio_sample/EN_INDIA.wav)
+
+To switch voices, change only the following section:
+```JSON
+"synthesizer": {
+            "provider": "melotts",
+            "provider_config": {
+              "voice": "<put your selected voice here>",
+              "sample_rate": 8000,
+              "sdp_ratio" : 0.2,
+              "noise_scale" : 0.6,
+              "noise_scale_w" :  0.8,
+              "speed" : 1.0
+            },
+            "stream": true,
+            "buffer_size": 123,
+            "audio_format": "pcm"
+          }
+```
+The rest of the config stays the same as shown above.
+
+### Conversation Demo
+This demo uses the prompt below for the LLM:
+```json
+"task_1": {
+      "system_prompt": "You are assistant at Dr. Sharma clinic you have to book an appointment"
+}
+```
+
+
+
+[chat GPT 3.5 turbo 16k demo](./audio/demo_audio.mp3)
+
+You can supply a prompt that fits your use case.
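
The voice change described in the README above amounts to swapping one field in the `synthesizer` block. A minimal sketch, assuming the five voice names listed in the README are the complete set and copying the other values (`provider`, `audio_format`, and so on) from the agent-creation request; the function name `melo_synthesizer` is hypothetical and not part of Bolna:

```python
import json

# Voice names as listed in the README's voice section.
MELO_VOICES = ("EN-US", "EN-BR", "EN-AU", "EN-Default", "EN_INDIA")


def melo_synthesizer(voice: str) -> dict:
    """Build the synthesizer block for the chosen MeloTTS voice."""
    if voice not in MELO_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {MELO_VOICES}")
    return {
        # Values below are copied from the agent-creation request (assumption:
        # they are the ones the server expects).
        "provider": "melotts",
        "provider_config": {
            "voice": voice,
            "sample_rate": 8000,
            "sdp_ratio": 0.2,
            "noise_scale": 0.6,
            "noise_scale_w": 0.8,
            "speed": 1.0,
        },
        "stream": True,
        "buffer_size": 123,
        "audio_format": "wav",
    }


# Print the block so it can be pasted into tools_config of the request body.
print(json.dumps(melo_synthesizer("EN-AU"), indent=2))
```

Rejecting unknown names up front keeps a typo from reaching the agent config, where it might only show up once a call is placed.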
diff --git a/examples/whisper-melo-llama3/audio/audio_sample/EN-AU.wav b/examples/whisper-melo-llama3/audio/audio_sample/EN-AU.wav
diff --git a/examples/whisper-melo-llama3/audio/audio_sample/EN-BR.wav b/examples/whisper-melo-llama3/audio/audio_sample/EN-BR.wav
diff --git a/examples/whisper-melo-llama3/audio/audio_sample/EN-Default.wav b/examples/whisper-melo-llama3/audio/audio_sample/EN-Default.wav
diff --git a/examples/whisper-melo-llama3/audio/audio_sample/EN_INDIA.wav b/examples/whisper-melo-llama3/audio/audio_sample/EN_INDIA.wav
diff --git a/examples/whisper-melo-llama3/audio/audio_sample/EN_US.wav b/examples/whisper-melo-llama3/audio/audio_sample/EN_US.wav
diff --git a/examples/whisper-melo-llama3/audio/demo_audio.mp3 b/examples/whisper-melo-llama3/audio/demo_audio.mp3
diff --git a/examples/whisper-melo-llama3/docker-compose.yml b/examples/whisper-melo-llama3/docker-compose.yml
@@ -0,0 +1,83 @@
+services:
+
+  # main bolna service
+  bolna-app:
+    image: bolna-app:latest
+    build:
+      context: .
+      dockerfile: dockerfiles/bolna_server.Dockerfile
+    ports:
+      - "5001:5001"
+    depends_on:
+      - redis
+    env_file:
+        - .env
+    volumes:
+      - ../agent_data:/app/agent_data
+      - $HOME/.aws/credentials:/root/.aws/credentials:ro
+      - $HOME/.aws/config:/root/.aws/config:ro
+
+  # redis service used as persistent storage
+  redis:
+    image: redis:latest
+    ports:
+      - "6379:6379"
+
+  # ngrok for local tunneling
+  ngrok:
+    image: ngrok/ngrok:latest
+    restart: unless-stopped
+    command:
+      - "start"
+      - "--all"
+      - "--config"
+      - "/etc/ngrok.yml"
+    volumes:
+      - ./ngrok-config.yml:/etc/ngrok.yml
+    ports:
+      - 4040:4040
+
+  ### Telephony servers ###
+  twilio-app:
+    image: twilio-app:latest
+    build:
+      context: .
+      dockerfile: dockerfiles/twilio_server.Dockerfile
+    ports:
+      - "8001:8001"
+    depends_on:
+      - redis
+    env_file:
+      - .env
+
+  ### whisper servers ###
+  whisper-app:
+    image: whisper-app:latest
+    build:
+      context: .
+      dockerfile: dockerfiles/whisper_server.Dockerfile
+    ports:
+      - "9002:9000"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+  ### Melo TTS ###
+  melo-app:
+    image: melo-app:latest
+    build:
+      context: .
+      dockerfile: dockerfiles/melo_server.Dockerfile
+    ports:
+      - "8002:8000"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+
diff --git a/examples/whisper-melo-llama3/dockerfiles/bolna_server.Dockerfile b/examples/whisper-melo-llama3/dockerfiles/bolna_server.Dockerfile
@@ -0,0 +1,19 @@
+FROM python:3.10.13-slim
+
+WORKDIR /app
+COPY ./requirements.txt /app
+COPY ./quickstart_server.py /app
+
+RUN apt-get update && apt-get install libgomp1 git -y
+RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
+RUN pip install -r requirements.txt
+RUN pip install --force-reinstall git+https://github.com/bolna-ai/bolna@MeloTTS
+RUN pip install scipy==1.11.0
+RUN pip install torch==2.0.1
+RUN pip install torchaudio==2.0.1
+RUN pip install pydub==0.25.1
+RUN pip install ffprobe
+RUN pip install aiofiles
+
+EXPOSE 5001
+CMD ["uvicorn", "quickstart_server:app", "--host", "0.0.0.0", "--port", "5001"]
diff --git a/examples/whisper-melo-llama3/dockerfiles/melo_server.Dockerfile b/examples/whisper-melo-llama3/dockerfiles/melo_server.Dockerfile
@@ -0,0 +1,13 @@
+FROM python:3.10.13-slim
+WORKDIR /app
+
+RUN apt-get update && apt-get install libgomp1 git -y
+RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
+RUN git clone https://github.com/bolna-ai/MeloTTS
+RUN pip install fastapi uvicorn torchaudio
+RUN cp -a MeloTTS/. . 
+RUN python -m pip cache purge
+RUN pip install --no-cache-dir txtsplit torch torchaudio cached_path transformers==4.27.4 mecab-python3==1.0.5 num2words==0.5.12 unidic_lite unidic pykakasi==2.2.1 fugashi==1.3.0 g2p_en==2.1.0 anyascii==0.3.2 jamo==0.4.1 "gruut[de,es,fr]==2.2.3" "g2pkk>=0.1.1" librosa==0.9.1 pydub==0.25.1 eng_to_ipa==0.0.2 inflect==7.0.0 unidecode==1.3.7 pypinyin==0.50.0 cn2an==0.5.22 jieba==0.42.1 langid==1.1.6 tqdm tensorboard==2.16.2 loguru==0.7.2
+RUN python -m unidic download
+EXPOSE 8000
+CMD ["python3", "Server.py"]
diff --git a/examples/whisper-melo-llama3/dockerfiles/twilio_server.Dockerfile b/examples/whisper-melo-llama3/dockerfiles/twilio_server.Dockerfile
@@ -0,0 +1,11 @@
+FROM python:3.10.13-slim
+
+WORKDIR /app
+COPY ./requirements.txt /app
+COPY ./telephony_server/twilio_api_server.py /app
+
+RUN pip install --no-cache-dir -r requirements.txt
+
+EXPOSE 8001
+
+CMD ["uvicorn", "twilio_api_server:app", "--host", "0.0.0.0", "--port", "8001"]
diff --git a/examples/whisper-melo-llama3/dockerfiles/whisper_server.Dockerfile b/examples/whisper-melo-llama3/dockerfiles/whisper_server.Dockerfile
@@ -0,0 +1,16 @@
+FROM python:3.10.13-slim
+
+RUN apt-get update && apt-get install libgomp1 git -y
+RUN apt-get -y update && apt-get -y upgrade && apt-get install -y --no-install-recommends ffmpeg
+RUN apt-get -y install build-essential
+RUN apt-get -y install portaudio19-dev
+RUN git clone https://github.com/bolna-ai/streaming-whisper-server.git
+WORKDIR streaming-whisper-server
+RUN pip install -e .
+RUN pip install git+https://github.com/SYSTRAN/faster-whisper.git
+RUN pip install transformers
+
+RUN ct2-transformers-converter --model openai/whisper-small --copy_files preprocessor_config.json --output_dir ./Server/ASR/whisper_small --quantization float16
+WORKDIR Server
+EXPOSE 9000
+CMD ["python3", "Server.py", "-p", "9000"]
diff --git a/examples/whisper-melo-llama3/img/agent_res.png b/examples/whisper-melo-llama3/img/agent_res.png
diff --git a/examples/whisper-melo-llama3/img/docker_dw.png b/examples/whisper-melo-llama3/img/docker_dw.png
diff --git a/examples/whisper-melo-llama3/img/docker_up.png b/examples/whisper-melo-llama3/img/docker_up.png
diff --git a/examples/whisper-melo-llama3/ngrok-config.yml b/examples/whisper-melo-llama3/ngrok-config.yml
@@ -0,0 +1,10 @@
+region: us
+version: '2'
+authtoken: <ngrok auth token>
+tunnels:
+  twilio-app:
+    addr: twilio-app:8001
+    proto: http
+  bolna-app:
+    addr: bolna-app:5001
+    proto: http
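
The config above starts two tunnels, and their public URLs are only known once ngrok is running. As a rough sketch, the ngrok agent exposes a local API at `/api/tunnels` that lists them. The code below assumes that API is reachable from the host at `http://localhost:4040` (the port the compose file publishes), which depends on how the agent binds its web interface inside the container. The function names are hypothetical, and the example response body uses placeholder URLs.

```python
import json
from urllib.request import urlopen


def tunnel_urls(api_response: str) -> dict[str, str]:
    """Map tunnel name -> public URL from the agent API's JSON response."""
    data = json.loads(api_response)
    return {t["name"]: t["public_url"] for t in data.get("tunnels", [])}


def fetch_tunnel_urls(api_base: str = "http://localhost:4040") -> dict[str, str]:
    """Query a running ngrok agent; only works while the stack is up."""
    with urlopen(f"{api_base}/api/tunnels", timeout=5) as resp:
        return tunnel_urls(resp.read().decode("utf-8"))


# Offline example with a made-up response body (URLs are placeholders).
example = json.dumps({
    "tunnels": [
        {"name": "twilio-app", "public_url": "https://example-1.ngrok.app"},
        {"name": "bolna-app", "public_url": "https://example-2.ngrok.app"},
    ]
})
print(tunnel_urls(example))
```

With the services running, `fetch_tunnel_urls()` returns the live mapping for the `twilio-app` and `bolna-app` tunnels defined above.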