feat: adds audio format as an envar (#26)

Lets the administrator choose, through an envar, whether the audiosocket server uses the pcm16 or g711 audio format (g711 with the ulaw or alaw codec). Also improves the documentation.
felipem1210 · Oct 9, 2024 · fb9cfee
1 parent b587029
Showing 10 changed files with 531 additions and 436 deletions.
diff --git a/.env.example b/.env.example
@@ -2,6 +2,7 @@
 ASSISTANT_TOOL=rasa # Define the assistant tool to be used. Options: rasa, anthropic
 STT_TOOL=whisper-local # Define the STT tool to be used. Options: whisper-local, whisper
 SQL_DB_FILE_NAME="freetalkbot.db" # Name of the SQLite database file to be used by the whatsapp bot
+AUDIO_FORMAT=pcm16 # Audio format used by the audiosocket server. Options: pcm16, g711
 
 # Rasa variables. Mandatory if ASSISTANT_TOOL=rasa.
 # Used in rasa implementation and in golang communication channels
@@ -23,5 +24,6 @@ WHISPER_LOCAL_URL=whisper_cpu:8000/v1 # Mandatory if STT_TOOL=whisper-local
 WHISPER__MODEL="deepdml/faster-whisper-large-v3-turbo-ct2" # The whisper model to use. Mandatory if STT_TOOL=whisper-local.
 
 # Optional variables
+G711_AUDIO_CODEC=ulaw # Audio codec to be used in g711 audio format. Options: ulaw, alaw
 #PAIR_PHONE_NUMBER=+1234567890 # Use this variable to allow pair your whatsapp account with a pairing code
 #LOG_LEVEL=DEBUG  # Use this variable to enable debug logs
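
As an illustration of how the two new variables relate, here is a minimal Go sketch that reads and validates them. The names `AudioConfig` and `parseAudioConfig` are hypothetical and not taken from the repository, and the fallback defaults (`pcm16`, `ulaw`) are an assumption based on the example values above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// AudioConfig holds the two settings added to .env.example.
// Hypothetical type for illustration; the repository may model this differently.
type AudioConfig struct {
	Format string // "pcm16" or "g711"
	Codec  string // "ulaw" or "alaw"; only set when Format is "g711"
}

// parseAudioConfig validates the raw envar values.
// Empty values fall back to assumed defaults (pcm16, ulaw).
func parseAudioConfig(format, codec string) (AudioConfig, error) {
	format = strings.ToLower(strings.TrimSpace(format))
	codec = strings.ToLower(strings.TrimSpace(codec))
	if format == "" {
		format = "pcm16"
	}
	switch format {
	case "pcm16":
		// The codec is irrelevant for raw PCM, so it is left empty.
		return AudioConfig{Format: format}, nil
	case "g711":
		if codec == "" {
			codec = "ulaw"
		}
		if codec != "ulaw" && codec != "alaw" {
			return AudioConfig{}, fmt.Errorf("invalid G711_AUDIO_CODEC %q: want ulaw or alaw", codec)
		}
		return AudioConfig{Format: format, Codec: codec}, nil
	default:
		return AudioConfig{}, fmt.Errorf("invalid AUDIO_FORMAT %q: want pcm16 or g711", format)
	}
}

func main() {
	cfg, err := parseAudioConfig(os.Getenv("AUDIO_FORMAT"), os.Getenv("G711_AUDIO_CODEC"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("format=%s codec=%s\n", cfg.Format, cfg.Codec)
}
```

Failing fast on an unknown value is a design choice of this sketch: a mismatch between the configured format and the audio Asterisk actually sends would otherwise only show up as garbled audio.
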
diff --git a/README.md b/README.md
@@ -1,9 +1,101 @@
 # freetalkbot
 
-Implementation of communication channels to interact with LLM/NLU bot assistants.
+Implementation of VoIP/WhatsApp communication channels to interact with LLM/NLU bot assistants.
 
-* **Voice:** using [Audiosocket Asterisk](https://docs.asterisk.org/Configuration/Channel-Drivers/AudioSocket/) protocol
-* **Whatsapp:** using [whatsmeow](https://github.com/tulir/whatsmeow) library. NO need of Whatsapp Business account, 100% free.
+## VoIP channel
+
+An AudioSocket server that receives requests from Asterisk.
+
+### Features
+
+* Simulates a real conversation, but instead of a human you are talking with an assistant.
+* If you don't want to hear the rest of the assistant's answer, you can talk over it. The assistant's voice is cut off and it processes what you said.
+* Supports multiple calls (in theory; this has not been tested yet).
+* Fast answers from the assistant (speed is limited by the STT tool's transcription time and the assistant's answer generation time).
+
+### Architecture
+
+Refer to [architecture-Voicebot.png](docs/architecture-Voicebot.png).
+
+### Asterisk implementation
+
+The request can be implemented in two ways:
+
+1. Using [Audiosocket Dialplan application](https://docs.asterisk.org/Asterisk_20_Documentation/API_Documentation/Dialplan_Applications/AudioSocket/):
+
+```sh
+[dp_entry_call_inout]
+exten = 101,1,Verbose("Call to AudioSocket via Dialplan application")
+same = n,Answer()
+same = n,AudioSocket(40325ec2-5efd-4bd3-805f-53576e581d13,<audiosocketserver.address.com>:8080)
+same = n,Hangup()
+```
+
+With this method, the audio received from Asterisk is signed linear, 16-bit, 8 kHz, mono PCM (little-endian). The envar `AUDIO_FORMAT` must be set to `pcm16`.
+
+2. Using [Audiosocket Channel driver](https://docs.asterisk.org/Configuration/Channel-Drivers/AudioSocket/):
+
+```sh
+[dp_entry_call_inout]
+exten = 101,1,Verbose("Call to AudioSocket via Channel interface")
+same = n,Answer()
+same = n,Dial(AudioSocket/<audiosocketserver.address.com>:8080/40325ec2-5efd-4bd3-805f-53576e581d13)
+same = n,Hangup()
+```
+
+With this method, the audio received from Asterisk uses the codec negotiated between the phone and Asterisk. By default this is g711, and the audiosocket server can process audio in this codec (both ulaw and alaw). The envar `AUDIO_FORMAT` must be set to `g711` and the envar `G711_AUDIO_CODEC` to either `ulaw` or `alaw`.
+If you want to use a codec other than `g711` you can, but you will have to implement the conversion of the audio data from that codec to `pcm16`. Please refer to the [g711.go](packages/audiosocket/g711.go) file.
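
To make the g711-to-`pcm16` conversion mentioned above concrete, here is a minimal Go sketch using the standard G.711 expansion formulas. It is not the contents of `g711.go`, which this diff does not show; the function names `ulawToPCM16`, `alawToPCM16` and `g711ToPCM16` are hypothetical. Each G.711 byte is one 8 kHz sample, so the output is twice the size of the input.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ulawToPCM16 expands one G.711 mu-law byte to a signed 16-bit linear sample.
func ulawToPCM16(u byte) int16 {
	u = ^u                          // mu-law bytes are stored bit-inverted
	t := (int(u&0x0F) << 3) + 0x84  // mantissa plus bias
	t <<= uint(u&0x70) >> 4         // scale by the segment number
	if u&0x80 != 0 {
		return int16(0x84 - t)
	}
	return int16(t - 0x84)
}

// alawToPCM16 expands one G.711 A-law byte to a signed 16-bit linear sample.
func alawToPCM16(a byte) int16 {
	a ^= 0x55 // A-law bytes have alternate bits toggled
	t := int(a&0x0F) << 4
	seg := int(a&0x70) >> 4
	switch seg {
	case 0:
		t += 8
	case 1:
		t += 0x108
	default:
		t += 0x108
		t <<= uint(seg - 1)
	}
	if a&0x80 != 0 {
		return int16(t)
	}
	return int16(-t)
}

// g711ToPCM16 converts a G.711 buffer to little-endian 16-bit PCM.
// codec is expected to be "ulaw" or "alaw", mirroring G711_AUDIO_CODEC.
func g711ToPCM16(in []byte, codec string) ([]byte, error) {
	var decode func(byte) int16
	switch codec {
	case "ulaw":
		decode = ulawToPCM16
	case "alaw":
		decode = alawToPCM16
	default:
		return nil, fmt.Errorf("unsupported g711 codec %q", codec)
	}
	out := make([]byte, 2*len(in))
	for i, b := range in {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(decode(b)))
	}
	return out, nil
}

func main() {
	pcm, err := g711ToPCM16([]byte{0xFF, 0x80, 0x00}, "ulaw")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", pcm)
}
```

Supporting another codec would follow the same shape: a per-frame decoder that yields signed 16-bit little-endian samples at 8 kHz, selected in the `switch`.
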
+
+### STT
+
+There are two choices:
+
+* OpenAI Whisper, or
+* A self-hosted [Faster Whisper Server](https://github.com/fedirz/faster-whisper-server). This option is recommended if you have GPU power. Its advantage is that the audio is streamed over the websocket protocol, which speeds up transcription generation.
+
+### TTS
+
+It uses [PicoTTS](https://github.com/ihuguet/picotts). The voices used are the ones that come with pico.
+
+### Languages supported 
+
+They are limited to the languages that PicoTTS supports: en-US, en-GB, es-ES, de-DE, fr-FR, it-IT
+
+## WhatsApp channel
+
+This implementation was done using [whatsmeow](https://pkg.go.dev/go.mau.fi/whatsmeow) library. **NO need of WhatsApp Business account, 100% free.**
+
+### Features
+
+* Free WhatsApp server that acts like WhatsApp Web.
+* Conversations with users via text or voice messages. When the user sends a voice message, the server replies with a text answer.
+* It answers in the same language as the user. All languages are supported.
+
+### Architecture
+
+Refer to [architecture-Whatsapp.png](docs/architecture-Whatsapp.png).
+
+### Implementation
+
+For this channel you will need a phone with WhatsApp installed and a phone number. The server acts as a WhatsApp client that pairs with your WhatsApp account.
+After initializing the server you will see a QR code in the logs. Scan that QR code with the WhatsApp account that you will use.
+If you can't scan the QR code you can also link the WhatsApp account using a pairing code. To do so, set the envar `PAIR_PHONE_NUMBER` to your phone number, using the format shown in `.env.example`. If you don't need the pairing code, don't set this envar.
+
+Once you pair your WhatsApp account, the session is stored in a sqlite file. This file is created inside the container but mapped through a docker volume, so you can use it when you want to develop locally. If you delete this file you will have to log in again using a new QR code.
+
+### STT Tool
+
+When an audio message is received, an STT tool is used to transcribe it. It can be either of the tools already mentioned for the VoIP channel.
+
+### Languages supported
+
+All languages are supported.
+
+## Assistants Integration
+
+Currently the channels are integrated with two LLM/NLU assistants.
+
+* [RASA](./assistants/rasa/README.md)
+* [Anthropic](./assistants/anthropic/README.md)
 
 ## Dependencies
 
@@ -16,7 +108,7 @@ Install go dependencies with `go mod tidy`. Run it as well if you add a new pack
 
 ### Environment variables
 
-Check the variables in `env.example` file. Create `.env` file with `cp -a .env.example .env` and modify it with your values. 
+Check the variables in the `.env.example` file. It contains a detailed description of each variable needed to set up the communication channels with the STT tool and assistant of your choice. Create a `.env` file with `cp -a .env.example .env` and modify it with your values. 
 Read carefully the file to know which variables are relevant for each component
 
 ## Run
@@ -26,18 +118,18 @@ You can pull the docker image and run it with the environment variables set up.
 ```sh
 docker pull ghcr.io/felipem1210/freetalkbot/freetalkbot:latest
 COM_CHANNEL=audio #or whatsapp
-ocker run -it --rm --env-file ./.env ghcr.io/felipem1210/freetalkbot/freetalkbot:latest freetalkbot init -c $COM_CHANNEL
+docker run -it --rm --env-file ./.env ghcr.io/felipem1210/freetalkbot/freetalkbot:latest freetalkbot init -c $COM_CHANNEL
 ```
 
 ## Development
 
 For local development you can use docker or podman to raise up the components defined in the `docker-compose.yml` file. These components are:
 
 * Asterisk
-* Anthropic
+* Anthropic connector
 * Rasa assistant
 * Rasa Actions server
-* [Whisper ASR](https://ahmetoner.com/whisper-asr-webservice/) (optional)
+* Faster Whisper Server (optional)
 * Audio bot server
 * Whatsapp bot server
 
@@ -49,9 +141,9 @@ Run `make build`. This will build locally all the images needed for components.
 
 After setting up properly the environment variables:
 
-* Without whisper-asr: `make run`
-* With whisper-local using cpu: `make run-local-whisper-cpu`
-* With whisper-local using gpu: `make run-local-whisper-gpu`
+* Without faster-whisper-server: `make run`
+* With faster-whisper-server using cpu: `make run-local-whisper-cpu`
+* With faster-whisper-server using gpu: `make run-local-whisper-gpu`
 
 ### Configure asterisk
 
@@ -65,32 +157,6 @@ Asterisk is raised up in network_mode brige. The asterisk configuration files ar
 * For SIP checkout `pjsip_endpoint.conf` file in `asterisk/container-config` folder.
 * For IAX checkout iax.conf file in `asterisk/local-config` folder.
 
-## Communication Channels
-
-You can communicate with your chatbot assistant via two channels.
-
-### Voice channel
-
-Audiosocket server implementation, receives a request from Asterisk.
-
-### Whatsapp channel
-
-Same variables than audio bot are needed, just change the make command `make run-local-whatsapp`
-
-After initialize you will see in the logs a QR code. Scan that QR code with the whatsapp account that you will use.
-If you can't scan the QR code you can also link the whatsapp account using a pair code. For that you must set the envar `PAIR_PHONE_NUMBER` with your phone number using format show in the `.env.example`. If you don't need the pair code don't set this envar.
-
-Once you pair your whatsapp account the session will be stored in a sqlite file. This file is created inside the container but mapped through a docker volume, so you can use it when you want to develop locally. If you delete this file you will have to login again using a new QR code.
-
-The channel is prepared to receive text or voice messages.
-
-## Assistants
-
-Currently the channels are integrated with two LLM/NLU assistants.
-
-* [RASA](./rasa/README.md)
-* [Anthropic](./anthropic/README.md)
-
 # Gratitude and Thanks
 
 The following projects inspired to the construction of this one:

diff --git a/docs/architecture-Voicebot.drawio.png b/docs/architecture-Voicebot.drawio.png
diff --git a/docs/architecture-Voicebot.png b/docs/architecture-Voicebot.png
diff --git a/docs/architecture-Whatsapp.png b/docs/architecture-Whatsapp.png