Implements speech-to-text (STT) and retrieval-augmented generation (RAG) to assist live sales calls.
- STT with whisper.cpp and local LLM inference with llama.cpp
- Custom embeddings for your text corpus using SentenceTransformers
- Indexing documents + embeddings with Elasticsearch
- Getting Started
- Creating Custom Embeddings
- Indexing with Elasticsearch
- Interface with Gradio
- Next Steps
This demo assumes you have:
- docker and docker-compose installed
- Familiarity with RAG and its applications
Make sure to convert your Llama model to GGUF format with llama.cpp, following their conversion instructions. Then save the converted model in a local directory named models/
Launch with `docker-compose up` and navigate to http://localhost:8090
By fine-tuning a model with SentenceTransformers, we can generate text embeddings locally and match incoming queries against documents in our Elasticsearch index.
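The matching step itself reduces to nearest-neighbor search over embedding vectors. Below is a minimal sketch of cosine-similarity ranking; the `docs` and `query` vectors are tiny hand-written stand-ins for real SentenceTransformers output, used only for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings"; in the demo these would come from the fine-tuned model.
docs = {
    "pricing tiers": [0.9, 0.1, 0.0],
    "refund policy": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]

# Rank documents by similarity to the query embedding.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked[0])  # best-matching document
```

In practice Elasticsearch performs this ranking server-side over a `dense_vector` field, so the client only ships the query embedding.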
The scraper/main.py script scrapes a list of sites to index; update the links in scraper/config.json to change what gets scraped.
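The internals of scraper/main.py aren't reproduced here, but a scraper like it needs two pieces: reading the link list and stripping fetched HTML down to indexable text. The sketch below shows both using only the standard library; the `{"links": [...]}` config shape is an assumption for illustration, not necessarily the demo's actual schema:

```python
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    """Reduce an HTML page to whitespace-joined visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

def load_links(config_text: str) -> list:
    # Assumed config shape: {"links": ["https://...", ...]}
    return json.loads(config_text)["links"]

print(extract_text("<html><script>x=1</script><body><p>Quarterly pricing</p></body></html>"))
```

Fetching each link (e.g. with `urllib.request.urlopen`) and passing the response body through `extract_text` yields the text to embed and index.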
Using Elasticsearch, we can index and tag documents, enabling filtering and custom relevance scoring at query time. The scraper/main.py script handles indexing after scraping.
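As a sketch of what tag-based filtering looks like, here are the kind of request bodies involved: a mapping with text, keyword tags, and a `dense_vector` field, plus a query that restricts full-text matches to tagged documents. The field names (`text`, `tags`, `embedding`) and the 384-dim size are assumptions; with the official Python client these dicts would be passed to `indices.create` and `search`:

```python
import json

# Assumed index mapping -- align field names with what scraper/main.py writes.
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "tags": {"type": "keyword"},
            "embedding": {"type": "dense_vector", "dims": 384},
        }
    }
}

def tagged_query(text: str, tags):
    """Full-text match restricted to documents carrying any of the given tags."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": text}}],
                "filter": [{"terms": {"tags": list(tags)}}],
            }
        }
    }

q = tagged_query("pricing objection", ["sales", "pricing"])
print(json.dumps(q))
```

The `filter` clause is applied without affecting scoring, so relevance is still driven by the `match` on the document text.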
With Gradio, you press a button to start transcription and read suggestions in the chatbox.
The app/app.py script contains the logic to run Whisper for speech-to-text, query the Elasticsearch index, and launch the front-end.
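The exact flow in app/app.py isn't shown here, but the glue between the three pieces is essentially prompt assembly: the Whisper transcript plus the top Elasticsearch hits become the input to the LLM. The `build_prompt` helper and its template below are hypothetical, included only to illustrate that step:

```python
def build_prompt(transcript: str, snippets) -> str:
    """Assemble a RAG prompt from the latest transcript and retrieved snippets.

    `transcript` stands in for Whisper output; `snippets` for Elasticsearch
    hits. The wording of the template is an illustrative assumption.
    """
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "You are assisting a live sales call.\n"
        f"Relevant notes:\n{context}\n"
        f"Caller said: {transcript}\n"
        "Suggest a concise response for the salesperson:"
    )

prompt = build_prompt(
    "What does the premium tier cost?",
    ["Premium is $49/mo", "Annual billing saves 20%"],
)
print(prompt)
```

The resulting string would be sent to the llama.cpp server, and its completion streamed into the Gradio chatbox as a suggestion.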
- Fine-tune an LLM for your use case
- Add additional indices for query/retrieval
- Try a container orchestrator like Kubernetes for robust distributed deployments