
Never forget the resource that helps to close that sales call! Power a real-time speech-to-text agent with retrieval-augmented generation based on web-scraped customer use cases.


TatjanaChernenko/dolla_llama

 
 


Dolla Llama: Real-Time Co-Pilot for Closing the Deal

Implements speech-to-text (STT) and retrieval-augmented generation (RAG) to assist live sales calls.

🌟 Features:

  • STT with whisper.cpp, plus llama.cpp for serving your LLM
  • Custom embeddings for your text corpus using SentenceTransformers
  • Indexing documents + embeddings with Elasticsearch

Table of Contents

  1. Getting Started
  2. Creating Custom Embeddings
  3. Indexing with Elasticsearch
  4. Interface with Gradio
  5. Next Steps

Getting Started

This demo assumes you have:

Setup

Convert your Llama model to GGUF format by following the llama.cpp instructions, then save the converted model in a local directory named models/

Launch with:

docker-compose up

Then navigate to http://localhost:8090

Creating Custom Embeddings

By fine-tuning with SentenceTransformers, we can generate text embeddings locally for matching with documents in our Elasticsearch index.
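As a rough illustration of the matching step, the sketch below embeds a query and a few documents and ranks the documents by cosine similarity. It is not the repository's code: the function names (load_encoder, cosine, rank) are hypothetical, and it loads the pretrained all-MiniLM-L6-v2 model rather than a fine-tuned one, so substitute the path to your own fine-tuned model if you have it.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[List[str]], List[Vector]]


def load_encoder(model_name: str = "all-MiniLM-L6-v2") -> Encoder:
    """Return a function that maps a list of texts to embedding vectors.

    The import is deferred so the ranking helpers below can be used (and
    tested) without the sentence-transformers package installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts).tolist()


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: str, docs: List[str], encode: Encoder) -> List[Tuple[str, float]]:
    """Embed the query and documents together, then sort documents by similarity."""
    vectors = encode([query] + docs)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A call such as rank("pricing objections", documents, load_encoder()) would return the documents ordered from most to least similar. In the full pipeline the similarity search runs inside Elasticsearch, so this local ranking is mainly useful for sanity-checking a model.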

The scraper/main.py script scrapes a list of sites to index. You can update the list of links in scraper/config.json.
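For orientation, here is a minimal standard-library sketch of what such a scrape step could look like. The actual layout of scraper/config.json is not documented here, so the {"links": [...]} shape is an assumption, as are the helper names (load_links, html_to_text, scrape).

```python
import json
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML page to a single space-joined string of visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def load_links(config_text: str) -> List[str]:
    # Assumption: the config is JSON shaped like {"links": ["https://...", ...]}.
    return list(json.loads(config_text)["links"])


def scrape(url: str, timeout: float = 10.0) -> str:
    """Fetch one page and return its visible text."""
    with urlopen(url, timeout=timeout) as response:
        return html_to_text(response.read().decode("utf-8", errors="replace"))
```

A real scraper would likely add rate limiting, retries, and robots.txt checks, which are omitted here for brevity.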

Indexing with Elasticsearch

Using Elasticsearch, we can index and tag documents for filtering and customization of the relevance scoring.

The scraper/main.py script also handles this after scraping.
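The indexing step might look roughly like the sketch below, which stores each document's text, URL, tags, and embedding. It assumes Elasticsearch 8.x and the 8.x Python client; the index name and the field names (text, url, tags, embedding) are hypothetical and may differ from what scraper/main.py actually uses.

```python
from typing import Dict, Iterator, List, Sequence

INDEX_NAME = "use-cases"  # hypothetical index name


def build_mappings(dims: int) -> Dict:
    """Mappings with a keyword field for tag filtering and a dense_vector for kNN."""
    return {
        "properties": {
            "text": {"type": "text"},
            "url": {"type": "keyword"},
            "tags": {"type": "keyword"},
            "embedding": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": "cosine",
            },
        }
    }


def bulk_actions(
    docs: List[Dict], vectors: Sequence[Sequence[float]], index: str = INDEX_NAME
) -> Iterator[Dict]:
    """Pair each document with its embedding in the shape helpers.bulk expects."""
    for doc, vector in zip(docs, vectors):
        yield {
            "_index": index,
            "_id": doc["url"],  # using the URL as the id makes re-indexing idempotent
            "_source": {**doc, "embedding": list(vector)},
        }


def index_documents(es_url: str, docs: List[Dict], vectors: Sequence[Sequence[float]]) -> int:
    """Create the index if needed, bulk-index the documents, return the success count."""
    # Deferred import so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(es_url)
    if not client.indices.exists(index=INDEX_NAME):
        client.indices.create(index=INDEX_NAME, mappings=build_mappings(len(vectors[0])))
    success, _ = helpers.bulk(client, bulk_actions(docs, vectors))
    return success
```

Keeping tags as a keyword field is what allows exact-match filtering at query time, while the text field remains available for ordinary full-text relevance scoring.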

Interface with Gradio

With Gradio, you press a button to begin and read suggestions in the chatbox.

The app/app.py script contains the logic to run Whisper for speech-to-text, query the Elasticsearch index, and launch the front end.
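The flow from transcript to suggestion can be sketched as follows. This is an illustrative outline rather than the contents of app/app.py: the function names are hypothetical, the embedding and tags field names are assumed, and the encoder, search, and LLM calls are passed in as plain callables so the example stays independent of any particular client.

```python
from typing import Callable, Dict, List, Optional, Sequence


def knn_query(
    query_vector: Sequence[float], k: int = 3, tags: Optional[List[str]] = None
) -> Dict:
    """Build an Elasticsearch 8.x kNN search body, optionally filtered by tag."""
    knn: Dict = {
        "field": "embedding",  # assumed name of the dense_vector field
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": max(50, k * 10),
    }
    if tags:
        knn["filter"] = {"terms": {"tags": list(tags)}}
    return {"knn": knn, "_source": ["text", "url"]}


def build_prompt(transcript: str, hits: List[Dict]) -> str:
    """Combine the latest transcript with the retrieved passages for the LLM."""
    context = "\n".join(
        f"- {hit['_source']['text']} ({hit['_source']['url']})" for hit in hits
    )
    return (
        "You are assisting a salesperson on a live call.\n"
        f"Relevant customer use cases:\n{context}\n\n"
        f"Latest transcript:\n{transcript}\n\n"
        "Suggest one short talking point."
    )


def suggest(
    transcript: str,
    encode: Callable[[List[str]], List[Sequence[float]]],
    search: Callable[[Dict], List[Dict]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Embed the transcript, retrieve similar documents, and ask the LLM."""
    vector = encode([transcript])[0]
    hits = search(knn_query(vector, k=k))
    return generate(build_prompt(transcript, hits))
```

In a running app, search would wrap the Elasticsearch client and return the list of hits, generate would call the llama.cpp server, and suggest would be invoked each time Whisper produces a new transcript chunk, with the result shown in the Gradio chatbox.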

Next Steps

  • Fine-tune an LLM for your use case
  • Add additional indices for query/retrieval
  • Try a container orchestrator like k8s for robust distributed deployments
