Llama 3.1, Vikhr Nemo 12B, Command-R, Ollama 0.4.6, Chroma 0.5.4, Tavily AI
- Real-time recording: Operator-patient dialogues are recorded on the fly.
- Audio-to-text conversion: Speech is converted to text using VOSK.
- Text analysis: The input text is analyzed with an LLM (Large Language Model).
- Keyword extraction: Key terms are identified for information retrieval.
- Database query: The vector database (ChromaDB) is queried for relevant information.
- Text filtering: Retrieved text is filtered using the `cointegrated/rubert-tiny2` model.
- Text generation: Responses are created using an LLM.
- Hallucination check:
- Ensures the response aligns with the database query results.
- Verifies correspondence with the extracted keywords.
- Operator assistance: The generated text is presented to the operator in the chatbot interface.
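The text-filtering step above compares retrieved fragments to the query in embedding space. A minimal sketch of that comparison, assuming the vectors come from `cointegrated/rubert-tiny2` (toy vectors stand in for real model output here, and the threshold is illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_fragments(query_vec, fragments, threshold=0.5):
    """Keep only fragments whose embedding is close enough to the query.

    `fragments` is a list of (text, embedding) pairs; in the real pipeline
    the embeddings would come from cointegrated/rubert-tiny2.
    """
    return [text for text, vec in fragments
            if cosine_similarity(query_vec, vec) >= threshold]

# Toy vectors standing in for rubert-tiny2 embeddings.
query = [1.0, 0.0, 1.0]
docs = [("dosage instructions", [0.9, 0.1, 0.8]),
        ("cafeteria menu", [0.0, 1.0, 0.0])]
print(filter_fragments(query, docs))  # ['dosage instructions']
```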
- Manages the vector database.
- Performs text embedding.
- Handles speech recognition.
- Controls agent logic and integrates with clinical systems.
- Processes requests using the LLM.
- Generates textual responses.
- Speech Recognition: VOSK
- Vector Database: ChromaDB
- Text Filtering Model: `cointegrated/rubert-tiny2`
- Language Models: Llama 3.1, `rscr/vikhr_nemo_12b`, Command-R (served via Ollama)
The AI agent supports a flexible approach to request processing using complex routing logic and retrieval of relevant information. Its key functionalities include:
- Identifies the data source for request processing:
- Vector Storage: Uses ChromaDB optimized for result diversity (MMR).
- Chat with Memory: Supports conversational mode with prior interaction history.
- Web Search: Acts as a fallback if no relevant data is found locally.
- Session Termination: Provides an option to end the agent's operation.
- Focus and Enhancement of Queries:
- Extracts keywords to optimize the search process.
- Refines queries for precise information retrieval.
- Multi-Step Data Retrieval: Ensures high accuracy by:
- Utilizing various search strategies in the vector database:
  - High response diversity (MMR, `lambda_mult=0.25`).
  - Moderate diversity (MMR, `lambda_mult=0.85`).
  - Mathematical computation of diversity based on the probability of finding relevant content in the collection.
  - Specific embedding models for fallback searches.
- Selecting appropriate data collections for specific tasks.
- Filters documents based on similarity to user query keywords (cosine distance evaluation).
- Evaluates document relevance:
- Matches document content to the request using scoring models.
- Automatically switches to alternate sources if no relevant data is found.
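The routing decision described above can be sketched as a plain function. The option names and input signals (`has_local_match`, the history list) are illustrative, not the project's actual API:

```python
def route_question(question: str, has_local_match: bool, history: list) -> str:
    """Illustrative routing: pick a data source for the request.

    Mirrors the options above: vector storage, chat with memory,
    web-search fallback, or session termination.
    """
    if question.strip().lower() in {"exit", "quit", "end session"}:
        return "terminate"
    if has_local_match:
        return "vectorstore"
    if history:                # prior turns exist -> continue the dialogue
        return "chat_with_memory"
    return "web_search"        # fallback when nothing relevant is found locally

print(route_question("What is the dosage?", True, []))  # vectorstore
print(route_question("exit", False, []))                 # terminate
```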
- Utilizes LLMs such as `Llama 3.1 70b fp16`, `rscr/vikhr_nemo_12b`, or `Command-R` for task-specific generation:
  - Generates text based on:
    - Local vector database (RAG).
    - Web search results if local data is insufficient.
  - Verifies responses to eliminate hallucinations and ensure alignment with user queries.
- Maintains dialogue context for more accurate responses.
- Stores interaction history for seamless user experience.
- Proceeds to the next processing step if relevant data is unavailable, up to session termination.
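The hallucination check mentioned above verifies that a draft answer stays grounded in the retrieved documents and the extracted keywords. In the agent this judgement is delegated to an LLM grader; a crude keyword-overlap stand-in illustrates the contract:

```python
def is_grounded(answer: str, documents: list, keywords: list) -> bool:
    """Crude grounding check, standing in for the LLM grader:
    the answer must reuse words from the retrieved documents and
    mention at least one extracted keyword."""
    answer_words = set(answer.lower().split())
    doc_words = set(w for d in documents for w in d.lower().split())
    reuses_docs = bool(answer_words & doc_words)
    has_keyword = any(k.lower() in answer_words for k in keywords)
    return reuses_docs and has_keyword

docs = ["recommended dose: 200 mg twice daily"]
print(is_grounded("the recommended dose is 200 mg", docs, ["dose"]))  # True
print(is_grounded("I don't know", docs, ["dose"]))                    # False
```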
- Supports multi-step routing through a state graph.
- Employs asynchronous methods for parallel task execution, such as:
- Data retrieval.
- Response generation.
- Document relevance assessment.
- Works with ChromaDB through built-in retrievers:
- Implements search strategies like MMR and cosine similarity for diverse results.
- Leverages fallback collections built using various embedding models (e.g., LaBSE, Distiluse).
- Uses evaluation models to:
- Analyze document relevance.
- Ensure the adequacy of generated responses.
- User interaction through a chat interface.
- Session termination command support.
- Automatic handling of multiple requests within a session.
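The asynchronous, parallel execution mentioned above can be sketched with `asyncio.gather`; the coroutines here are placeholders for the real ChromaDB retrieval and relevance-grading calls:

```python
import asyncio

async def retrieve(query):           # placeholder for the ChromaDB lookup
    await asyncio.sleep(0.01)
    return [f"doc for {query!r}"]

async def grade(doc):                # placeholder for the relevance grader
    await asyncio.sleep(0.01)
    return True

async def handle(query):
    docs = await retrieve(query)
    # Grade all retrieved documents in parallel instead of one by one.
    verdicts = await asyncio.gather(*(grade(d) for d in docs))
    return [d for d, ok in zip(docs, verdicts) if ok]

print(asyncio.run(handle("side effects")))  # ["doc for 'side effects'"]
```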
An asynchronous web server based on Quart will provide the interface for user interaction and agent integration into workflows. The interface supports session management for call center operators and file upload for data placement in vector collections based on user access rights.
- Users can send queries through a web page (CRM integration or standalone Windows app as per client agreement).
- AI agent responses are displayed in real time.
- Dialogue history is preserved for context-aware responses.
- Queries are processed asynchronously for optimal performance.
- Supports uploading TXT, PDF, and web links (additional formats per client agreement).
- Uploaded files are stored and processed for addition to ChromaDB.
- Allows prioritization of specific collections for user requests.
- Route `/`: Displays the chat interface using the `index.html` template.
- Route `/get`:
  - Accepts text queries via AJAX requests.
  - Saves queries and responses in session history.
  - Asynchronously calls the `get_agent_response` function to generate responses.
- Route `/upload`:
  - Accepts files via POST requests.
  - Supports `.pdf` and `.txt` formats.
  - Processes files asynchronously and adds their content to the vector database.
- Function `process_file`:
  - Processes uploaded files.
  - Adds content to ChromaDB for future retrieval.
- Uses `session` to store dialogue history, ensuring context-aware responses in ongoing interactions.
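Before a `process_file`-style helper can add an uploaded document to ChromaDB, the raw text must be split into fragments. A minimal sketch of that splitting step; the fragment size and overlap are assumptions, not the project's actual settings:

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list:
    """Split a TXT document into overlapping fragments for indexing.

    Overlap keeps sentences that straddle a boundary retrievable from
    both neighbouring fragments.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("a" * 250, chunk_size=100, overlap=20)
print(len(chunks), [len(c) for c in chunks])  # 4 [100, 100, 90, 10]
```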
The AI agent uses ChromaDB and embedding models to perform high-precision search, data addition, and document processing. Its retrieval logic is built on an adaptive approach to search and collection management.
- Supported models for text embeddings include:
  - `cointegrated/LaBSE-en-ru`
  - `sentence-transformers/distiluse-base-multilingual-cased-v1`
  - `ai-forever/sbert_large_nlu_ru`
  - `hkunlp/instructor-xl` (instruction-based embedding training)
- Creation, deletion, and listing of collections.
- Environment preparation for collection updates.
- Supported data types:
- PDF: Split into pages and indexed.
- TXT: Split into fragments for optimal indexing.
- URL: Text extracted, processed, and added to the database.
- Multiple search types:
  - `simil`: vector similarity search.
  - `simil_score`: similarity search with relevance scoring.
  - `vector`: direct vector-based search.
  - `mmr`: Maximal Marginal Relevance for diverse results.
- Metadata-based filtering.
- Adjustable parameters:
  - Number of returned documents (`k`).
  - Number of fetched documents (`fetch_k`).
  - Diversity coefficient (`lambda_mult`).
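The parameters above combine into retriever configurations such as the following. The `lambda_mult` values match the diversity settings quoted earlier; the dictionary shape follows the common LangChain/Chroma `as_retriever` convention, which is an assumption about this project's code:

```python
# Illustrative retriever configurations; the parameter names follow the
# LangChain/Chroma retriever convention assumed for this project.
diverse_search = {
    "search_type": "mmr",
    "search_kwargs": {
        "k": 4,               # documents returned to the agent
        "fetch_k": 20,        # candidates fetched before MMR re-ranking
        "lambda_mult": 0.25,  # low value -> high response diversity
    },
}
moderate_search = {
    "search_type": "mmr",
    "search_kwargs": {"k": 4, "fetch_k": 20, "lambda_mult": 0.85},
}
# With a Chroma vector store this would be applied roughly as:
# retriever = vectorstore.as_retriever(**diverse_search)
print(diverse_search["search_kwargs"]["lambda_mult"])  # 0.25
```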
The module records speech from a call center operator’s headset and processes it through an ASR server (VOSK) via WebSocket. This enables the agent to handle voice requests and convert them into structured text for further processing.
- Provide a voice interface for user interaction.
- Ensure accurate speech-to-text conversion.
- Records audio signals in real time.
- Sends audio blocks to the ASR server via WebSocket.
- Processes recognized text for use in queries.
- Uses `sounddevice` for audio recording.
- Asynchronous processing with `websockets`.
- Customizable settings via command-line arguments.
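Messages arriving from the VOSK server over the WebSocket are JSON: interim hypotheses carry a `partial` field and final utterance results a `text` field (per VOSK's server protocol). A small parser for that stream, with the helper name chosen here for illustration:

```python
import json

def extract_final_text(message: str):
    """Parse one JSON message from the VOSK ASR server.

    Returns the recognized utterance for final results, or None for
    interim ("partial") results and empty finals, so only complete
    utterances are forwarded to the agent.
    """
    result = json.loads(message)
    if "text" in result:       # final hypothesis for the utterance
        return result["text"] or None
    return None                # partial result: keep listening

print(extract_final_text('{"text": "hello doctor"}'))  # hello doctor
print(extract_final_text('{"partial": "hel"}'))        # None
```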
For detailed technical documentation and examples, refer to the Documentation.
The code provides a framework for an agent that uses a state graph to handle user queries, performing actions such as document retrieval, answer generation, and web search.
`AgentState`: Defines the data structure for storing the agent's current state. This is a `TypedDict` with fields for messages, the question, generation, web-search state, and documents.
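The `AgentState` described above can be written as a `TypedDict`; the exact field names and types here are illustrative:

```python
from typing import List, TypedDict

class AgentState(TypedDict):
    """State carried between graph nodes; field names mirror the
    description above but are illustrative, not the project's code."""
    messages: List[str]    # conversation so far
    question: str          # current user question
    generation: str        # latest draft answer
    web_search: bool       # whether to fall back to web search
    documents: List[str]   # retrieved document texts

state: AgentState = {
    "messages": [],
    "question": "What are the clinic's opening hours?",
    "generation": "",
    "web_search": False,
    "documents": [],
}
print(state["question"])
```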
Agent: The `__init__` constructor initializes the system, tools, and state graph.
- The `retrieve` method fetches documents from the indexed storage based on the query.
- The `generate` method produces an answer using the retrieved documents.
- The `grade_documents` method assesses the relevance of documents to the given query.
- The `web_search` method performs an Internet search and appends the results to the documents.
- The `route_question` method determines whether the query should be directed to a web search or the vector store.
- The `decide_to_generate` method decides whether to proceed with generating a response or to perform a web search.
- The `grade_generation_v_documents_and_question` method checks whether the generated answer is correct and matches the query.
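The methods above are wired together as nodes and conditional edges of a state graph. A dependency-free stand-in (not the project's actual graph library) shows the control flow of routing, retrieval, and generation:

```python
def run_graph(state: dict) -> dict:
    """Minimal stand-in for the agent's state graph: each node mutates
    the state and returns the next node's name until END. Node logic
    is illustrative only."""
    def route_question(s):
        return "web_search" if s["web_search"] else "retrieve"

    def retrieve(s):
        s["documents"] = ["local doc"]       # stand-in for the ChromaDB lookup
        return "generate"

    def web_search(s):
        s["documents"] = ["web result"]      # stand-in for the web-search tool
        return "generate"

    def generate(s):
        s["generation"] = f"answer from {s['documents'][0]}"
        return "END"

    nodes = {"route_question": route_question, "retrieve": retrieve,
             "web_search": web_search, "generate": generate}
    current = "route_question"
    while current != "END":
        current = nodes[current](state)
    return state

out = run_graph({"web_search": False, "documents": [], "generation": ""})
print(out["generation"])  # answer from local doc
```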
For any questions or contributions, feel free to open an issue or submit a pull request.