Llama 3.1, Vikhr Nemo 12B, Command-R, Ollama 0.4.6, Chroma 0.5.4, Tavily AI
- Real-time recording: Operator-patient dialogues are recorded on the fly.
- Audio-to-text conversion: Speech is converted to text using VOSK.
- Text analysis: The input text is analyzed with an LLM (Large Language Model).
- Keyword extraction: Key terms are identified for information retrieval.
- Database query: The vector database (ChromaDB) is queried for relevant information.
- Text filtering: Retrieved text is filtered using the `cointegrated/rubert-tiny2` model.
- Text generation: Responses are created using an LLM.
- Hallucination check:
- Ensures the response aligns with the database query results.
- Verifies correspondence with the extracted keywords.
- Operator assistance: The generated text is presented to the operator in the chatbot interface.
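The text-filtering step above compares retrieved fragments to the query in embedding space. A minimal sketch of that comparison, assuming the vectors come from `cointegrated/rubert-tiny2` (toy vectors stand in for real model output here, and the threshold is illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_fragments(query_vec, fragments, threshold=0.5):
    """Keep only fragments whose embedding is close enough to the query.

    `fragments` is a list of (text, embedding) pairs; in the real pipeline
    the embeddings would come from cointegrated/rubert-tiny2.
    """
    return [text for text, vec in fragments
            if cosine_similarity(query_vec, vec) >= threshold]

# Toy vectors standing in for rubert-tiny2 embeddings.
query = [1.0, 0.0, 1.0]
docs = [("dosage instructions", [0.9, 0.1, 0.8]),
        ("cafeteria menu", [0.0, 1.0, 0.0])]
print(filter_fragments(query, docs))  # ['dosage instructions']
```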
- Manages the vector database.
- Performs text embedding.
- Handles speech recognition.
- Controls agent logic and integrates with clinical systems.
- Processes requests using the LLM.
- Generates textual responses.
- Speech Recognition: VOSK
- Vector Database: ChromaDB
- Text Filtering Model: `cointegrated/rubert-tiny2`
- Language Models: Llama 3.1, `rscr/vikhr_nemo_12b`, Command-R (served via Ollama)
The AI agent supports a flexible approach to request processing using complex routing logic and retrieval of relevant information. Its key functionalities include:
- Identifies the data source for request processing:
- Vector Storage: Uses ChromaDB optimized for result diversity (MMR).
- Chat with Memory: Supports conversational mode with prior interaction history.
- Web Search: Acts as a fallback if no relevant data is found locally.
- Session Termination: Provides an option to end the agent's operation.
- Focus and Enhancement of Queries:
- Extracts keywords to optimize the search process.
- Refines queries for precise information retrieval.
- Multi-Step Data Retrieval: Ensures high accuracy by:
- Utilizing various search strategies in the vector database:
  - High response diversity (MMR, `lambda_mult=0.25`).
  - Moderate diversity (MMR, `lambda_mult=0.85`).
  - Mathematical computation of diversity based on the probability of finding relevant content in the collection.
  - Specific embedding models for fallback searches.
- Selecting appropriate data collections for specific tasks.
- Filters documents based on similarity to user query keywords (cosine distance evaluation).
- Evaluates document relevance:
- Matches document content to the request using scoring models.
- Automatically switches to alternate sources if no relevant data is found.
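The routing decision described above can be sketched as a plain function. The option names and input signals (`has_local_match`, the history list) are illustrative, not the project's actual API:

```python
def route_question(question: str, has_local_match: bool, history: list) -> str:
    """Illustrative routing: pick a data source for the request.

    Mirrors the options above: vector storage, chat with memory,
    web-search fallback, or session termination.
    """
    if question.strip().lower() in {"exit", "quit", "end session"}:
        return "terminate"
    if has_local_match:
        return "vectorstore"
    if history:                # prior turns exist -> continue the dialogue
        return "chat_with_memory"
    return "web_search"        # fallback when nothing relevant is found locally

print(route_question("What is the dosage?", True, []))  # vectorstore
print(route_question("exit", False, []))                 # terminate
```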
- Utilizes LLMs such as `Llama 3.1 70b fp16`, `rscr/vikhr_nemo_12b`, or `Command-R` for task-specific generation:
  - Generates text based on:
    - Local vector database (RAG).
    - Web search results if local data is insufficient.
  - Verifies responses to eliminate hallucinations and ensure alignment with user queries.
- Maintains dialogue context for more accurate responses.
- Stores interaction history for seamless user experience.
- Proceeds to the next processing step if relevant data is unavailable, up to session termination.
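The hallucination check mentioned above verifies that a draft answer stays grounded in the retrieved documents and the extracted keywords. In the agent this judgement is delegated to an LLM grader; a crude keyword-overlap stand-in illustrates the contract:

```python
def is_grounded(answer: str, documents: list, keywords: list) -> bool:
    """Crude grounding check, standing in for the LLM grader:
    the answer must reuse words from the retrieved documents and
    mention at least one extracted keyword."""
    answer_words = set(answer.lower().split())
    doc_words = set(w for d in documents for w in d.lower().split())
    reuses_docs = bool(answer_words & doc_words)
    has_keyword = any(k.lower() in answer_words for k in keywords)
    return reuses_docs and has_keyword

docs = ["recommended dose: 200 mg twice daily"]
print(is_grounded("the recommended dose is 200 mg", docs, ["dose"]))  # True
print(is_grounded("I don't know", docs, ["dose"]))                    # False
```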
- Supports multi-step routing through a state graph.
- Employs asynchronous methods for parallel task execution, such as:
- Data retrieval.
- Response generation.
- Document relevance assessment.
- Works with ChromaDB through built-in retrievers:
- Implements search strategies like MMR and cosine similarity for diverse results.
- Leverages fallback collections built using various embedding models (e.g., LaBSE, Distiluse).
- Uses evaluation models to:
- Analyze document relevance.
- Ensure the adequacy of generated responses.
- User interaction through a chat interface.
- Session termination command support.
- Automatic handling of multiple requests within a session.
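The asynchronous, parallel execution mentioned above can be sketched with `asyncio.gather`; the coroutines here are placeholders for the real ChromaDB retrieval and relevance-grading calls:

```python
import asyncio

async def retrieve(query):           # placeholder for the ChromaDB lookup
    await asyncio.sleep(0.01)
    return [f"doc for {query!r}"]

async def grade(doc):                # placeholder for the relevance grader
    await asyncio.sleep(0.01)
    return True

async def handle(query):
    docs = await retrieve(query)
    # Grade all retrieved documents in parallel instead of one by one.
    verdicts = await asyncio.gather(*(grade(d) for d in docs))
    return [d for d, ok in zip(docs, verdicts) if ok]

print(asyncio.run(handle("side effects")))  # ["doc for 'side effects'"]
```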
An asynchronous web server based on Quart will provide the interface for user interaction and agent integration into workflows. The interface supports session management for call center operators and file upload for data placement in vector collections based on user access rights.
- Users can send queries through a web page (CRM integration or standalone Windows app as per client agreement).
- AI agent responses are displayed in real time.
- Dialogue history is preserved for context-aware responses.
- Queries are processed asynchronously for optimal performance.
- Supports uploading TXT, PDF, and web links (additional formats per client agreement).
- Uploaded files are stored and processed for addition to ChromaDB.
- Allows prioritization of specific collections for user requests.
- Route `/`: Displays the chat interface using the `index.html` template.
- Route `/get`:
  - Accepts text queries via AJAX requests.
  - Saves queries and responses in session history.
  - Asynchronously calls the `get_agent_response` function to generate responses.
- Route `/upload`:
  - Accepts files via POST requests.
  - Supports `.pdf` and `.txt` formats.
  - Processes files asynchronously and adds their content to the vector database.
- Function `process_file`:
  - Processes uploaded files.
  - Adds content to ChromaDB for future retrieval.
- Uses `session` to store dialogue history, ensuring context-aware responses in ongoing interactions.
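Before a `process_file`-style helper can add an uploaded document to ChromaDB, the raw text must be split into fragments. A minimal sketch of that splitting step; the fragment size and overlap are assumptions, not the project's actual settings:

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list:
    """Split a TXT document into overlapping fragments for indexing.

    Overlap keeps sentences that straddle a boundary retrievable from
    both neighbouring fragments.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("a" * 250, chunk_size=100, overlap=20)
print(len(chunks), [len(c) for c in chunks])  # 4 [100, 100, 90, 10]
```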
The AI agent uses ChromaDB and embedding models to perform high-precision search, data addition, and document processing. Its retrieval logic is built on an adaptive approach to search and collection management.
- Supported models for text embeddings include:
  - `cointegrated/LaBSE-en-ru`
  - `sentence-transformers/distiluse-base-multilingual-cased-v1`
  - `ai-forever/sbert_large_nlu_ru`
  - `hkunlp/instructor-xl` (instruction-based embedding training)
- Creation, deletion, and listing of collections.
- Environment preparation for collection updates.
- Supported data types:
- PDF: Split into pages and indexed.
- TXT: Split into fragments for optimal indexing.
- URL: Text extracted, processed, and added to the database.
- Multiple search types:
  - `simil`: vector similarity search.
  - `simil_score`: similarity search with relevance scoring.
  - `vector`: direct vector-based search.
  - `mmr`: Maximal Marginal Relevance for diverse results.
- Metadata-based filtering.
- Adjustable parameters:
  - Number of returned documents (`k`).
  - Number of fetched documents (`fetch_k`).
  - Diversity coefficient (`lambda_mult`).
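The parameters above combine into retriever configurations such as the following. The `lambda_mult` values match the diversity settings quoted earlier; the dictionary shape follows the common LangChain/Chroma `as_retriever` convention, which is an assumption about this project's code:

```python
# Illustrative retriever configurations; the parameter names follow the
# LangChain/Chroma retriever convention assumed for this project.
diverse_search = {
    "search_type": "mmr",
    "search_kwargs": {
        "k": 4,               # documents returned to the agent
        "fetch_k": 20,        # candidates fetched before MMR re-ranking
        "lambda_mult": 0.25,  # low value -> high response diversity
    },
}
moderate_search = {
    "search_type": "mmr",
    "search_kwargs": {"k": 4, "fetch_k": 20, "lambda_mult": 0.85},
}
# With a Chroma vector store this would be applied roughly as:
# retriever = vectorstore.as_retriever(**diverse_search)
print(diverse_search["search_kwargs"]["lambda_mult"])  # 0.25
```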
The module records speech from a call center operator’s headset and processes it through an ASR server (VOSK) via WebSocket. This enables the agent to handle voice requests and convert them into structured text for further processing.
- Provide a voice interface for user interaction.
- Ensure accurate speech-to-text conversion.
- Records audio signals in real time.
- Sends audio blocks to the ASR server via WebSocket.
- Processes recognized text for use in queries.
- Uses `sounddevice` for audio recording.
- Asynchronous processing with `websockets`.
- Customizable settings via command-line arguments.
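Messages arriving from the VOSK server over the WebSocket are JSON: interim hypotheses carry a `partial` field and final utterance results a `text` field (per VOSK's server protocol). A small parser for that stream, with the helper name chosen here for illustration:

```python
import json

def extract_final_text(message: str):
    """Parse one JSON message from the VOSK ASR server.

    Returns the recognized utterance for final results, or None for
    interim ("partial") results and empty finals, so only complete
    utterances are forwarded to the agent.
    """
    result = json.loads(message)
    if "text" in result:       # final hypothesis for the utterance
        return result["text"] or None
    return None                # partial result: keep listening

print(extract_final_text('{"text": "hello doctor"}'))  # hello doctor
print(extract_final_text('{"partial": "hel"}'))        # None
```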
For detailed technical documentation and examples, refer to the Documentation.
The code provides a framework for an agent that uses a state graph to handle user queries, performing actions such as document retrieval, answer generation, and web search.
`AgentState`: Defines the data structure for storing the agent's current state. This is a `TypedDict` with fields for messages, the question, generation, web-search state, and documents.
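The `AgentState` described above can be written as a `TypedDict`; the exact field names and types here are illustrative:

```python
from typing import List, TypedDict

class AgentState(TypedDict):
    """State carried between graph nodes; field names mirror the
    description above but are illustrative, not the project's code."""
    messages: List[str]    # conversation so far
    question: str          # current user question
    generation: str        # latest draft answer
    web_search: bool       # whether to fall back to web search
    documents: List[str]   # retrieved document texts

state: AgentState = {
    "messages": [],
    "question": "What are the clinic's opening hours?",
    "generation": "",
    "web_search": False,
    "documents": [],
}
print(state["question"])
```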
Agent: The `__init__` constructor initializes the system, tools, and state graph.
- The `retrieve` method fetches documents from the indexed storage based on the query.
- The `generate` method produces an answer using the retrieved documents.
- The `grade_documents` method assesses the relevance of documents to the given query.
- The `web_search` method performs an Internet search and appends the results to the documents.
- The `route_question` method determines whether the query should be directed to a web search or the vector store.
- The `decide_to_generate` method decides whether to proceed with generating a response or to perform a web search.
- The `grade_generation_v_documents_and_question` method checks whether the generated answer is correct and matches the query.
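The methods above are wired together as nodes and conditional edges of a state graph. A dependency-free stand-in (not the project's actual graph library) shows the control flow of routing, retrieval, and generation:

```python
def run_graph(state: dict) -> dict:
    """Minimal stand-in for the agent's state graph: each node mutates
    the state and returns the next node's name until END. Node logic
    is illustrative only."""
    def route_question(s):
        return "web_search" if s["web_search"] else "retrieve"

    def retrieve(s):
        s["documents"] = ["local doc"]       # stand-in for the ChromaDB lookup
        return "generate"

    def web_search(s):
        s["documents"] = ["web result"]      # stand-in for the web-search tool
        return "generate"

    def generate(s):
        s["generation"] = f"answer from {s['documents'][0]}"
        return "END"

    nodes = {"route_question": route_question, "retrieve": retrieve,
             "web_search": web_search, "generate": generate}
    current = "route_question"
    while current != "END":
        current = nodes[current](state)
    return state

out = run_graph({"web_search": False, "documents": [], "generation": ""})
print(out["generation"])  # answer from local doc
```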
For any questions or contributions, feel free to open an issue or submit a pull request.