In this lab, you will learn how to search the Vector Database (Elasticsearch) for content related to a prompt and how to construct an answer by passing this context to the qwen2.5 LLM via Ollama.
- ai-rag-framework exposes a REST API for chatting, retrieves context information from the Vector Database (Elasticsearch), and uses Ollama to access the qwen2.5 LLM, which generates answers to user prompts.
- Ollama and Elasticsearch settings can be configured in the application.yml file of the ai-rag-framework application.
You must complete Lab 1: Ingestion Pipeline before proceeding with this lab, so that the Vector Database is already populated.
- Verify Ollama is running, or start it:
ollama -v
- Start the AI RAG framework by running the following commands:
cd ai-rag-framework/deployment
docker compose up --build --force-recreate
- Use httpie, curl, or an equivalent tool for making API calls.
Obtaining an answer for a prompt can be achieved by invoking the AI RAG framework http://localhost:9999/chat endpoint:
curl --request POST \
--url http://localhost:9999/chat \
--header 'Content-Type: text/plain; charset=utf-8' \
--data 'Explain how Caesar'\''s cipher works'
The response includes the answer and metadata for the documents used as context:
{
  "answer": "Caesar's cipher is a substitution cipher that encrypts text by shifting each letter by a fixed number of positions in the alphabet.",
  "documentMetadata": [
    {
      "fileName": "cryptography-2.pdf",
      "documentId": "2debeaaf-9424-455a-abea-af9424b55a89",
      "source": "cryptography-2.pdf",
      "folderId": "5730a944-248d-43cf-b0a9-44248d23cfec",
      "distance": 0.7058792
    },
    {
      "fileName": "cryptography-0.pdf",
      "documentId": "536fe0b0-cb3f-43f4-afe0-b0cb3f43f42c",
      "source": "cryptography-0.pdf",
      "folderId": "5730a944-248d-43cf-b0a9-44248d23cfec",
      "distance": 0.6827631
    },
    {
      "fileName": "cryptography-3.pdf",
      "documentId": "b4470b47-f550-41ef-870b-47f55081ef46",
      "source": "cryptography-3.pdf",
      "folderId": "5730a944-248d-43cf-b0a9-44248d23cfec",
      "distance": 0.6497326
    }
  ]
}
Every answer includes the following structured fields:
- answer: The answer to the prompt.
- documentMetadata: List of documents and metadata used to provide context to the LLM.
  - fileName: Name of the original file (file.pdf) in the Alfresco Repository.
  - documentId: A unique identifier for this document in the Alfresco Repository.
  - source: Name of the original file (file.pdf) in the Alfresco Repository.
  - folderId: ID of the sync folder in the Alfresco Repository where the document resides.
  - distance: Relevancy of the document, measured as the distance to the original prompt.
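The same call can also be made from code instead of curl. The following is a minimal sketch using the JDK's built-in HTTP client; the ChatApiExample class name is illustrative, while the URL, header, and prompt mirror the curl example above.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch: POST a prompt to the /chat endpoint and print the raw JSON reply.
// Assumes the AI RAG framework is running locally on port 9999 as described above.
public class ChatApiExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9999/chat"))
                .header("Content-Type", "text/plain; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString("Explain how Caesar's cipher works"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON document with answer and documentMetadata
    }
}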
Spring AI for chatting
The ai-rag-framework service provides the chat feature using the following pieces of code:
The configuration for the Vector Database (Elasticsearch), Ollama, the embedding model, and the chat LLM is defined in application.yml:
elasticsearch:
  uris: http://localhost:9200
ai:
  ollama:
    base-url: http://localhost:11434
    init:
      pull-model-strategy: when_missing
    chat:
      options:
        model: qwen2.5
        temperature: 0.0
    embedding:
      options:
        model: nomic-embed-text
  vectorstore:
    elasticsearch:
      initialize-schema: true
      index-name: alfresco-ai-document-index
      dimensions: 768
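With these properties in place, Spring Boot auto-configures an Ollama-backed chat model and an Elasticsearch-backed VectorStore, so application code only needs to inject the resulting beans. The sketch below shows one way such a service could be wired; the ChatService class name is illustrative and not part of the framework.
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

// Illustrative sketch: the auto-configured beans produced from application.yml are
// injected here. ChatClient.Builder is backed by the Ollama qwen2.5 chat model, and
// VectorStore by the alfresco-ai-document-index Elasticsearch index (768 dimensions).
@Service
public class ChatService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public ChatService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.chatClient = chatClientBuilder.build();
        this.vectorStore = vectorStore;
    }
}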
The prompt is processed using the ChatClient class. The QuestionAnswerAdvisor class provides additional context by searching for the DEFAULT_TOP_K most relevant results in the vectorStore using the nomic-embed-text embedding. This context is used together with the qwen2.5 LLM to build the chat response.
// Configuring advisors to enhance the response quality
ChatResponse response = chatClient.prompt()
        .user(query)
        .advisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults().withTopK(DEFAULT_TOP_K)))
        .call()
        .chatResponse();
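Continuing from the snippet above, the answer text and the retrieved documents can be read back from the ChatResponse. This is a minimal sketch, not the framework's actual mapping code: accessor names differ slightly between Spring AI versions, and reading the documents from the response metadata assumes the QuestionAnswerAdvisor behaviour of recent releases.
// Sketch only; requires java.util.List and org.springframework.ai.document.Document imports.
// The generated answer text (getContent() in Spring AI milestone releases, getText() in later ones).
String answer = response.getResult().getOutput().getContent();

// Documents retrieved from the vector store are exposed by the QuestionAnswerAdvisor
// in the response metadata; their metadata (fileName, documentId, source, folderId,
// distance) is what the API reply shown earlier surfaces as documentMetadata.
List<Document> contextDocuments =
        response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS);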