End-to-end deployment of a scalable RAG chatbot that uses LangChain for retrieval-based Q&A. The project applies robust CI/CD practices, integrates MLFlow, and places particular emphasis on cost analysis.

RAG-using-Azure-Databricks-CI-CD

Introduction

The project leverages the Retrieval-Augmented Generation (RAG) framework, which we incorporate into our chatbot on Azure Databricks. This approach ensures the chatbot delivers relevant, contextually precise responses, while continuous integration and deployment keep development and updates streamlined. The model runs within a serverless architecture and is backed by Delta Tables for secure data storage, which improves the chatbot's efficiency and scalability while maintaining stringent data security and compliance. MLFlow handles lifecycle management so that each model iteration is meticulously tracked and documented, and we use MLFlow's LLM-as-a-judge capability to evaluate the RAG chatbot.
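
To make this concrete, the sketch below shows a minimal retrieval chain on Databricks: a served chat model, a retriever over a Vector Search index built on a Delta table, and MLFlow logging of the resulting chain. It is an illustrative sketch only, assuming a Delta Sync index with Databricks-managed embeddings; the endpoint names, index name, and `rag_chain` artifact path are placeholders rather than the project's actual resources.

```python
# Illustrative sketch of a retrieval-augmented QA chain on Databricks.
# All endpoint/index names below are placeholders, not this repo's resources.
import mlflow
from databricks.vector_search.client import VectorSearchClient
from langchain_community.chat_models import ChatDatabricks
from langchain_community.vectorstores import DatabricksVectorSearch
from langchain.chains import RetrievalQA


def get_retriever(persist_dir: str = None):
    """Build a retriever over a Delta Sync Vector Search index (managed embeddings)."""
    index = VectorSearchClient().get_index(
        endpoint_name="rag_vs_endpoint",             # placeholder endpoint
        index_name="catalog.schema.rag_docs_index",  # placeholder index
    )
    return DatabricksVectorSearch(index).as_retriever(search_kwargs={"k": 4})


# Chat model served behind a Databricks model-serving endpoint (placeholder name).
llm = ChatDatabricks(endpoint="databricks-dbrx-instruct", max_tokens=512)

# Plain retrieval-augmented QA chain: fetch top-k chunks, then answer.
chain = RetrievalQA.from_chain_type(llm=llm, retriever=get_retriever())

# Track the chain with MLFlow so each iteration is versioned; loader_fn tells
# MLFlow how to rebuild the (non-serializable) retriever at load time.
with mlflow.start_run():
    mlflow.langchain.log_model(chain, artifact_path="rag_chain", loader_fn=get_retriever)
```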

Project Architecture

(Architecture diagram: llmops_1)

Project Overview

This repository houses the RAG-using-Azure-Databricks-CI-CD project, which demonstrates a comprehensive MLOps pipeline encompassing development, production, and monitoring within an Azure Databricks environment.

Getting Started - Setup Guide

To begin working with the RAG-using-Azure-Databricks-CI-CD project, please follow the initial setup instructions detailed in the guide below:

This guide covers creating an Azure account, setting up resource groups, storage accounts, and Databricks workspaces, as well as configuring GitHub secrets and local development tools like the Databricks CLI.

After completing the initial setup, you can proceed to the detailed aspects of the project using the Table of Contents.

Table of Contents

Databricks Folder Structure

The project’s folder structure in Databricks is designed to separate files and artifacts across the test, staging, and prod environments, facilitating organized development and deployment.

Understand our Databricks folder structure

Databricks Workflow

We maintain a detailed workflow for model training, evaluation, and deployment within Databricks, ensuring systematic testing and deployment of our models.

Explore the Databricks workflow
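
As a hedged illustration of the evaluation step, the snippet below scores a tiny in-memory question-answer set with MLFlow's LLM-as-a-judge metrics; the judge endpoint URI, the sample rows, and the choice of `answer_correctness` are stand-ins for whatever the actual workflow configures.

```python
# Illustrative LLM-as-a-judge evaluation with MLFlow on a static dataset.
# The judge endpoint and the in-memory rows are placeholders for this sketch.
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_correctness

eval_data = pd.DataFrame({
    "inputs": ["Where does the chatbot store its source documents?"],
    "ground_truth": ["Documents are stored in Delta Tables on Databricks."],
    "predictions": ["The source documents live in Delta Tables."],
})

# LLM judge served from a Databricks endpoint (placeholder endpoint name).
judge = answer_correctness(model="endpoints:/databricks-dbrx-instruct")

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_data,
        targets="ground_truth",
        predictions="predictions",
        model_type="question-answering",
        extra_metrics=[judge],
    )
    print(results.metrics)
```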

Terraform

Terraform is used for infrastructure provisioning and state management within our Databricks environment.

Review our Terraform practices

CI/CD Workflow

Our project utilizes a CI/CD pipeline that orchestrates the workflow from development to staging and production.

Read more about the CI/CD workflow
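
One way a pipeline stage can hand off to Databricks is by triggering a deployment job through the Databricks Python SDK, as in the sketch below; the job ID and the environment variables holding CI secrets are placeholders, not the project's real configuration.

```python
# Hedged sketch: a CI/CD step triggers a Databricks job and waits for it.
# Job ID and credential environment variables are placeholders.
import os
from databricks.sdk import WorkspaceClient

# Credentials typically come from CI secrets (e.g. GitHub Actions secrets).
w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],
    token=os.environ["DATABRICKS_TOKEN"],
)

STAGING_JOB_ID = 123456789  # placeholder ID for the staging deployment job

# Kick off the job and block until the run reaches a terminal state.
run = w.jobs.run_now(job_id=STAGING_JOB_ID).result()
print(f"Run {run.run_id} finished with state {run.state.result_state}")
```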

Model Version Rollback

Our process for rolling back to previous model versions in production is documented to ensure reliability and ease of transitions.

Learn about model version rollback
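
Conceptually, a rollback re-points the production reference at an earlier registered version. The sketch below illustrates this idea with MLFlow Model Registry aliases; the model name, alias, and version number are placeholders.

```python
# Hedged sketch of a rollback: point the production alias back to a known-good
# version in the MLFlow Model Registry. Names and versions are placeholders.
from mlflow import MlflowClient

client = MlflowClient()

MODEL_NAME = "rag_chatbot"  # placeholder registered model name
GOOD_VERSION = "3"          # previously validated version to roll back to

# Re-assign the alias that production serving resolves, e.g. "champion".
client.set_registered_model_alias(
    name=MODEL_NAME,
    alias="champion",
    version=GOOD_VERSION,
)

# Confirm which version the alias now points to.
mv = client.get_model_version_by_alias(MODEL_NAME, "champion")
print(f"'champion' now serves version {mv.version}")
```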

MLFlow

MLFlow is integral to our pipeline, providing tools for model versioning, management, and serving in both test and production environments.

Read about our MLFlow setup
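
As a minimal illustration of versioning with the Model Registry, the snippet below registers a logged chain and loads a specific version back; the run ID and registered model name are placeholders, and it assumes the workspace model registry rather than Unity Catalog.

```python
# Hedged sketch: register a logged model and load a specific version back.
# The run ID and registered model name are placeholders.
import mlflow

RUN_ID = "abc123"           # placeholder MLflow run ID
MODEL_NAME = "rag_chatbot"  # placeholder registered model name

# Promote the artifact logged under the run into the Model Registry.
mv = mlflow.register_model(f"runs:/{RUN_ID}/rag_chain", MODEL_NAME)
print(f"Registered {MODEL_NAME} as version {mv.version}")

# Load that exact version back (e.g. for testing or a serving endpoint).
model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/{mv.version}")
print(model.metadata.run_id)
```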

Cost Analysis

We conduct a thorough cost analysis to optimize resource allocation and manage expenses effectively.

Delve into our cost analysis approach
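
As one hedged example of pulling usage data programmatically, the query below summarizes DBU consumption from Databricks' `system.billing.usage` system table inside a notebook; it assumes system tables are enabled on the workspace, and the 30-day window and SKU grouping are arbitrary choices for illustration.

```python
# Hedged sketch: summarize DBU usage by SKU over the last 30 days from the
# Databricks system billing table (requires system tables to be enabled).
# Intended to run in a Databricks notebook, where `spark` and `display`
# are predefined.
daily_usage = spark.sql(
    """
    SELECT
      usage_date,
      sku_name,
      SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY usage_date, sku_name
    ORDER BY usage_date, dbus DESC
    """
)
display(daily_usage)
```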
