Forked from the original code at https://github.com/ganesh3/rag-youtube-assistant
In the era of abundant video content on YouTube, users often struggle to efficiently extract specific information or insights from lengthy videos without watching them in their entirety. This challenge is particularly acute when dealing with educational content, tutorials, or informative videos where key points may be scattered throughout the video's duration.
The YouTube Assistant project addresses this problem by providing a Retrieval-Augmented Generation (RAG) application that allows users to interact with and query video transcripts directly. This solution enables users to quickly access relevant information from YouTube videos without the need to watch them completely, saving time and improving the efficiency of information retrieval from video content.
The YouTube Assistant utilizes data pulled in real-time using the YouTube Data API v3. This data is then processed and stored in two databases:
- SQLite database: For structured data storage
- Elasticsearch vector database: For efficient similarity searches on embedded text
The main columns in our data structure are:
{
"content": {"type": "text"},
"video_id": {"type": "keyword"},
"segment_id": {"type": "keyword"},
"start_time": {"type": "float"},
"duration": {"type": "float"},
"title": {"type": "text"},
"author": {"type": "keyword"},
"upload_date": {"type": "date"},
"view_count": {"type": "integer"},
"like_count": {"type": "integer"},
"comment_count": {"type": "integer"},
"video_duration": {"type": "text"}
}
This schema allows for comprehensive storage of video metadata alongside the transcript content, enabling rich querying and analysis capabilities.
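As a rough illustration, the fields above could be mirrored on the SQLite side along the following lines. This is a minimal sketch: the table name `segments`, the Elasticsearch-to-SQLite type translation, and the sample row are all assumptions made for the example, not the project's actual schema or data.

```python
import sqlite3

# Field names and Elasticsearch types copied from the mapping above.
MAPPING = {
    "content": "text",
    "video_id": "keyword",
    "segment_id": "keyword",
    "start_time": "float",
    "duration": "float",
    "title": "text",
    "author": "keyword",
    "upload_date": "date",
    "view_count": "integer",
    "like_count": "integer",
    "comment_count": "integer",
    "video_duration": "text",
}

# Assumed translation from Elasticsearch field types to SQLite column types.
ES_TO_SQLITE = {
    "text": "TEXT",
    "keyword": "TEXT",
    "float": "REAL",
    "integer": "INTEGER",
    "date": "TEXT",  # stored as an ISO-8601 string
}


def create_table_sql(table="segments"):
    """Build a CREATE TABLE statement with one column per mapped field."""
    columns = ", ".join(
        f"{name} {ES_TO_SQLITE[es_type]}" for name, es_type in MAPPING.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {table} ({columns})"


def insert_segment(conn, segment, table="segments"):
    """Insert one transcript segment; fields missing from the dict become NULL."""
    names = list(MAPPING)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})",
        [segment.get(name) for name in names],
    )


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql())

# Made-up sample row, for illustration only.
insert_segment(conn, {
    "content": "Welcome to the tutorial.",
    "video_id": "abc123",
    "segment_id": "abc123_0",
    "start_time": 12.5,
    "duration": 4.0,
    "title": "Sample video",
})
rows = conn.execute("SELECT video_id, start_time FROM segments").fetchall()
```

Driving both stores from one field list keeps the SQLite columns and the Elasticsearch mapping from drifting apart.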
Step 1: Model Download and Testing
Download the Phi-3-mini-128k-instruct PyTorch (PT) model from Hugging Face and test its performance.
git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
python test_pt_model.py
Step 2: Export to ONNX
Export the PyTorch model to ONNX format and validate the conversion.
python export_to_onnx.py
Step 3: Convert ONNX to OpenVINO or PyTorch to OpenVINO
Convert the PyTorch or ONNX model to OpenVINO format, then test it to check that it runs smoothly on CPU, preserves accuracy, and improves inference speed.
For PyTorch to OpenVINO:
optimum-cli export openvino --model "./Phi-3-mini-128k-instruct" \
--task text-generation-with-past \
--weight-format int4 \
--group-size 128 \
--ratio 0.6 \
--sym \
--trust-remote-code /Phi-3-mini-128k-instruct-int4-ov
For ONNX to OpenVINO:
optimum-cli export openvino --model "Phi-3-mini-128k-instruct_onnx" \
--task text-generation-with-past \
--weight-format int4 \
--group-size 128 \
--ratio 0.6 \
--sym \
--trust-remote-code /Phi-3-mini-128k-instruct-int4-ov
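If the export needs to be scripted rather than typed by hand, the same command can be assembled in Python. The sketch below only builds and prints the argument list, using the flags and paths shown above; the helper name `build_export_command` is hypothetical and not part of the repository.

```python
import shlex


def build_export_command(model_path, output_dir):
    """Return the optimum-cli argument list for the int4 OpenVINO export."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_path,
        "--task", "text-generation-with-past",
        "--weight-format", "int4",
        "--group-size", "128",
        "--ratio", "0.6",
        "--sym",
        "--trust-remote-code",
        output_dir,  # positional output directory, as in the commands above
    ]


cmd = build_export_command(
    "./Phi-3-mini-128k-instruct",
    "/Phi-3-mini-128k-instruct-int4-ov",
)

# Print a shell-safe, single-line version for inspection.
print(shlex.join(cmd))

# To actually run the export (requires optimum-cli to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids the line-continuation and quoting pitfalls of a multi-line shell command.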
The YouTube Assistant offers the following key features:
- Real-time Data Extraction: Utilizes the YouTube Data API v3 to fetch video data and transcripts on-demand.
- Efficient Data Storage: Stores structured data in SQLite and uses Elasticsearch for vector embeddings, allowing for fast retrieval and similarity searches.
- Interactive Querying: Provides a chat interface where users can ask questions about the video transcripts that have been downloaded or extracted in real-time.
- Contextual Understanding: Leverages RAG technology to understand the context of user queries and provide relevant information from the video transcripts.
- Metadata Analysis: Allows users to query not just the content of the videos but also metadata such as view counts, likes, and upload dates.
- Time-stamped Responses: Can provide information about specific segments of videos, including start times and durations.
By combining these features, the YouTube Assistant empowers users to efficiently extract insights and information from YouTube videos without the need to watch them in full, significantly enhancing the way people interact with and learn from video content.
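To make the time-stamped responses concrete, here is a small sketch of how a segment's `video_id` and `start_time` fields could be turned into a readable timestamp and a link that opens the video at that point. The helper names are hypothetical and the video ID is a placeholder; the application may present timestamps differently.

```python
def format_timestamp(seconds):
    """Convert a start time in seconds to HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segment_link(video_id, start_time):
    """Build a YouTube watch URL that starts playback at the segment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_time)}s"


# Placeholder video ID, for illustration only.
print(format_timestamp(3725.4))
print(segment_link("abc123", 90.7))
```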
The YouTube Assistant project is organized as follows:
youtube-rag-app/
├── app/
│   ├── home.py
│   ├── pages/
│   │   ├── chat_interface.py
│   │   ├── data_ingestion.py
│   │   ├── evauation.py
│   │   └── ground_truth.py
│   ├── transcript_extractor.py
│   ├── data_processor.py
│   ├── elasticsearch_handler.py
│   ├── database.py
│   ├── rag.py
│   ├── query_rewriter.py
│   ├── evaluation.py
│   └── utils.py
├── data/
│   └── sqlite.db
├── config/
│   └── config.yaml
├── Phi-3-mini-128k-instruct
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── test_pt_model.py
├── export_to_onnx.py
├── test_onnx_model.py
├── test_ov_model.py
└── .env
- app/: Contains the main application code
  - main.py: Entry point of the application
  - ui.py: Handles the user interface
  - transcript_extractor.py: Manages YouTube transcript extraction
  - data_processor.py: Processes and prepares data for storage and analysis
  - elasticsearch_handler.py: Manages interactions with Elasticsearch
  - database.py: Handles SQLite database operations
  - rag.py: Implements the Retrieval-Augmented Generation logic
  - query_rewriter.py: Refines and optimizes user queries
  - evaluation.py: Contains evaluation metrics and functions
- data/: Stores the SQLite database
- config/: Contains configuration files
- requirements.txt: Lists all Python dependencies
- Dockerfile: Defines the Docker image for the application
- docker-compose.yml: Orchestrates the application and its services
- test_pt_model.py: Runs inference with the original PyTorch model
- export_to_onnx.py: Exports the PyTorch model to ONNX format
- test_onnx_model.py: Runs inference with the ONNX model
- test_ov_model.py: Runs inference with the OpenVINO model
- .env: Holds the YOUTUBE_API_KEY credentials
- Create a Google Cloud Project: Go to the Google Cloud Console. Click on Select a project or Create Project. Name your project and click Create.
- Enable the YouTube Data API: In the Cloud Console, navigate to APIs & Services > Library. Search for "YouTube Data API v3" and select it. Click Enable.
- Create Credentials: Go to APIs & Services > Credentials. Click on + CREATE CREDENTIALS and select API key. Your API key will be generated. Make sure to copy it, as you'll need it for your API requests.
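Once the key exists, the application has to read it from somewhere; this project keeps it in a .env file under the name YOUTUBE_API_KEY. The sketch below shows one way such a file could be parsed using only the standard library. The `load_env_file` helper is a stand-in written for this example, and the key value is a placeholder; the project itself may rely on a package such as python-dotenv or on Docker's environment handling instead.

```python
import os
import tempfile


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a temporary file and a placeholder key.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# placeholder, not a real key\n")
        handle.write("YOUTUBE_API_KEY=your-api-key-here\n")
    settings = load_env_file(env_path)

api_key = settings.get("YOUTUBE_API_KEY")
if api_key is None:
    raise RuntimeError("YOUTUBE_API_KEY is missing from the .env file")
```

Keep the real .env file out of version control so the key is not published with the code.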
conda create -n youtube-rag python=3.11
conda activate youtube-rag
git clone git@github.com:payal211/rag-youtube-assistant-OpenVINO-Optimization.git
cd rag-youtube-assistant-OpenVINO-Optimization
Create a .env file that sets only the YOUTUBE_API_KEY. Please refer to .env_template.
# install dependencies
pip install -r requirements.txt
# Clone the original model
git clone https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
docker-compose build app
docker-compose up -d
You need Docker Desktop installed on your laptop or workstation, along with WSL2 if you are on a Windows machine.
Open a browser and go to localhost:8501
GPL v3
I use Streamlit to ingest the YouTube transcripts, query the transcripts using an LLM and RAG, generate ground truth, and evaluate the ground truth.
I ingest YouTube transcripts using the YouTube Data API v3 and the YouTube Transcript package. The code is in transcript_extractor.py, and it is run from the Streamlit app via main.py.
"hit_rate":1, "mrr":1
I used the LLM-as-a-Judge metric to evaluate the quality of the RAG flow. The evaluation ran on my local machine on CPU, so the total number of records evaluated is low (12).
- RELEVANT - 12 (100%)
- PARTLY_RELEVANT - 0 (0%)
- NON RELEVANT - 0 (0%)
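For readers unfamiliar with the hit_rate and mrr figures reported above: both are standard retrieval metrics. Hit rate is the fraction of queries for which a relevant document appears anywhere in the results, and MRR (mean reciprocal rank) averages 1/rank of the first relevant result. The sketch below shows how they are commonly computed. The input format (one list of booleans per query, in ranked order) is an assumption, and the project's evaluation code may differ.

```python
def hit_rate(relevance):
    """Fraction of queries with at least one relevant result."""
    hits = sum(1 for row in relevance if any(row))
    return hits / len(relevance)


def mrr(relevance):
    """Mean of 1/rank of the first relevant result (0 if none) per query."""
    total = 0.0
    for row in relevance:
        for rank, is_relevant in enumerate(row, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(relevance)


# Every query has its relevant document ranked first: both metrics are 1.0.
perfect = [[True, False], [True, False, False]]

# Made-up mixed case: relevant at rank 2, at rank 1, and not retrieved.
mixed = [[False, True], [True, False], [False, False]]

print(hit_rate(perfect), mrr(perfect))
print(hit_rate(mixed), mrr(mixed))
```

A score of 1 on both metrics means the expected document was the top result for every query in the evaluation set.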
I used Grafana to monitor the metrics, user feedback, evaluation results, and search performance.
Please refer to screenshots.md