
abuelnasr0/Semantic-Search-project

 
 


Semantic Search Engine to search in videos based on (SBERT)


Goal

The goal of this project is to simplify searching through videos by using semantic search, specifically the SBERT model.
We want to make it easier for users to find the specific information they need within a video, without having to watch the entire video or listen to the whole podcast.
By using semantic search technology, our project aims to improve the efficiency of searching through videos, particularly for educational purposes.
This will save users time and make it easier for them to find the information they need, ultimately improving their overall experience.
We believe that our project will be particularly beneficial for students and educators who need to find specific information within educational videos. By simplifying the search process, we aim to improve the accessibility and quality of education for students and educators alike.

Overall, the aim of our project is to make searching for relevant video content more efficient and effective, ultimately improving the overall learning experience for users.

Features

  • Powerful search functionality:
    Our project provides a powerful search functionality that allows users to find and access relevant information within videos.
    By utilizing a variety of techniques and methods, we have been able to significantly improve the accuracy and relevance of search results.
  • Flexible search options:
    Our search functionality offers users a range of flexible search options, including keyword-based search, phrase-based search, and advanced search filters.
    This makes it easier for users to find the information they need within a video.
  • Transcript visualization:
    Our project includes a user-friendly transcript visualization feature that allows users to quickly navigate through the video transcript and find the information they need.
    The transcript view includes timestamps, which help users quickly identify the relevant sections of the transcript.
  • Video previews:
    Our project allows users to preview video content before watching it.
    This is particularly useful for users who are looking for specific information within a video and want to quickly identify whether the video is relevant to their needs.
  • Responsive design:
    Our project is designed to be responsive and accessible across a range of devices, including desktops, laptops, tablets, and smartphones.
    This makes it easy for users to access the information they need, wherever they are.
  • Secure user authentication:
    Our project includes a secure user authentication system that ensures user data is protected and secure.
    This is particularly important for users who are sharing sensitive or confidential information within the video platform.
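As a rough illustration of how semantic search over a timestamped transcript can work, here is a minimal sketch. It assumes each transcript segment already has an embedding produced by an SBERT model and that the query is encoded with the same model. The `segments` structure, the `search` helper, and the toy 3-dimensional vectors are hypothetical stand-ins and are not taken from the project code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, segments, top_k=3):
    """Return the top_k (score, start_seconds, text) tuples, best match first."""
    scored = [
        (cosine(query_vec, seg["embedding"]), seg["start"], seg["text"])
        for seg in segments
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy vectors stand in for real SBERT embeddings (assumption for illustration).
segments = [
    {"start": 0.0, "text": "Welcome to the course", "embedding": [0.9, 0.1, 0.0]},
    {"start": 42.5, "text": "Gradient descent updates the weights", "embedding": [0.1, 0.9, 0.2]},
    {"start": 97.0, "text": "Thanks for watching", "embedding": [0.8, 0.0, 0.3]},
]
query_vec = [0.0, 1.0, 0.1]  # would come from encoding the user's query

results = search(query_vec, segments, top_k=2)
for score, start, text in results:
    print(f"{start:7.1f}s  {score:.3f}  {text}")
```

The start time returned with each hit is what would let a player jump straight to the matching part of the video.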

Results

Datasets

We created two datasets for information retrieval systems in the domain of Artificial Intelligence. Both are derived from the ArXiv dataset, which contains research papers from many fields; we filtered it to the computer science (Artificial Intelligence) domain.
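The filtering step could look roughly like the sketch below. It is not the project's actual pipeline: it assumes the metadata is stored as JSON lines with `categories`, `title`, and `abstract` fields, that the AI category is tagged `cs.AI`, and that a title-based dataset pairs each title (as the query) with its abstract (as the document). The `build_title_pairs` helper is hypothetical.

```python
import json


def build_title_pairs(lines, category="cs.AI"):
    """Keep records tagged with `category` and emit query/document pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed format: categories is a space-separated string of tags.
        if category in record.get("categories", "").split():
            # Collapse line breaks and repeated spaces inside the text fields.
            title = " ".join(record["title"].split())
            abstract = " ".join(record["abstract"].split())
            pairs.append({"query": title, "document": abstract})
    return pairs


# Two made-up records for illustration only.
sample = [
    json.dumps({"title": "A Study of\n  Planning", "abstract": "We study planning.",
                "categories": "cs.AI cs.LG"}),
    json.dumps({"title": "Galaxy Survey", "abstract": "We survey galaxies.",
                "categories": "astro-ph.GA"}),
]
pairs = build_title_pairs(sample)
print(pairs)
```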

The datasets are uploaded to Kaggle and Hugging Face.

Models

We fine-tuned the base model on each of the two datasets, producing two models specialized for the AI domain. The model trained on the title-based dataset achieves the best results, outperforming both the base model and the other fine-tuned model.

The model is uploaded to Hugging Face.

Training Results

| Model Name   | v2-Titles-wiht_150000_samples | Base   | v2-Query_150000_105000_105000 | Base   |
|--------------|-------------------------------|--------|-------------------------------|--------|
| Corpus Size  | 150k                          | 150k   | 150k                          | 150k   |
| Queries Size | 22.5k                         | 22.5k  | 22.5k                         | 22.5k  |
| Acc@5        | 83.45%                        | 75.05% | 63.49%                        | 60.09% |
| Acc@10       | 87.78%                        | 79.98% | 71.58%                        | 67.41% |
| Acc@100      | 96.06%                        | 91.04% | 90.79%                        | 85.61% |
| Precision@1  | 65.53%                        | 58.27% | 41.06%                        | 39.79% |
| Precision@3  | 26.43%                        | 23.67% | 18.94%                        | 18.09% |
| Precision@5  | 16.69%                        | 15.01% | 12.70%                        | 12.02% |
| Precision@10 | 8.78%                         | 8.00%  | 7.16%                         | 6.74%  |
| Recall@3     | 79.30%                        | 71.01% | 56.82%                        | 54.26% |
| Recall@5     | 83.45%                        | 75.05% | 63.49%                        | 60.09% |
| Recall@10    | 87.78%                        | 79.98% | 71.58%                        | 67.41% |
| Recall@100   | 96.06%                        | 91.04% | 90.79%                        | 85.61% |
| MRR@10       | 0.7327                        | 0.6557 | 0.5062                        | 0.4854 |
| MRR@100      | 0.7364                        | 0.6604 | 0.5143                        | 0.4929 |
| NDCG@10      | 0.768                         | 0.6905 | 0.5564                        | 0.5307 |
| NDCG@100     | 0.7858                        | 0.7139 | 0.597                         | 0.5688 |
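The reported numbers appear consistent with a setup where each query has exactly one relevant document: Acc@k and Recall@k are identical, and Precision@k is roughly Recall@k divided by k (for example, 83.45% / 5 = 16.69%). Under that assumption, which the source does not state explicitly, the metrics can be sketched as follows. The `evaluate` helper and the sample ranks are hypothetical and are not the evaluation code used for the table.

```python
import math


def evaluate(ranks, k):
    """Compute retrieval metrics at cutoff k.

    ranks: for each query, the 1-based rank of its single relevant document,
           or None if the document was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    recall = sum(hits) / n  # equals accuracy@k with one relevant doc per query
    precision = recall / k  # at most one of the k results can be relevant
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    # Ideal DCG is 1 with a single relevant document, so NDCG = 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n
    return {
        "accuracy": recall,
        "recall": recall,
        "precision": precision,
        "mrr": mrr,
        "ndcg": ndcg,
    }


# Four made-up queries: relevant document found at ranks 1, 2, never, and 7.
metrics = evaluate([1, 2, None, 7], k=5)
print(metrics)
```

MRR and NDCG reward placing the relevant document near the top of the list, which is why they separate the models more sharply than Recall@100 does.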

Thanks
