szeyu/Vector-Search-From-Scratch

Vector Search from Scratch

This repository implements a vector search solution based on image and text embeddings. Users can search for similar products using an image or a textual description.

Project Structure

.
├── assets/                 # Folder containing sample product images for demonstration
├── .gitignore              # Git ignore file
├── LICENSE                 # License information
├── LEARN.md                # Learning documentation
├── README.md               # Project documentation
├── add_product.py          # Streamlit app for adding new products to the database
├── dbms_product.py         # Streamlit app for managing products in the database (DBMS logic)
├── embedding.py            # Functions to generate embeddings for images and text
├── product_recognision.py  # Functions for searching and recognizing products
├── products.db             # SQLite database for storing product information and embeddings
├── requirements.txt        # Python dependencies required for this project
└── vector_search.py        # Functions for vector-based search and similarity calculation

Features

  • Add Products: Add new products to the database, including name, description, price, and an image.
  • Search by Image: Upload an image to find visually similar products using CLIP-based image embeddings.
  • Search by Text: Enter text to search for semantically similar products using MiniLM-based text embeddings.
  • Delete Products: Remove products from the database directly via the UI.

Getting Started

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

  1. Clone the repository:
     git clone https://github.com/your-username/vector-search.git
     cd vector-search
  2. Install dependencies:
     pip install -r requirements.txt
  3. Run the Streamlit app for adding products:
     streamlit run add_product.py
  4. Run the Streamlit app for searching products:
     streamlit run dbms_product.py

How It Works

  • Embeddings:

    Images are converted to embeddings using the img2vec function based on CLIP. Text descriptions are converted to embeddings using the text2vec function based on MiniLM.

  • Database:

    Product data (including embeddings) is stored in an SQLite database (products.db).

  • Vector Search:

    The vector_search.py script uses cosine similarity to match query vectors against stored embeddings.
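The storage and search steps above can be sketched end to end as follows. This is a minimal illustration, not the code in vector_search.py or dbms_product.py: the table schema, the helper names (to_blob, from_blob, cosine_similarity, search), and the toy 3-dimensional vectors are all assumptions made for the example. Real embeddings would come from img2vec or text2vec and have many more dimensions. Plain Python is used to keep the sketch dependency-free; NumPy would be the more natural choice for real workloads.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats into float32 bytes for a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack bytes written by to_blob back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(conn, query_vec, top_k=3):
    """Score every stored embedding against query_vec, best match first."""
    rows = conn.execute("SELECT name, embedding FROM products").fetchall()
    scored = [
        (name, cosine_similarity(query_vec, from_blob(blob)))
        for name, blob in rows
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical schema; the real products.db layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, embedding BLOB)")

# Toy vectors standing in for real CLIP or MiniLM embeddings.
catalog = {
    "red mug": [1.0, 0.0, 0.0],
    "blue mug": [0.5, 0.5, 0.0],
    "desk lamp": [0.0, 0.0, 1.0],
}
conn.executemany(
    "INSERT INTO products (name, embedding) VALUES (?, ?)",
    [(name, to_blob(vec)) for name, vec in catalog.items()],
)

# Query with a vector identical to the "red mug" embedding.
results = search(conn, [1.0, 0.0, 0.0], top_k=2)
print(results)
```

Scanning every row like this is a brute-force search, which is simple and exact but scales linearly with the number of products. Cosine similarity ignores vector length, so only the direction of each embedding affects the ranking.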

Example Files

The assets/ directory includes sample product images you can use to test the app.

Learn

To learn more about how this project works, see LEARN.md.

License

This project is licensed under the MIT License. See the LICENSE file for details.
