Using MongoDB Atlas' Vector Search engine, we store dense vectors and calculate similarities all within the data storage layer.
Jupyter Notebook Demonstration
pip install sentence-transformers pymongo
We'll be using a popular pre-trained sentence transformer model. You can alternatively train your own or re-train an existing one.
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
import pymongo
connection = pymongo.MongoClient(mongo_uri)
vector_collection = connection['eap']['vector']
Convert each object's name into its corresponding vector embedding, then store it in the vector database.
products = [
    {"name": "Mozzarella"},
    {"name": "Parmesan"},
    {"name": "Cheddar"},
    {"name": "Brie"},
    {"name": "Swiss"},
    {"name": "Gruyere"},
    {"name": "Feta"},
    {"name": "Gouda"},
    {"name": "Provolone"},
    {"name": "Monterey Jack"}
]
# create a new embedding field for each product object
for product in products:
    # encode the name to a dense vector, then convert the numpy array to a plain list
    embedding = model.encode(product['name']).tolist()
    product['embedding'] = embedding
    vector_collection.insert_one(product)
We use the default HNSW KNN index structure when we create our field mapping definition:
{
  "mappings": {
    "fields": {
      "embedding": [
        {
          "dimensions": 384,
          "similarity": "euclidean",
          "type": "knnVector"
        }
      ]
    }
  }
}
The heart of vector search is the similarity calculation. The index definition above uses Euclidean distance, but you can experiment with the other supported functions, cosine and dot product.
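For intuition, both metrics are simple to compute by hand. Here's a plain-Python sketch on toy 3-dimensional vectors (real embeddings from this model have 384 dimensions):

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two vectors; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Angle-based similarity; 1.0 means the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

u = [1.0, 0.0, 2.0]
v = [2.0, 0.0, 4.0]
print(euclidean_distance(u, v))  # ≈ 2.236
print(cosine_similarity(u, v))   # 1.0 — same direction, different magnitude
```

Note the difference: `v` is just `u` scaled by 2, so cosine similarity considers them identical while Euclidean distance does not.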
query = "cheese"
vector_query = model.encode(query).tolist()
pipeline = [
    {
        "$search": {
            "knnBeta": {
                "vector": vector_query,
                "path": "embedding",
                "k": 10
            }
        }
    },
    {
        "$project": {
            "embedding": 0,
            "_id": 0,
            "score": {
                "$meta": "searchScore"
            }
        }
    }
]
results = vector_collection.aggregate(pipeline)
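Atlas executes the `knnBeta` stage server-side against its index. For intuition, here is a brute-force sketch of what that stage computes, using made-up 2-dimensional vectors (a real HNSW index approximates this search rather than scanning every document):

```python
import math

def knn(query_vec, docs, k):
    # Brute-force k-nearest-neighbours by Euclidean distance, mirroring the
    # index's "similarity": "euclidean" setting. docs: list of (name, vector).
    scored = [
        (name, math.sqrt(sum((q - x) ** 2 for q, x in zip(query_vec, vec))))
        for name, vec in docs
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Hypothetical 2-d embeddings, just to show the mechanics.
docs = [
    ("Mozzarella", [0.9, 0.1]),
    ("Cheddar",    [0.8, 0.3]),
    ("Brie",       [0.1, 0.9]),
]
print(knn([1.0, 0.0], docs, k=2))  # Mozzarella and Cheddar are closest
```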
Retrieval augmentation is a technique for improving QA bots by augmenting the prompt with relevant documents retrieved from a knowledge base. It lets a bot access information outside its training set and improves interpretability, since answers can be traced back to source documents. The challenge lies in retrieving documents that are actually relevant and presenting them in a form the bot can use.
- Accept corpus
- Split into chunks
- Embed the chunks
- Ask a question (Q)
- Return top K chunks
- Run chunks through GPT
- Return results in a structured form
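The "split into chunks" step above can be sketched as a simple character-window chunker. The sizes here are illustrative; production systems often split on sentence or token boundaries instead:

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    # Fixed-size character windows with overlap, so content that straddles
    # a boundary still appears whole in at least one chunk.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

corpus = "x" * 500
chunks = split_into_chunks(corpus)
print(len(chunks))  # 4 — windows start at 0, 150, 300, 450
```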
- Embedding
- ANN Search
- Filter
- Re-Ranking
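The filter and re-ranking stages above can be composed into a minimal sketch. The candidate tuples, categories, and scores are made up for illustration; in practice the candidates would come from the ANN search and the re-ranker might be a cross-encoder rather than an exact dot product:

```python
def filter_and_rerank(candidates, query_vec, allowed_category):
    # candidates: (name, category, approx_score, vector) tuples from ANN search.
    # Filter on metadata first, then re-score the survivors with an exact
    # similarity (here a dot product) and sort best-first.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    filtered = [c for c in candidates if c[1] == allowed_category]
    return sorted(filtered, key=lambda c: dot(query_vec, c[3]), reverse=True)

candidates = [
    ("Mozzarella", "cheese", 0.91, [0.9, 0.1]),
    ("Ciabatta",   "bread",  0.88, [0.7, 0.2]),
    ("Cheddar",    "cheese", 0.85, [0.8, 0.2]),
]
top = filter_and_rerank(candidates, [1.0, 0.5], "cheese")
print([name for name, *_ in top])  # ['Mozzarella', 'Cheddar']
```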
- Model versioning, hosting, and scaling
- Implement any number of the vector search use cases