Uploaded and retrieved vectors are different #727

paniabhisek · 2024-08-09T11:57:47Z

I uploaded points with vectors of size 4. When I retrieve them with the scroll method, the vector values are different.

Code snippet:

from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import PointStruct

client = QdrantClient(url="http://<ip>:6333", prefer_grpc=True)

# Collection of 4-dimensional vectors compared with cosine distance
client.create_collection('test', vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE))
client.upload_points('test',
                     points=[PointStruct(id=19, vector=[0.05, 0.61, 0.76, 0.74]),
                             PointStruct(id=28, vector=[0.19, 0.81, 0.75, 0.11])])
# scroll() returns a (records, next_page_offset) tuple; [0] is the list of records
scroll = client.scroll('test', with_vectors=True)
scroll[0]

output:

[Record(id=19, payload={}, vector=[0.04082754999399185, 0.4980961084365845, 0.6205787658691406, 0.6042477488517761], shard_key=None, order_value=None),
 Record(id=28, payload={}, vector=[0.16881056129932404, 0.719666063785553, 0.6663574576377869, 0.09773242473602295], shard_key=None, order_value=None)]

What is the reason behind this?

hash-f · 2024-08-25T13:27:32Z

This is the expected behavior.

When a collection uses the Cosine distance, Qdrant normalizes vectors as they are stored, so the values you retrieve are the unit-length versions of the ones you uploaded. Normalization scales each vector's magnitude to 1, so comparisons depend only on the direction of the vectors, not on their magnitude.

A quick example of how normalization works:

import numpy as np

def normalize_vector(vector):
    norm = np.linalg.norm(vector)
    if norm == 0:
        return vector
    return vector / norm

vector = np.array([0.05, 0.61, 0.76, 0.74])
normalized_vector = normalize_vector(vector)
print(normalized_vector)

# Output
# [0.04082755 0.49809612 0.62057877 0.60424775]
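
To check this against the numbers in the question, here is a minimal sketch that normalizes both uploaded vectors and compares them with the values returned by scroll. The variable names are illustrative, and the tolerance assumes the returned values are rounded to single precision, which is an assumption based on how the output looks.

```python
import numpy as np

# Vectors as uploaded in the question.
uploaded = np.array([[0.05, 0.61, 0.76, 0.74],
                     [0.19, 0.81, 0.75, 0.11]])

# Vectors as returned by scroll() in the question.
retrieved = np.array([
    [0.04082754999399185, 0.4980961084365845, 0.6205787658691406, 0.6042477488517761],
    [0.16881056129932404, 0.719666063785553, 0.6663574576377869, 0.09773242473602295],
])

# Divide each row by its own L2 norm.
normalized = uploaded / np.linalg.norm(uploaded, axis=1, keepdims=True)

# Compare with a small tolerance (assumed: returned values are float32-rounded).
matches = np.allclose(normalized, retrieved, atol=1e-6)
print(matches)
```

If `matches` is true, the retrieved vectors are the uploaded ones scaled to unit length and nothing else has changed.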

Normalization improves performance and has no effect on accuracy, because cosine similarity depends only on the direction of the vectors, not on their magnitude:

from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    return dot(a, b)/(norm(a)*norm(b))

vector = [0.05, 0.61, 0.76, 0.74]
normalized_vector = [0.04082754999399185, 0.4980961084365845, 0.6205787658691406, 0.6042477488517761]

print(cosine_similarity(vector, vector))
print(cosine_similarity(vector, normalized_vector))

# Output
# 1.0
# 1.0
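
The performance side can be sketched the same way: once both vectors have unit length, cosine similarity reduces to a plain dot product, so the norms do not have to be recomputed for every comparison. This is only an illustration with the two vectors from the question; the helper names are hypothetical and not part of Qdrant's API.

```python
import numpy as np

# The two vectors uploaded in the question.
a = np.array([0.05, 0.61, 0.76, 0.74])
b = np.array([0.19, 0.81, 0.75, 0.11])

def cosine_similarity(u, v):
    # Full formula: dot product divided by the product of the norms.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def normalize(v):
    # Scale to unit length; leave a zero vector unchanged.
    n = np.linalg.norm(v)
    return v if n == 0 else v / n

# Cosine similarity computed on the raw vectors.
full = cosine_similarity(a, b)

# Plain dot product computed on vectors normalized once, up front.
fast = np.dot(normalize(a), normalize(b))

print(np.isclose(full, fast))
```

Both values agree up to floating-point rounding, which is why storing normalized vectors changes the retrieved numbers but not the similarity results.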
