Feature Request: Support for Negative Embeddings in Similarity Searches #19239

White-RaBot · 2024-03-18T18:02:31Z

Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

I propose adding support for negative embeddings in similarity searches within LangChain's search functionalities. This feature would allow users to specify not only what they are searching for (via positive embeddings) but also what they want to exclude from the search results (via negative embeddings). This could significantly improve the precision of search results in applications requiring nuanced context understanding, such as content recommendation systems or semantic search engines.

Motivation

Current similarity search capabilities in LangChain are powerful for finding close matches based on vector embeddings. However, they lack the direct ability to exclude certain concepts or themes, which could be equally important for refining search results. For example, when searching for content related to "Animals," a user might want to exclude "Cats" from the results. Incorporating negative embeddings would allow for this level of search refinement.

Proposal (If applicable)

Something like the following:

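# Imports the method below relies on (it is meant to live inside the vector store class).
from typing import Any, Dict, List, Optional

import sqlalchemy
from sqlalchemy.orm import Session


def __query_collection(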
    self,
    embedding: List[float],
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,  # values may be nested dicts, see below
    negative_embeddings: Optional[List[List[float]]] = None
) -> List[Any]:
    """Query the collection, allowing for negative embeddings to affect the query."""
    with Session(self._bind) as session:
        collection = self.get_collection(session)
        if not collection:
            raise ValueError("Collection not found")

        filter_by = self.EmbeddingStore.collection_id == collection.uuid

        if filter is not None:
            filter_clauses = []

            for key, value in filter.items():
                if isinstance(value, dict):
                    filter_by_metadata = self._create_filter_clause(key, value)

                    if filter_by_metadata is not None:
                        filter_clauses.append(filter_by_metadata)
                else:
                    filter_by_metadata = self.EmbeddingStore.cmetadata[
                        key
                    ].astext == str(value)
                    filter_clauses.append(filter_by_metadata)

            filter_by = sqlalchemy.and_(filter_by, *filter_clauses)

        # The original distance calculation for the positive query
        distance = self.distance_strategy(embedding)

        if negative_embeddings:
            # Penalize rows that lie close to any negative embedding. Subtracting
            # the distance to a negative embedding lowers the combined score of rows
            # far from it, so they sort earlier under the ascending order below.
            # Assumption: 'distance_strategy' returns a SQL expression for any query
            # vector, so it is reused here for the negatives. The penalty is
            # unweighted; a scaling factor will likely be needed in practice.
            for neg_embedding in negative_embeddings:
                distance = distance - self.distance_strategy(neg_embedding)

        results: List[Any] = (
            session.query(
                self.EmbeddingStore,
                distance.label("distance"),
            )
            .filter(filter_by)
            .order_by(sqlalchemy.asc("distance"))
            .join(
                self.CollectionStore,
                self.EmbeddingStore.collection_id == self.CollectionStore.uuid,
            )
            .limit(k)
            .all()
        )
    return results
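
To make the intended scoring concrete outside of SQL, here is a minimal pure-Python sketch of the same idea: the score is the distance to the positive query minus a weighted distance to each negative embedding, sorted ascending. The function names, the `weight` parameter, and the toy 2-D vectors are all hypothetical and only for illustration; they are not part of LangChain or of the proposal above.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (0.0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search_with_negatives(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    negatives: Optional[List[Sequence[float]]] = None,
    k: int = 4,
    weight: float = 1.0,  # hypothetical knob controlling the penalty strength
) -> List[Tuple[str, float]]:
    """Rank docs by distance to the query, penalizing closeness to negatives."""
    scored = []
    for name, vec in docs:
        score = cosine_distance(query, vec)
        for neg in negatives or []:
            # Being far from a negative embedding lowers (improves) the score.
            score -= weight * cosine_distance(neg, vec)
        scored.append((name, score))
    scored.sort(key=lambda item: item[1])  # ascending: smallest score first
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings.
docs = [("cat", (0.9, 0.1)), ("dog", (0.6, 0.6)), ("horse", (0.1, 0.9))]
animals = (1.0, 0.2)  # positive query, leaning toward the "cat" direction
cats = (1.0, 0.0)     # concept to steer away from

plain = search_with_negatives(animals, docs, k=3)
adjusted = search_with_negatives(animals, docs, negatives=[cats], k=3)
print([name for name, _ in plain])
print([name for name, _ in adjusted])
```

With these toy vectors, the plain search ranks "cat" first, while the adjusted search pushes it to last place. The result is sensitive to `weight`: at 0.5 "cat" is still ranked first, and at 1.0 the ranking is dominated by distance from the negative, so some tuning or normalization would probably be needed.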

hinthornw · 2024-03-18T20:26:04Z

Maintainer

Would this be more appropriate to apply via reranking? How common is it for vectorstores to support this type of functionality?

1 reply

White-RaBot Mar 20, 2024
Author

I've submitted a PR with a working version of this: #19310

White-RaBot · 2024-03-18T22:32:23Z

Author

If the application demands high precision from the initial set of results and aims to minimize irrelevant or undesired data from the start, directly incorporating negative embeddings could offer significant benefits.

Direct support for negative embeddings is not standard across vector stores. Traditional vector search focuses on retrieving the closest matches by positive similarity or relevance score. However, as search and retrieval evolve, there is growing interest in more nuanced functionality, including the ability to proactively exclude certain dimensions or characteristics represented by negative embeddings.

Reranking is a reactive process that adjusts the relevance of search results after they have been retrieved based on positive similarity scores. It can incorporate negative signals by downgrading or filtering out undesired results during this post-processing phase. Reranking offers flexibility and can be less computationally intensive, making it a suitable option for applications where adjusting the order of an already retrieved set of results is sufficient.
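
For comparison, the reranking route could look roughly like the sketch below: over-fetch candidates with an ordinary positive similarity search, then drop any candidate that sits too close to a negative embedding before keeping the top `k`. This is only an illustration; `rerank_with_negatives`, the `min_negative_distance` threshold, and the toy vectors are hypothetical and not an existing LangChain API.

```python
import math
from typing import List, Sequence, Tuple

# (document id, embedding, distance to the positive query from the first-stage search)
Candidate = Tuple[str, Sequence[float], float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (0.0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rerank_with_negatives(
    candidates: List[Candidate],
    negatives: List[Sequence[float]],
    k: int = 4,
    min_negative_distance: float = 0.05,  # hypothetical cutoff, needs tuning
) -> List[Tuple[str, float]]:
    """Filter out candidates near any negative embedding, then keep the best k."""
    kept = []
    for doc_id, vec, positive_distance in candidates:
        # Distance to the nearest negative; infinity when no negatives are given.
        closest_negative = min(
            (cosine_distance(neg, vec) for neg in negatives), default=math.inf
        )
        if closest_negative >= min_negative_distance:
            kept.append((doc_id, positive_distance))
    kept.sort(key=lambda item: item[1])  # ascending positive distance
    return kept[:k]


# Candidates as they might come back from a first-stage search with fetch_k > k.
candidates = [
    ("cat", (0.9, 0.1), 0.004),
    ("dog", (0.6, 0.6), 0.168),
    ("horse", (0.1, 0.9), 0.697),
]
reranked = rerank_with_negatives(candidates, negatives=[(1.0, 0.0)], k=2)
print(reranked)
```

Here "cat" is removed because it lies within the threshold of the negative vector, leaving "dog" and "horse". The trade-off is that the first-stage search has to fetch more than `k` rows, otherwise filtering can leave fewer results than requested.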
