
[Bug]: Azure AI Search Hybrid Semantic Search is unusable due to hardcoded parameter #17636

Open
edgBR opened this issue Jan 26, 2025 · 5 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

edgBR commented Jan 26, 2025

Bug Description

Hi all,

I noticed that all my semantic hybrid queries fail when using the index as a retriever. My index definition is as follows:

from azure.identity import AzureCliCredential, get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
)
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.azureaisearch import (
    AzureAISearchVectorStore,
    IndexManagement,
)

# SEARCH_SERVICE_ENDPOINT and INDEX_NAME are defined elsewhere
credentials = AzureCliCredential()
cognitive_services_specific_ad_token = get_bearer_token_provider(
    credentials,
    "https://cognitiveservices.azure.com/.default",
)

semantic_search_config = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="default",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")],
            ),
        )
    ]
)

# Initialize search clients
index_client = SearchIndexClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    credential=credentials,
    semantic_search=semantic_search_config,
)
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=INDEX_NAME,
    index_management=IndexManagement.VALIDATE_INDEX,
    id_field_key="id",
    chunk_field_key="content",
    embedding_field_key="content_vector",
    embedding_dimensionality=3072,
    metadata_string_field_key="metadata",
    doc_id_field_key="title",
    language_analyzer="en.lucene",
    vector_algorithm_type="hnsw",
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [],
    storage_context=storage_context,
)

When calling:

from tqdm import tqdm
import json
from openinference.instrumentation import using_metadata
from phoenix.trace import using_project

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.core import get_response_synthesizer
import pprint

# define response synthesizer
response_synthesizer = get_response_synthesizer()

semantic_hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID, similarity_top_k=5
)
semantic_hybrid_query_engine = RetrieverQueryEngine(
    retriever=semantic_hybrid_retriever, response_synthesizer=response_synthesizer
)

# Load all evaluation questions from queries.jsonl
eval_questions = []
with open("qa.jsonl", "r") as file:
    for line in file:
        # Parse each line as JSON and extract the query
        json_line = json.loads(line.strip())
        eval_questions.append(json_line)

# List of query engines and their respective project names
query_engines = [
    #(keyword_query_engine, "Keyword"),
    #(hybrid_query_engine, "Hybrid"),
    (semantic_hybrid_query_engine, "Semantic_Hybrid"),
]

# Loop through each question and query it against each engine
for query_data in tqdm(eval_questions):
    query = query_data["question"]
    query_classification = query_data.get("query_classification", "undefined")  # Default to 'undefined' if not present

    for engine, project_name in query_engines:
        try:
            metadata = query_classification
            with using_project(project_name), using_metadata(metadata):
                # Assuming the query method expects a string query and returns results
                engine.query(query)
        except Exception as e:
            print(f"Error querying {project_name} for query '{query}': {e}")

I get errors for all the queries.

Looking at the internal code of LlamaIndex, I have not been able to find a way to pass the semantic configuration parameters via kwargs:

(screenshot: the hardcoded semantic configuration name in the LlamaIndex source)

Version

0.12.9

Steps to Reproduce

Follow this notebook:

https://github.com/farzad528/azure-ai-search-python-playground/blob/main/azure-ai-search-rag-eval-arize-ai.ipynb

And adapt it to your Azure endpoints. Additionally, your semantic configuration needs to be named something other than "mySemanticConfig".

In my case it is named "default":

(screenshot: the semantic configuration named "default")

Relevant Logs/Tracebacks

@edgBR edgBR added bug Something isn't working triage Issue needs to be triaged/prioritized labels Jan 26, 2025

dosubot bot commented Jan 26, 2025

Hey @edgBR! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!

To resolve the issue with semantic hybrid queries failing due to a hardcoded parameter in the LlamaIndex code, you need to ensure that the semantic configuration name is correctly set to "default" in your code. The error occurs because the semantic configuration name might be set to "mySemanticConfig" in the LlamaIndex code, which doesn't match your setup.

Here's how you can update the code to use the "default" semantic configuration:

  1. Locate the _create_query_result and _acreate_query_result methods in the AzureQueryResultSearchSemanticHybrid class within the LlamaIndex code.

  2. Update the semantic_configuration_name parameter to "default" in both methods:

def _create_query_result(
    self, search_query: str, vectors: Optional[List[Any]]
) -> VectorStoreQueryResult:
    results = self._search_client.search(
        search_text=search_query,
        vector_queries=vectors,
        top=self._query.similarity_top_k,
        select=self._select_fields,
        filter=self._odata_filter,
        query_type="semantic",
        semantic_configuration_name="default",  # Change to "default"
    )
    # ... rest of the code ...

async def _acreate_query_result(
    self, search_query: str, vectors: Optional[List[Any]]
) -> VectorStoreQueryResult:
    results = await self._async_search_client.search(
        search_text=search_query,
        vector_queries=vectors,
        top=self._query.similarity_top_k,
        select=self._select_fields,
        filter=self._odata_filter,
        query_type="semantic",
        semantic_configuration_name="default",  # Change to "default"
    )
    # ... rest of the code ...

Make sure that the semantic configuration named "default" is correctly set up in your Azure AI Search service. If it does not exist, you will need to create it or use the existing configuration name as defined in your Azure setup [1].
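If editing the installed package is not an option, a less invasive stopgap is to wrap whatever function issues the search call so the configuration name is overridden at call time. This is only a sketch: `force_semantic_config` and the `fake_search` stub are hypothetical names, not LlamaIndex or Azure SDK APIs, and it assumes the search call receives `query_type` and `semantic_configuration_name` as keyword arguments, as in the snippet above.

```python
import functools

def force_semantic_config(search_fn, config_name="default"):
    """Return a wrapper that overrides semantic_configuration_name on every semantic call."""
    @functools.wraps(search_fn)
    def wrapper(*args, **kwargs):
        if kwargs.get("query_type") == "semantic":
            # Replace whatever name the caller hardcoded with ours.
            kwargs["semantic_configuration_name"] = config_name
        return search_fn(*args, **kwargs)
    return wrapper

# Demonstration with a stub standing in for the real search client method:
def fake_search(**kwargs):
    return kwargs

patched = force_semantic_config(fake_search)
result = patched(
    search_text="q",
    query_type="semantic",
    semantic_configuration_name="mySemanticConfig",
)
print(result["semantic_configuration_name"])  # prints "default"
```

The same wrapper could be assigned over the client's bound `search` method before queries run, so no library source needs to be edited.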


@logan-markewich (Collaborator)

Seems like an issue left over from the original contributor. If you have the time, I'd welcome a PR to make this configurable.

edgBR (Author) commented Jan 26, 2025

Hi @logan-markewich, thanks for the blazing fast reply.

I would be very willing to contribute, but this is my first time looking into the LlamaIndex internals.

Besides the contribution guide: https://docs.llamaindex.ai/en/stable/CONTRIBUTING/#contributing-to-llamaindex

Are there any tips you could offer on how the class structure for vector stores should work?

BR
E

@logan-markewich (Collaborator)

@edgBR Nothing much else specific to mention. I think this should probably be part of the vector store constructor, and that value is then just accessed when doing the hybrid retrieval. Something like self.semantic_config_name?

To test your code locally, you can clone the repo and install the package in editable mode:
pip install -e llama-index-integrations/vector_stores/llama-index-vector-stores-azureaisearch (I think I got that path name right)
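The constructor-based approach described above could look roughly like this. This is a minimal illustrative sketch, not the actual LlamaIndex implementation: `PatchedAzureAISearchVectorStore` and `semantic_search_kwargs` are hypothetical names, and the real fix would thread the attribute into the existing `_create_query_result` / `_acreate_query_result` calls.

```python
from typing import Dict

class PatchedAzureAISearchVectorStore:
    """Illustrative stand-in showing where the new parameter would live."""

    def __init__(self, semantic_configuration_name: str = "mySemanticConfig"):
        # Keeping the previously hardcoded value as the default preserves
        # backward compatibility for anyone relying on the old behaviour.
        self.semantic_configuration_name = semantic_configuration_name

    def semantic_search_kwargs(self, search_text: str, top: int) -> Dict:
        # The hybrid retrieval path reads the stored attribute instead of
        # a hardcoded string literal.
        return {
            "search_text": search_text,
            "top": top,
            "query_type": "semantic",
            "semantic_configuration_name": self.semantic_configuration_name,
        }

store = PatchedAzureAISearchVectorStore(semantic_configuration_name="default")
kwargs = store.semantic_search_kwargs("what does the contract say?", top=5)
print(kwargs["semantic_configuration_name"])  # prints "default"
```

Defaulting to the old hardcoded string means existing indexes created with "mySemanticConfig" keep working without code changes.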

edgBR (Author) commented Jan 29, 2025

On it @logan-markewich
