
[Bug]: Azure AI Search Hybrid Semantic Search is unusable due to hardcoded parameter #17636

Open
edgBR opened this issue Jan 26, 2025 · 5 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

edgBR commented Jan 26, 2025

Bug Description

Hi all,

I noticed that all my semantic hybrid queries fail when using the index as a retriever. My index definition is as follows:

from azure.identity import AzureCliCredential, get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
)
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.azureaisearch import (
    AzureAISearchVectorStore,
    IndexManagement,
)

# SEARCH_SERVICE_ENDPOINT and INDEX_NAME are defined elsewhere
credentials = AzureCliCredential()
cognitive_services_specific_ad_token = get_bearer_token_provider(
    credentials,
    "https://cognitiveservices.azure.com/.default",
)

semantic_search_config = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="default",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")],
            ),
        )
    ]
)

# Initialize search clients
index_client = SearchIndexClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    credential=credentials,
    semantic_search=semantic_search_config,
)
vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    index_name=INDEX_NAME,
    index_management=IndexManagement.VALIDATE_INDEX,
    id_field_key="id",
    chunk_field_key="content",
    embedding_field_key="content_vector",
    embedding_dimensionality=3072,
    metadata_string_field_key="metadata",
    doc_id_field_key="title",
    language_analyzer="en.lucene",
    vector_algorithm_type="hnsw",
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [],
    storage_context=storage_context,
)

When calling:

from tqdm import tqdm
import json
from openinference.instrumentation import using_metadata
from phoenix.trace import using_project

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.core import get_response_synthesizer
import pprint

# define response synthesizer
response_synthesizer = get_response_synthesizer()

semantic_hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID, similarity_top_k=5
)
semantic_hybrid_query_engine = RetrieverQueryEngine(
    retriever=semantic_hybrid_retriever, response_synthesizer=response_synthesizer
)

# Load all evaluation questions from queries.jsonl
eval_questions = []
with open("qa.jsonl", "r") as file:
    for line in file:
        # Parse each line as JSON and extract the query
        json_line = json.loads(line.strip())
        eval_questions.append(json_line)

# List of query engines and their respective project names
query_engines = [
    #(keyword_query_engine, "Keyword"),
    #(hybrid_query_engine, "Hybrid"),
    (semantic_hybrid_query_engine, "Semantic_Hybrid"),
]

# Loop through each question and query it against each engine
for query_data in tqdm(eval_questions):
    query = query_data["question"]
    query_classification = query_data.get("query_classification", "undefined")  # Default to 'undefined' if not present

    for engine, project_name in query_engines:
        try:
            metadata = query_classification
            with using_project(project_name), using_metadata(metadata):
                # Assuming the query method expects a string query and returns results
                engine.query(query)
        except Exception as e:
            print(f"Error querying {project_name} for query '{query}': {e}")

I get errors for all the queries.

Looking at the internal code of LlamaIndex, I have not been able to find a way to pass the semantic configuration parameters via kwargs:

(screenshot: the hardcoded semantic configuration name in the LlamaIndex source)

Version

0.12.9

Steps to Reproduce

Follow this notebook:

https://github.com/farzad528/azure-ai-search-python-playground/blob/main/azure-ai-search-rag-eval-arize-ai.ipynb

And adapt it to your Azure endpoints. Additionally, your semantic configuration needs to be named something other than "mySemanticConfig".

In my case it is named "default":

(screenshot: the semantic configuration named "default")

Relevant Logs/Tracebacks

@edgBR edgBR added bug Something isn't working triage Issue needs to be triaged/prioritized labels Jan 26, 2025

dosubot bot commented Jan 26, 2025

Hey @edgBR! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!

To resolve the issue with semantic hybrid queries failing due to a hardcoded parameter in the LlamaIndex code, you need to ensure that the semantic configuration name is correctly set to "default" in your code. The error occurs because the semantic configuration name might be set to "mySemanticConfig" in the LlamaIndex code, which doesn't match your setup.

Here's how you can update the code to use the "default" semantic configuration:

  1. Locate the _create_query_result and _acreate_query_result methods in the AzureQueryResultSearchSemanticHybrid class within the LlamaIndex code.

  2. Update the semantic_configuration_name parameter to "default" in both methods:

def _create_query_result(
    self, search_query: str, vectors: Optional[List[Any]]
) -> VectorStoreQueryResult:
    results = self._search_client.search(
        search_text=search_query,
        vector_queries=vectors,
        top=self._query.similarity_top_k,
        select=self._select_fields,
        filter=self._odata_filter,
        query_type="semantic",
        semantic_configuration_name="default",  # Change to "default"
    )
    # ... rest of the code ...

async def _acreate_query_result(
    self, search_query: str, vectors: Optional[List[Any]]
) -> VectorStoreQueryResult:
    results = await self._async_search_client.search(
        search_text=search_query,
        vector_queries=vectors,
        top=self._query.similarity_top_k,
        select=self._select_fields,
        filter=self._odata_filter,
        query_type="semantic",
        semantic_configuration_name="default",  # Change to "default"
    )
    # ... rest of the code ...

Make sure that the semantic configuration named "default" is correctly set up in your Azure AI Search service. If it does not exist, you will need to create it or use the existing configuration name as defined in your Azure setup [1].
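If editing the installed package is not an option, a less invasive stopgap is to wrap whatever function issues the search call so the configuration name is overridden at call time. This is only a sketch: `force_semantic_config` and the `fake_search` stub are hypothetical names, not LlamaIndex or Azure SDK APIs, and it assumes the search call receives `query_type` and `semantic_configuration_name` as keyword arguments, as in the snippet above.

```python
import functools

def force_semantic_config(search_fn, config_name="default"):
    """Return a wrapper that overrides semantic_configuration_name on every semantic call."""
    @functools.wraps(search_fn)
    def wrapper(*args, **kwargs):
        if kwargs.get("query_type") == "semantic":
            # Replace whatever name the caller hardcoded with ours.
            kwargs["semantic_configuration_name"] = config_name
        return search_fn(*args, **kwargs)
    return wrapper

# Demonstration with a stub standing in for the real search client method:
def fake_search(**kwargs):
    return kwargs

patched = force_semantic_config(fake_search)
result = patched(
    search_text="q",
    query_type="semantic",
    semantic_configuration_name="mySemanticConfig",
)
print(result["semantic_configuration_name"])  # prints "default"
```

The same wrapper could be assigned over the client's bound `search` method before queries run, so no library source needs to be edited.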


@logan-markewich (Collaborator)

Seems like an issue left over from the original contributor. If you have the time, I'd welcome a PR to make this configurable.

edgBR (Author) commented Jan 26, 2025

Hi @logan-markewich, thanks for the blazing fast reply.

I would be very willing to contribute, but this is my first time looking into the LlamaIndex internals.

Besides the contribution guide: https://docs.llamaindex.ai/en/stable/CONTRIBUTING/#contributing-to-llamaindex

Are there any tips you could offer on how the class structure for vector stores should work?

BR
E

@logan-markewich (Collaborator)

@edgBR Nothing much else specific to mention. I think this should probably be part of the vector store constructor, and that value is then just accessed when doing the hybrid retrieval. Something like self.semantic_config_name?

To test your code locally, you can clone the repo and install the package in editable mode:
pip install -e llama-index-integrations/vector_stores/llama-index-vector-stores-azureaisearch (I think I got that path name right)
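The constructor-based approach described above could look roughly like this. This is a minimal illustrative sketch, not the actual LlamaIndex implementation: `PatchedAzureAISearchVectorStore` and `semantic_search_kwargs` are hypothetical names, and the real fix would thread the attribute into the existing `_create_query_result` / `_acreate_query_result` calls.

```python
from typing import Dict

class PatchedAzureAISearchVectorStore:
    """Illustrative stand-in showing where the new parameter would live."""

    def __init__(self, semantic_configuration_name: str = "mySemanticConfig"):
        # Keeping the previously hardcoded value as the default preserves
        # backward compatibility for anyone relying on the old behaviour.
        self.semantic_configuration_name = semantic_configuration_name

    def semantic_search_kwargs(self, search_text: str, top: int) -> Dict:
        # The hybrid retrieval path reads the stored attribute instead of
        # a hardcoded string literal.
        return {
            "search_text": search_text,
            "top": top,
            "query_type": "semantic",
            "semantic_configuration_name": self.semantic_configuration_name,
        }

store = PatchedAzureAISearchVectorStore(semantic_configuration_name="default")
kwargs = store.semantic_search_kwargs("what does the contract say?", top=5)
print(kwargs["semantic_configuration_name"])  # prints "default"
```

Defaulting to the old hardcoded string means existing indexes created with "mySemanticConfig" keep working without code changes.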

edgBR (Author) commented Jan 29, 2025

On it @logan-markewich
