Prototype: Semantic search in ES|QL with query rewrite #118106

Draft: wants to merge 1 commit into main

@ioanatia (Contributor) commented Dec 5, 2024

A prototype for an alternative to #116253

This allows using match on semantic_text fields without reimplementing the logic for getting the inference results in ES|QL.
It reuses the same query rewrite phase that _search already runs on the coordinator.

I added a ChickenQueryBuilder to show that semantic search works end to end, but what we really want is to keep using MatchQueryBuilder once the match query supports semantic_text (which is in progress).
It also shows the two types of query rewrites that happen under the hood for semantic text, but those would not be directly exposed in ES|QL.
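
To make the rewrite idea concrete, here is a minimal, self-contained sketch of a "rewrite until nothing changes" loop. Everything in it is hypothetical and only for illustration: RewriteSketch, Query, SemanticQuery, VectorQuery and the fake inference function are not the Elasticsearch classes, and the toy calls inference synchronously, whereas a real coordinator rewrite would need to wait on a remote inference call.

```java
import java.util.function.Function;

public class RewriteSketch {
    // Hypothetical stand-in for a query builder (not the Elasticsearch interface).
    interface Query {
        // Returns `this` when no further rewriting is needed.
        Query rewrite(Function<String, float[]> inference);
    }

    // Unresolved query: still carries the raw query text.
    record SemanticQuery(String field, String text) implements Query {
        public Query rewrite(Function<String, float[]> inference) {
            return new VectorQuery(field, inference.apply(text));
        }
    }

    // Resolved query: carries the embedding, so rewriting is a no-op.
    record VectorQuery(String field, float[] vector) implements Query {
        public Query rewrite(Function<String, float[]> inference) {
            return this;
        }
    }

    // Rewrites repeatedly until a round returns the same instance.
    static Query rewriteToFixpoint(Query query, Function<String, float[]> inference) {
        Query current = query;
        for (int round = 0; round < 16; round++) { // arbitrary safety bound for the sketch
            Query next = current.rewrite(inference);
            if (next == current) {
                return current;
            }
            current = next;
        }
        throw new IllegalStateException("too many rewrite rounds");
    }

    public static void main(String[] args) {
        // Fake inference: NOT a real model, just derives numbers from the text.
        Function<String, float[]> fakeInference = text -> new float[] { text.length(), 1.0f };
        Query rewritten = rewriteToFixpoint(new SemanticQuery("body", "blue"), fakeInference);
        System.out.println(rewritten instanceof VectorQuery);
    }
}
```

The point of the sketch is that the caller never needs to know how many rewrite steps a given query needs; it only runs the loop on the coordinator and ships the resolved query onward.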

The nice thing about this approach is that we can reuse it when we add a knn function and we want to do something similar to the query_vector_builder:

{
  "knn": {
    "field": "dense-vector-field",
    "k": 10,
    "num_candidates": 100,
    "query_vector_builder": {
      "text_embedding": { 
        "model_id": "my-text-embedding-model",
        "model_text": "The opposite of blue" 
      }
    }
  }
}

Here the argument of knn is not a vector, but a text query with a specified model_id that is used to transform the query text into a vector. We could use the same approach: a QueryBuilder that gets a rewrite phase on the coordinator to fetch the embeddings. Then, at least for knn, we would not need the concept of async functions in ES|QL.

While this PR is cutting a lot of corners, I want to get some high level feedback on a few questions:

  • do we agree that having a query builder rewrite phase is a better approach than semantic search in ES|QL #116253?
  • I have put the query rewrite phase after the plan is analyzed (and optimized) in EsqlSession - is there a better place for it?
  • with this approach every FullTextFunction would store its own QueryBuilder that would get rewritten on the coordinator. I guess we want every FullTextFunction to control the function -> query builder translation, but storing the QueryBuilder in the FullTextFunction takes this idea a bit further than some might expect

If we agree with this high level approach, the plan would be to split this into multiple phases:

  1. All FullTextFunction will own their translation to Lucene queries and FullTextFunction instances will store their query builder.
  2. Introduce the query rewrite phase on the coordinator that would rewrite the initial QueryBuilders for FullTextFunctions and rewrite the FullTextFunction nodes with rewritten QueryBuilders.
  3. Finally - when MatchQueryBuilder supports semantic text, enable the match function to receive semantic_text fields to perform semantic search.
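
As a rough illustration of phases 1 and 2, the sketch below walks a toy plan tree and replaces each full-text function node with a copy that holds the rewritten query builder. All types here (PlanRewriteSketch, Node, Filter, the simplified FullTextFunction and QueryBuilder records, and the resolved flag) are hypothetical stand-ins, not the ES|QL or Elasticsearch classes.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class PlanRewriteSketch {
    // Hypothetical query builder; `resolved` marks that inference results are attached.
    record QueryBuilder(String field, String text, boolean resolved) {}

    // Hypothetical plan nodes.
    interface Node {}

    // Phase 1: the function node stores its own query builder.
    record FullTextFunction(String name, QueryBuilder queryBuilder) implements Node {}

    record Filter(List<Node> children) implements Node {}

    // Phase 2: rewrite every stored query builder and rebuild the nodes that changed.
    static Node rewritePlan(Node node, UnaryOperator<QueryBuilder> rewriter) {
        if (node instanceof FullTextFunction f) {
            QueryBuilder rewritten = rewriter.apply(f.queryBuilder());
            // Keep the original node when the rewrite was a no-op.
            return rewritten.equals(f.queryBuilder()) ? f : new FullTextFunction(f.name(), rewritten);
        }
        Filter filter = (Filter) node;
        List<Node> children = filter.children().stream()
            .map(child -> rewritePlan(child, rewriter))
            .toList();
        return new Filter(children);
    }

    public static void main(String[] args) {
        Node plan = new Filter(List.of(
            new FullTextFunction("match", new QueryBuilder("body", "blue", false))));
        // Stand-in for the coordinator rewrite: just flips the flag.
        Node rewritten = rewritePlan(plan,
            qb -> qb.resolved() ? qb : new QueryBuilder(qb.field(), qb.text(), true));
        System.out.println(rewritten);
    }
}
```

Because the nodes are immutable in this sketch, the rewrite produces a new plan and leaves the analyzed plan untouched, which is one way to keep the rewrite phase separate from analysis and optimization.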

@carlosdelest (Member)

This LGTM. Having support for a coordinator rewrite phase will simplify the work for inference-related tasks, and allow us to reuse the work already being done for semantic_text related queries.

@ioanatia ioanatia changed the title Semantic search in ES|QL with query rewrite Prototype: Semantic search in ES|QL with query rewrite Dec 13, 2024