Prototype: Semantic search in ES|QL with query rewrite #118106
Draft
A prototype for an alternative to #116253.

This allows using `match` on `semantic_text` fields, but without the need to reimplement the logic for getting the inference results in ES|QL. This leverages the same model of query rewrite phase we already have in `_search` on the coordinator.

I added a `ChickenQueryBuilder` to show that semantic search actually works, but what we actually want is to continue to use `MatchQueryBuilder` when the `match` query starts supporting `semantic_text` (which is in progress). It also shows the two types of query rewrites that happen under the hood for semantic text, but those would not be directly exposed in ES|QL.
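
To make the coordinator-side rewrite idea concrete, here is a minimal sketch in plain Java. All types in it (`Query`, `SemanticTextQuery`, `VectorQuery`, `rewriteFully`) are hypothetical stand-ins invented for illustration, not the actual Elasticsearch classes, and the inference call is modeled as a synchronous function even though the real rewrite phase resolves it asynchronously.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical, simplified stand-ins; these are NOT the real Elasticsearch classes.
final class RewriteSketch {

    /** A query that may replace itself with a simpler query during rewrite. */
    interface Query {
        // Returns `this` when there is nothing left to rewrite.
        default Query rewrite(Function<String, float[]> inference) {
            return this;
        }
    }

    /** What the user writes: free text against a semantic_text-like field. */
    static final class SemanticTextQuery implements Query {
        final String field;
        final String text;

        SemanticTextQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public Query rewrite(Function<String, float[]> inference) {
            // The embedding is resolved once, on the coordinator.
            return new VectorQuery(field, inference.apply(text));
        }
    }

    /** Terminal form: carries the embedding and needs no further rewrite. */
    static final class VectorQuery implements Query {
        final String field;
        final float[] vector;

        VectorQuery(String field, float[] vector) {
            this.field = field;
            this.vector = vector;
        }
    }

    /** Rewrites until a fixed point is reached, i.e. rewrite returns the same instance. */
    static Query rewriteFully(Query query, Function<String, float[]> inference) {
        Query current = query;
        for (int round = 0; round < 16; round++) { // guard against endless rewrite chains
            Query next = current.rewrite(inference);
            if (next == current) {
                return current;
            }
            current = next;
        }
        throw new IllegalStateException("too many rewrite rounds");
    }

    public static void main(String[] args) {
        // Fake inference service: the "embedding" is just the text length, twice.
        Function<String, float[]> fakeInference = t -> new float[] { t.length(), t.length() };
        Query rewritten = rewriteFully(new SemanticTextQuery("content", "hello"), fakeInference);
        System.out.println(rewritten.getClass().getSimpleName());
        System.out.println(Arrays.toString(((VectorQuery) rewritten).vector));
    }
}
```

The point of the sketch is that the caller only ever hands over the text query. The expensive inference step is hidden inside `rewrite`, so the rest of the pipeline only sees the already-resolved form.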

The nice thing about this approach is that we can reuse it when we add a `knn` function and want to do something similar to the `query_vector_builder`. In that case the argument of `knn` is not a vector, but a text query with a specified `model_id` used to transform the query text into a vector. We could use the same approach of having a `QueryBuilder` that gets a rewrite phase on the coordinator to get the embeddings. Then, at least for `knn`, we would not need the concept of async functions in ES|QL.

While this PR cuts a lot of corners, I want to get some high-level feedback on a few questions:
- `EsqlSession` - is there a better place for it?
- `FullTextFunction` would store its own `QueryBuilder` that would get rewritten on the coordinator. I guess we want every `FullTextFunction` to control the function -> query builder translation, but storing the `QueryBuilder` in the `FullTextFunction` takes this idea a bit further than some might expect.

If we agree with this high-level approach, the plan would be to split this into multiple phases:
1. `FullTextFunction`s will own their translation to Lucene queries and `FullTextFunction` instances will store their query builder.
2. `QueryBuilder`s for `FullTextFunction`s and rewrite the `FullTextFunction` nodes with rewritten `QueryBuilder`s.
3. Once `MatchQueryBuilder` supports semantic text, enable the `match` function to receive `semantic_text` fields to perform semantic search.
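
As a rough illustration of the first two phases, the sketch below shows a function node that stores its own query builder and a coordinator-side pass that swaps in nodes carrying rewritten builders. The class shapes are assumptions made for this example: the names echo the ones used above, but the fields, the `withQueryBuilder` helper, and the flat list standing in for the plan tree are all hypothetical and much simpler than the real ES|QL code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-ins; names echo the PR text but are NOT the real ES|QL classes.
final class FullTextPlanSketch {

    /** Minimal query builder: a description plus a flag saying whether it still needs a rewrite. */
    static final class QueryBuilder {
        final String description;
        final boolean needsRewrite;

        QueryBuilder(String description, boolean needsRewrite) {
            this.description = description;
            this.needsRewrite = needsRewrite;
        }
    }

    /** A full-text function node that owns the query builder it translates to. */
    static final class FullTextFunction {
        final String name;
        final QueryBuilder queryBuilder;

        FullTextFunction(String name, QueryBuilder queryBuilder) {
            this.name = name;
            this.queryBuilder = queryBuilder;
        }

        // Nodes are immutable, so a rewrite produces a new node.
        FullTextFunction withQueryBuilder(QueryBuilder rewritten) {
            return new FullTextFunction(name, rewritten);
        }
    }

    /** Coordinator-side pass: rewrite each stored builder and swap in a new function node. */
    static List<FullTextFunction> rewritePlan(List<FullTextFunction> functions,
                                              UnaryOperator<QueryBuilder> rewriter) {
        List<FullTextFunction> result = new ArrayList<>();
        for (FullTextFunction fn : functions) {
            QueryBuilder rewritten = rewriter.apply(fn.queryBuilder);
            // Keep the original node when the builder did not change.
            result.add(rewritten == fn.queryBuilder ? fn : fn.withQueryBuilder(rewritten));
        }
        return result;
    }

    public static void main(String[] args) {
        List<FullTextFunction> plan = new ArrayList<>();
        plan.add(new FullTextFunction("match", new QueryBuilder("semantic(content, 'hello')", true)));
        plan.add(new FullTextFunction("match", new QueryBuilder("match(title, 'hello')", false)));

        // Toy rewriter: only builders flagged as needing a rewrite are replaced.
        UnaryOperator<QueryBuilder> rewriter = qb ->
            qb.needsRewrite ? new QueryBuilder("resolved:" + qb.description, false) : qb;

        for (FullTextFunction fn : rewritePlan(plan, rewriter)) {
            System.out.println(fn.name + " -> " + fn.queryBuilder.description);
        }
    }
}
```

Returning the original node when nothing changed keeps the pass cheap to repeat, which matters if the rewrite has to run more than once before every builder is fully resolved.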