Make search functions translation aware #118355

Merged
merged 8 commits into from
Dec 13, 2024

Conversation

ioanatia
Contributor

@ioanatia ioanatia commented Dec 10, 2024

Following up on #118106, we are breaking the work of adding semantic search to ES|QL into multiple parts:

  1. Each FullTextFunction will own its translation to Lucene queries, and FullTextFunction instances will store their query builder.
  2. Introduce a query rewrite phase on the coordinator that rewrites the initial QueryBuilders for FullTextFunctions and replaces the FullTextFunction expressions in the logical plan with new ones that store the rewritten query builders.
  3. Finally, when MatchQueryBuilder supports semantic text (should happen very soon, see "Add match support for semantic_text fields" #117839), enable the match function to receive semantic_text fields to perform semantic search.

This PR only takes care of (1). We introduce a TranslationAware interface, and FullTextFunction instances store their own QueryBuilder, which is serialized.
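
A minimal sketch of the idea in (1), for readers who want something concrete. All types here are simplified, hypothetical stand-ins (the real `QueryBuilder`, `Query` and `FullTextFunction` in Elasticsearch have different signatures, and the real `asQuery` takes a `TranslatorHandler`); the sketch only illustrates a function that owns its translation and prefers a stored query builder when one is present.

```java
import java.util.Objects;

// Hypothetical stand-ins for the real Elasticsearch types; none of these
// signatures are taken from the Elasticsearch code base.
record QueryBuilder(String description) {}

record Query(String luceneLike) {}

/** An expression that knows how to translate itself into a Query. */
interface TranslationAware {
    Query asQuery();
}

/** Simplified full-text function that may carry a pre-built query builder. */
abstract class FullTextFunction implements TranslationAware {
    private final String queryText;
    private final QueryBuilder queryBuilder; // null until one is attached

    FullTextFunction(String queryText, QueryBuilder queryBuilder) {
        this.queryText = Objects.requireNonNull(queryText);
        this.queryBuilder = queryBuilder;
    }

    String queryText() {
        return queryText;
    }

    QueryBuilder queryBuilder() {
        return queryBuilder;
    }

    /** Fallback translation used when no query builder is stored. */
    protected abstract Query translate();

    @Override
    public Query asQuery() {
        // Prefer the stored builder; otherwise translate from the arguments.
        return queryBuilder != null ? new Query(queryBuilder.description()) : translate();
    }
}

final class Kql extends FullTextFunction {
    Kql(String queryText, QueryBuilder queryBuilder) {
        super(queryText, queryBuilder);
    }

    @Override
    protected Query translate() {
        return new Query("kql(" + queryText() + ")");
    }
}

final class TranslationAwareDemo {
    public static void main(String[] args) {
        // Without a stored builder the function translates itself.
        System.out.println(new Kql("status:200", null).asQuery());
        // With a stored builder, that builder wins.
        System.out.println(new Kql("status:200", new QueryBuilder("rewritten")).asQuery());
    }
}
```

The point of the design is that callers only need `asQuery()`, and do not have to look up a translator for each concrete function type.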

I wasn't sure which type of tests to add, since most of this is a refactor of how we do the FullTextFunction -> query builder translation.

@SuppressWarnings("rawtypes")
@Override
protected ExpressionTranslator translator() {
    return new EsqlExpressionTranslators.KqlFunctionTranslator();
Contributor Author

I guess we could move the translators out of EsqlExpressionTranslators, but for now I have left them as they are - I wasn't sure we need to move them. Happy to hear otherwise.

Contributor

My preference is to move them, but not in this PR - as a follow up.

@ioanatia ioanatia changed the title from "Make search function translation aware" to "Make search functions translation aware" Dec 10, 2024
@SuppressWarnings("rawtypes")
protected abstract ExpressionTranslator translator();

public abstract Expression replaceQueryBuilder(QueryBuilder queryBuilder);
Contributor Author

This is not used yet, but it is what we will call in the query rewrite phase - see #118106.
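
A rough sketch of how a rewrite phase might use `replaceQueryBuilder`, under loud assumptions: `QueryRewritePhase`, its `rewrite` method and the flat list of expressions are hypothetical, and the types are simplified stand-ins rather than the real Elasticsearch classes. It only shows the intended shape, where the original expression stays untouched and a copy carries the rewritten builder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins; not the real Elasticsearch types.
record QueryBuilder(String description) {}

interface Expression {}

abstract class FullTextFunction implements Expression {
    private final QueryBuilder queryBuilder;

    FullTextFunction(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
    }

    QueryBuilder queryBuilder() {
        return queryBuilder;
    }

    /** Returns a copy of this function that stores the given builder. */
    abstract Expression replaceQueryBuilder(QueryBuilder queryBuilder);
}

final class Match extends FullTextFunction {
    private final String field;
    private final String text;

    Match(String field, String text, QueryBuilder queryBuilder) {
        super(queryBuilder);
        this.field = field;
        this.text = text;
    }

    @Override
    Expression replaceQueryBuilder(QueryBuilder queryBuilder) {
        return new Match(field, text, queryBuilder);
    }
}

final class QueryRewritePhase {
    /** Replaces every full-text function with a copy holding the rewritten builder. */
    static List<Expression> rewrite(List<Expression> expressions, UnaryOperator<QueryBuilder> rewriter) {
        List<Expression> result = new ArrayList<>();
        for (Expression e : expressions) {
            if (e instanceof FullTextFunction f && f.queryBuilder() != null) {
                result.add(f.replaceQueryBuilder(rewriter.apply(f.queryBuilder())));
            } else {
                result.add(e); // everything else passes through unchanged
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Match original = new Match("title", "fox", new QueryBuilder("match(title, fox)"));
        List<Expression> rewritten = rewrite(
            List.of(original),
            qb -> new QueryBuilder(qb.description() + " [rewritten]")
        );
        System.out.println(((FullTextFunction) rewritten.get(0)).queryBuilder());
        System.out.println(original.queryBuilder()); // the original is not mutated
    }
}
```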

Contributor

@ChrisHegarty ChrisHegarty left a comment

Overall, I think that this is a good refactoring and improvement that will facilitate better encapsulation and reuse of existing builders. LGTM

@ioanatia ioanatia marked this pull request as ready for review December 12, 2024 09:24
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 12, 2024
Contributor

@luigidellaquila luigidellaquila left a comment

Thanks @ioanatia, I had a first look at the PR and it seems good.
I added a comment about a couple more things we can add to better encapsulate the pushdown logic.

cc. @costin maybe you want to have a look as well, I know you have ideas around these new interfaces.

 * {@link Query} translation, instead of relying on the registered translators from EsqlExpressionTranslators.
 */
public interface TranslationAware {
    Query asQuery(TranslatorHandler translatorHandler);
Contributor

I think it would make sense to centralize in this interface all the pushdown logic, including

  • capabilities, e.g. can it be pushed down to Lucene at all? Maybe sometimes it depends on its inputs?
  • constraints, e.g. does this function need to be pushed down to be executed? Does it also have an inline implementation that works without pushdown?

The first point is covered by the PushFiltersToSource physical rule. I'm not sure if we want ad-hoc logic for each function or if we can simplify it to a single, generic behavior (i.e. removing the if (exp instanceof Term)). In both cases I'd rather have an exp instanceof TranslationAware check there, so that one day we can also let other operators/functions implement it and remove that long list of instanceofs.

For the second point I'm thinking about the Verifier logic around FullTextFunctions. It's a complex topic that requires a deep understanding of how commands interact with each other. I put it here for completeness, but we can probably address it in a follow-up PR.
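
To illustrate the first point, here is a small sketch contrasting a per-type check with a single interface check. It does not reproduce the real PushFiltersToSource logic; `PushdownCheck`, `canPushByType`, `canPushToSource` and `ComputeOnly` are hypothetical names, and the other types are simplified stand-ins.

```java
// Hypothetical stand-ins; the real rule and expression types differ.
interface Expression {}

interface TranslationAware {
    String asQuery();
}

record Match(String field, String text) implements Expression, TranslationAware {
    @Override
    public String asQuery() {
        return "match(" + field + ", " + text + ")";
    }
}

record Term(String field, String value) implements Expression, TranslationAware {
    @Override
    public String asQuery() {
        return "term(" + field + ", " + value + ")";
    }
}

/** An expression with no Lucene translation in this sketch. */
record ComputeOnly(String name) implements Expression {}

final class PushdownCheck {
    /** Per-type check: every new pushable function must be added to this list. */
    static boolean canPushByType(Expression exp) {
        return exp instanceof Match || exp instanceof Term;
    }

    /** Interface check: any expression that implements TranslationAware qualifies. */
    static boolean canPushToSource(Expression exp) {
        return exp instanceof TranslationAware;
    }

    public static void main(String[] args) {
        System.out.println(canPushToSource(new Match("title", "fox")));
        System.out.println(canPushToSource(new Term("status", "200")));
        System.out.println(canPushToSource(new ComputeOnly("length")));
    }
}
```

With the interface check, a new pushable function only has to implement `TranslationAware`; the rule itself does not need to change.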

Contributor Author

@ioanatia ioanatia Dec 12, 2024

Thank you for the review!

For the first point, we need to look into PushFiltersToSource, but I think we don't need the condition for Term - we removed a similar one for Match - see https://github.com/elastic/elasticsearch/pull/117555/files#r1858385620

For the second point, agreed we can do it in a follow up - I think it would be a big enough change that we would want to review separately.

Member

@fang-xing-esql fang-xing-esql left a comment

Thank you @ioanatia. It is a good step forward to add TranslationAware. I added a few things that I can think of so far.

return Objects.equals(queryBuilder, ((FullTextFunction) obj).queryBuilder);
}

@SuppressWarnings("rawtypes")
Member

Add an @Override annotation here.

return new TranslationAwareExpressionQuery(source(), queryBuilder);
}

ExpressionTranslator translator = translator();
Member

If ExpressionTranslator<? extends FullTextFunction> is used here, the @SuppressWarnings("rawtypes") can go away. The same applies to the translator() of the five FullTextFunctions.
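
A sketch of why the wildcard type can replace the raw type. It assumes, as a simplification, that the translator base class exposes a `translate` method accepting any `Expression` and checks the type itself; the class-token constructor and all signatures below are hypothetical stand-ins, not the real `ExpressionTranslator` API.

```java
// Hypothetical stand-ins; the real ExpressionTranslator has a different API.
interface Expression {}

abstract class ExpressionTranslator<E extends Expression> {
    private final Class<E> type;

    protected ExpressionTranslator(Class<E> type) {
        this.type = type;
    }

    /** Accepts any expression, so it is callable through a wildcard-typed reference. */
    final String translate(Expression exp) {
        return type.isInstance(exp) ? asQuery(type.cast(exp)) : null;
    }

    protected abstract String asQuery(E expression);
}

abstract class FullTextFunction implements Expression {
    // No raw type, so no @SuppressWarnings("rawtypes") is needed.
    protected abstract ExpressionTranslator<? extends FullTextFunction> translator();

    String asQuery() {
        ExpressionTranslator<? extends FullTextFunction> translator = translator();
        return translator.translate(this);
    }
}

final class Kql extends FullTextFunction {
    final String query;

    Kql(String query) {
        this.query = query;
    }

    @Override
    protected ExpressionTranslator<? extends FullTextFunction> translator() {
        return new KqlTranslator();
    }
}

final class KqlTranslator extends ExpressionTranslator<Kql> {
    KqlTranslator() {
        super(Kql.class);
    }

    @Override
    protected String asQuery(Kql kql) {
        return "kql:" + kql.query;
    }
}

final class WildcardDemo {
    public static void main(String[] args) {
        System.out.println(new Kql("status:200").asQuery());
    }
}
```

The wildcard works here because `translate` takes `Expression` rather than `E`; a method whose parameter is `E` could not be called through a `? extends` reference.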

- public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(Expression.class, "Kql", Kql::new);
+ public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(Expression.class, "Kql", Kql::readFrom);

public Kql(Source source, Expression queryString, QueryBuilder queryBuilder) {
Member

Can this be private? And it looks better (to me) if this comes after the constructor with FunctionInfo. The same applies to the other three - Match, QueryString and Term.

Contributor Author

I made them private, and then EsqlNodeSubclassTests started failing because, AFAICT, they expect the number of arguments of the longest public constructor to match the number of args from NodeInfo.
This seems to me to be a quirk of EsqlNodeSubclassTests rather than the test catching a real bug in the FullTextFunctions.
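
A sketch of the reflection behavior that could explain this, without claiming to reproduce what EsqlNodeSubclassTests actually does. `Class.getConstructors()` returns only public constructors, so making the longest constructor private changes the arity a reflection-based check would see. The class names and the `longestPublicCtorArity` helper are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Two hypothetical node classes that differ only in constructor visibility.
final class PublicCtorNode {
    public PublicCtorNode(String query) {}

    public PublicCtorNode(String query, Object queryBuilder) {}
}

final class PrivateCtorNode {
    public PrivateCtorNode(String query) {}

    private PrivateCtorNode(String query, Object queryBuilder) {}
}

final class ConstructorArity {
    /** Parameter count of the longest public constructor, or 0 if there is none. */
    static int longestPublicCtorArity(Class<?> type) {
        return Arrays.stream(type.getConstructors()) // public constructors only
            .mapToInt(Constructor::getParameterCount)
            .max()
            .orElse(0);
    }

    public static void main(String[] args) {
        int nodeInfoArgs = 2; // assumed number of properties reported by the node
        System.out.println(longestPublicCtorArity(PublicCtorNode.class) == nodeInfoArgs);
        System.out.println(longestPublicCtorArity(PrivateCtorNode.class) == nodeInfoArgs);
    }
}
```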

@ioanatia ioanatia merged commit a765f89 into elastic:main Dec 13, 2024
16 checks passed
ioanatia added a commit to ioanatia/elasticsearch that referenced this pull request Dec 13, 2024
* Introduce TranslationAware interface

* Serialize query builder

* Fix EsqlNodeSubclassTests

* Add javadoc

* Address review comments

* Revert changes on making constructors private
ioanatia added a commit that referenced this pull request Dec 16, 2024
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.0.0

5 participants