Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: push down LIMIT past LOOKUP JOIN #118495

Merged
merged 11 commits into from
Dec 13, 2024

Conversation

alex-spies
Copy link
Contributor

Fix #117698 by enabling push down of LIMIT past LEFT JOINs.

There is a subtle point here: our LOOKUP JOIN currently exactly preserves the number of rows from the left hand side. This is different from SQL, where LEFT JOIN will return at least one row for each row from the left, but may return multiple rows in case of multiple matches. We, instead, throw multiple matches into multi-values, instead. (C.f. tests that I'm about to add that demonstrate this.)

If we were to change our semantics to match SQL's, we'd have to adjust the pushdown, too.

@alex-spies alex-spies added >non-issue auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Dec 11, 2024
@alex-spies alex-spies requested review from costin and bpintea December 11, 2024 19:10
Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

public void testPushdownLimitsPastLeftJoin() {
var leftChild = emptySource();
var rightChild = new LocalRelation(Source.EMPTY, List.of(fieldAttribute()), LocalSupplier.EMPTY);
assertNotEquals(leftChild, rightChild);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta be very careful here, you never know.

@@ -63,8 +62,10 @@ public LogicalPlan rule(Limit limit) {
}
}
} else if (limit.child() instanceof Join join) {
if (join.config().type() == JoinTypes.LEFT && join.right() instanceof LocalRelation) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why the previous restriction to LocalRelation (even if that's what was expected before). Anyways, should be good this way now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That came from when we only had LOOKUP (aka 🐔 ) and INLINESTATS to consider. Both of these ended up with a LocalRelation on the right, and we didn't want to implement this in all generality because the semantics were not 100% clear.

@alex-spies alex-spies added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 12, 2024
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elasticsearchmachine elasticsearchmachine merged commit 4ff5acc into elastic:main Dec 13, 2024
16 checks passed
@alex-spies alex-spies deleted the join-limit-pushdown branch December 13, 2024 09:52
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Dec 13, 2024
Fix elastic#117698 by enabling
push down of `LIMIT` past `LEFT JOIN`s.

There is a subtle point here: our `LOOKUP JOIN` currently _exactly
preserves the number of rows from the left hand side_. This is different
from SQL, where `LEFT JOIN` will return _at least one row for each row
from the left_, but may return multiple rows in case of multiple
matches. We, instead, throw multiple matches into multi-values, instead.
(C.f. [tests that I'm about to
add](https://github.com/elastic/elasticsearch/pull/118471/files#diff-334f3328c5f066a093ed8a5ea4a62cd6bcdb304b660b15763bb4f64d0e87ed7cR365-R369)
that demonstrate this.)

If we were to change our semantics to match SQL's, we'd have to adjust
the pushdown, too.
elasticsearchmachine pushed a commit that referenced this pull request Dec 13, 2024
Fix #117698 by enabling
push down of `LIMIT` past `LEFT JOIN`s.

There is a subtle point here: our `LOOKUP JOIN` currently _exactly
preserves the number of rows from the left hand side_. This is different
from SQL, where `LEFT JOIN` will return _at least one row for each row
from the left_, but may return multiple rows in case of multiple
matches. We, instead, throw multiple matches into multi-values, instead.
(C.f. [tests that I'm about to
add](https://github.com/elastic/elasticsearch/pull/118471/files#diff-334f3328c5f066a093ed8a5ea4a62cd6bcdb304b660b15763bb4f64d0e87ed7cR365-R369)
that demonstrate this.)

If we were to change our semantics to match SQL's, we'd have to adjust
the pushdown, too.
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
Fix elastic#117698 by enabling
push down of `LIMIT` past `LEFT JOIN`s.

There is a subtle point here: our `LOOKUP JOIN` currently _exactly
preserves the number of rows from the left hand side_. This is different
from SQL, where `LEFT JOIN` will return _at least one row for each row
from the left_, but may return multiple rows in case of multiple
matches. We, instead, throw multiple matches into multi-values, instead.
(C.f. [tests that I'm about to
add](https://github.com/elastic/elasticsearch/pull/118471/files#diff-334f3328c5f066a093ed8a5ea4a62cd6bcdb304b660b15763bb4f64d0e87ed7cR365-R369)
that demonstrate this.)

If we were to change our semantics to match SQL's, we'd have to adjust
the pushdown, too.
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
Fix elastic#117698 by enabling
push down of `LIMIT` past `LEFT JOIN`s.

There is a subtle point here: our `LOOKUP JOIN` currently _exactly
preserves the number of rows from the left hand side_. This is different
from SQL, where `LEFT JOIN` will return _at least one row for each row
from the left_, but may return multiple rows in case of multiple
matches. We, instead, throw multiple matches into multi-values, instead.
(C.f. [tests that I'm about to
add](https://github.com/elastic/elasticsearch/pull/118471/files#diff-334f3328c5f066a093ed8a5ea4a62cd6bcdb304b660b15763bb4f64d0e87ed7cR365-R369)
that demonstrate this.)

If we were to change our semantics to match SQL's, we'd have to adjust
the pushdown, too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: LookupJoin fails to push down limits
6 participants