Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some In/Exists Subqueries will generate wrong PhysicalPlan #5265

Closed
ygf11 opened this issue Feb 13, 2023 · 3 comments · Fixed by #13210
Closed

Some In/Exists Subqueries will generate wrong PhysicalPlan #5265

ygf11 opened this issue Feb 13, 2023 · 3 comments · Fixed by #13210
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@ygf11
Copy link
Contributor

ygf11 commented Feb 13, 2023

Describe the bug

Datafusion can't execute non-correlated subquery now, like in/exists.
So these queries should return NotImplemented("Physical plan does not support logical expression In/Exists error. But currently the filter will be pushdown to the TableScan.

> explain select * from t1 where exists(select 1 from t2 where t2.t2_id > 0);
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                        |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: t1.t1_id, t1.t1_name, t1.t1_int                                                                                                                 |
|               |   TableScan: t1 projection=[t1_id, t1_name, t1_int], full_filters=[EXISTS (<subquery>)]                                                                     |
|               |     Subquery:                                                                                                                                               |
|               |       Projection: Int64(1)                                                                                                                                  |
|               |         Filter: CAST(t2.t2_id AS Int64) > Int64(0)                                                                                                          |
|               |           TableScan: t2                                                                                                                                     |
| physical_plan | ProjectionExec: expr=[t1_id@0 as t1_id, t1_name@1 as t1_name, t1_int@2 as t1_int]                                                                           |
|               |   CsvExec: files={1 group: [[home/work/tools/datafusion-test-data/join-context/t1.csv]]}, has_header=false, limit=None, projection=[t1_id, t1_name, t1_int] |
|               |                                                                                                                                                             |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

For the above query, the subquery filter will be lost in the final physical_plan.

To Reproduce
As above.

Expected behavior
The query should return NotImplemented("Physical plan does not support logical expression In/Exists error.

Additional context
Add any other context about the problem here.

@ygf11 ygf11 added the bug Something isn't working label Feb 13, 2023
@mingmwang
Copy link
Contributor

I remember the filters will be added to wrong side. I can work on this and provide a fix.

@alamb alamb added the help wanted Extra attention is needed label Oct 18, 2024
@alamb
Copy link
Contributor

alamb commented Oct 18, 2024

Here is a full reproducer:

echo "1,2" > data.csv

The run sql

> create external table t1(t1_id int, t2_name int) stored as csv location 'data.csv';
0 row(s) fetched.
Elapsed 0.019 seconds.

> create external table t2(t2_id int, t2_name int) stored as csv location 'data.csv';
0 row(s) fetched.
Elapsed 0.007 seconds.

> explain select * from t1 where exists(select 1 from t2 where t2.t2_id > 0);
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                            |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | TableScan: t1 projection=[t1_id, t2_name], full_filters=[EXISTS (<subquery>)]                                                                   |
|               |   Subquery:                                                                                                                                     |
|               |     Projection: Int64(1)                                                                                                                        |
|               |       Filter: CAST(t2.t2_id AS Int64) > Int64(0)                                                                                                |
|               |         TableScan: t2                                                                                                                           |
| physical_plan | CsvExec: file_groups={1 group: [[Users/andrewlamb/Software/datafusion2/datafusion-cli/data.csv]]}, projection=[t1_id, t2_name], has_header=true |
|               |                                                                                                                                                 |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.019 seconds.

@Lordworms
Copy link
Contributor

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
4 participants