
Invalid aggregate SQL query with HAVING can be executed without error (SQLancer-TLP) #12013

Closed
2010YOUY01 opened this issue Aug 15, 2024 · 3 comments · Fixed by #12046
Labels: bug (Something isn't working)

@2010YOUY01 (Contributor)

Describe the bug

The following two queries cannot be executed in DuckDB/Postgres, but are executed without error in DataFusion.

See the reproducer in datafusion-cli:

DataFusion CLI v41.0.0
> create table t1(v1 int);
0 row(s) fetched.
Elapsed 0.069 seconds.

> insert into t1 values (1),(2),(3);
+-------+
| count |
+-------+
| 3     |
+-------+
1 row(s) fetched.
Elapsed 0.049 seconds.

> select * from t1 having max(v1)=3;
+------------+
| max(t1.v1) |
+------------+
| 3          |
+------------+
1 row(s) fetched.
Elapsed 0.037 seconds.

> select * from t1 having max(v1)=1;
+------------+
| max(t1.v1) |
+------------+
+------------+
0 row(s) fetched.
Elapsed 0.006 seconds.

It looks like * is expanded to the aggregate function expression, i.e. max(v1), inside the HAVING clause.
I don't think such semantics are meaningful, so I'm reporting it as a potential bug.
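
For reference, Postgres rejects the same query at planning time, since a plain column cannot appear in the select list of an aggregate query without being grouped. The exact wording varies by version, but it is roughly:

postgres=# select * from t1 having max(v1)=3;
ERROR:  column "t1.v1" must appear in the GROUP BY clause or be used in an aggregate function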

To Reproduce

No response

Expected behavior

No response

Additional context

Found by SQLancer #11030

@2010YOUY01 added the bug label on Aug 15, 2024
@goldmedal (Contributor)

take

@jayzhan211 (Contributor)

I guess this issue is similar to #11897 #11748

@goldmedal (Contributor)

> I guess this issue is similar to #11897 #11748

I didn't dig into them, but they may be related 🤔

After some research, I believe this is a bug introduced by #11681. When planning an aggregation, we perform some validations at:

// finally, we have some validation that the re-written projection can be resolved
// from the aggregate output columns
check_columns_satisfy_exprs(
    &column_exprs_post_aggr,
    &select_exprs_post_aggr,
    "Projection references non-aggregate values",
)?;

// Rewrite the HAVING expression to use the columns produced by the
// aggregation.
let having_expr_post_aggr = if let Some(having_expr) = having_expr_opt {
    let having_expr_post_aggr =
        rebase_expr(having_expr, &aggr_projection_exprs, input)?;

    check_columns_satisfy_exprs(
        &column_exprs_post_aggr,
        &[having_expr_post_aggr.clone()],
        "HAVING clause references non-aggregate values",
    )?;

However, after moving wildcard expansion to the analyzer, we can no longer determine the projection expressions at this point, so these validations cannot catch the invalid query and an invalid plan is produced. Essentially, it causes another bug.

Consider the following case:

create table t1(v1 int, v2 int);
insert into t1 values (1,2),(2,4),(3,6);
select * from t1 group by v1, v2 having max(v1)=3;
----
+----+----+------------+
| v1 | v2 | max(t1.v1) |
+----+----+------------+
| 3  | 6  | 3          |
+----+----+------------+

The aggregate column used by the HAVING clause shouldn't be included in the output. This issue relates to how wildcards are expanded based on a filter plan.
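
For comparison, the expected output for this query, assuming * expands only to the table's own columns (which is what DuckDB and Postgres return), would be:

+----+----+
| v1 | v2 |
+----+----+
| 3  | 6  |
+----+----+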

I attempted a rough fix for this issue but encountered another problem with error messages. Consider the SQL provided by @2010YOUY01:

select * from t1 having max(v1)=3;
---
Error: expand_wildcard_rule
caused by
Schema error: No field named t1.v1. Valid fields are "max(t1.v1)".

In the previous version (41.0.0), the message would have been:

> select * from t1 having max(v1)=3;
Error during planning: Projection references non-aggregate values: Expression t1.v1 could not be resolved from available columns: max(t1.v1)

I think we can expand the wildcard when planning the aggregation, for validation purposes, if the group-by keys are empty. When the group-by keys are empty, a wildcard projection can never be valid together with a HAVING clause, so the check can report a better error message.

However, if the group-by keys aren't empty and we don't expand the wildcard, it becomes difficult to provide a correct error message: the SQL will instead fail when the ExpandWildcardRule analyzer rule runs. For example:

create table t1(v1 int, v2 int);
insert into t1 values (1, 2),(2,3),(3,4);
select * from t1 group by v1 having max(v1)=3
----
Error: expand_wildcard_rule
caused by
Schema error: No field named t1.v2. Valid fields are t1.v1, "max(t1.v1)".

In the previous version (41.0.0), the error message would have been:

> select * from t2 group by v1 having max(v2) = 2;
Error during planning: Projection references non-aggregate values: Expression t2.v2 could not be resolved from available columns: t2.v1, max(t2.v2)

To avoid duplicate expansion, I'd prefer not to expand the wildcard when the group-by keys aren't empty. However, this results in an unclear error message for the user 🤔.

I'll draft a PR to explain this more clearly.

cc @jayzhan211
