
Scoping of lambda functions in list_apply and list_filter seems artificially restrictive #9981

Closed

cpcloud opened this issue Dec 13, 2023 · 6 comments
@cpcloud
Contributor

cpcloud commented Dec 13, 2023

What happens?

Using a parameter name in a lambda function that matches the name of the input column results in a strange error message about needing an unqualified parameter name.

To Reproduce

D create or replace table t as select [1, 2, 3] as x;
D select list_apply(t.x, x -> x + 1) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified

Since the xs are unrelated to each other here, I would expect this query to execute without issue.

OS:

Linux (NixOS)

DuckDB Version:

v0.9.2

DuckDB Client:

CLI

Full Name:

Phillip Cloud

Affiliation:

Voltron Data

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@eaubin

eaubin commented Dec 14, 2023

I noticed the same problem in the now-stale, closed issue #5739.

@taniabogatsch
Contributor

Hi @cpcloud. I am unsure about this. We throw this error because, when binding the RHS of the lambda function (x + 1 in your case), it is possible to use other columns of the table, which may have the same name as the lambda parameter. The binder error is unrelated to the input list. For example, the following query also triggers the binder error.

create or replace table t as select [1, 2, 3] as x;
select list_apply(['hello'], x -> x) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified

If we allow column names as lambda parameters, the following query would be ambiguous.

create or replace table t as select [[1], [2], [3]] as x;
select list_transform([[1], [2], [3]], x -> x[1]) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified
# possible result: x is a lambda parameter
[1, 2, 3]
# possible result: x is a column name in t
[[1], [1], [1]]

Using the column as the input list is possible, i.e., the following query works.

D create or replace table t as select [1, 2, 3] as x;
D select list_apply(t.x, y -> y + 1) from t;
┌─────────────────────────────────┐
│ list_apply(t.x, (y -> (y + 1))) │
│             int32[]             │
├─────────────────────────────────┤
│ [2, 3, 4]                       │
└─────────────────────────────────┘

I can look into making the error message clearer in this scenario, but I don't see a way to resolve the possible ambiguity between lambda parameter names and column names in the RHS of the lambda expression.

@cpcloud
Contributor Author

cpcloud commented Dec 18, 2023

@taniabogatsch Thanks for looking into it.

I'm not sure I understand the ambiguity here, as long as there are well-defined scoping rules for lambdas.

create or replace table t as select [[1], [2], [3]] as x;
select list_transform([[1], [2], [3]], x -> x[1]) from t;
Error: Binder Error: Invalid parameter name 't.x': must be unqualified

Why would x be ambiguous here? It seems to me that it unambiguously refers to the x argument of the lambda function and could never be t.x without the table identifier qualifying it. I would expect lexical scoping here (a sketch follows the list):

  1. x is first looked up as a local variable
  2. failing 1, x is looked up as a column
  3. failing 2, an error message is raised about not being able to resolve x
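
A minimal sketch of what those three rules would mean in practice (the extra column y, the name z, and the expected results are illustrative assumptions, not DuckDB output):

-- illustrative table: x as in the original report, plus a scalar column y
create or replace table t as select [1, 2, 3] as x, 10 as y;

-- 1. x is a lambda parameter, so it shadows the column t.x
select list_apply(t.x, x -> x + 1) from t;   -- expected: [2, 3, 4]

-- 2. y is not a lambda parameter, so it falls back to the column t.y
select list_apply(t.x, x -> x + y) from t;   -- expected: [11, 12, 13]

-- 3. z is neither a lambda parameter nor a column, so binding fails
select list_apply(t.x, x -> x + z) from t;   -- expected: binder error on z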

Other systems that implement equivalent functionality, such as ClickHouse and Trino, do not have this problem.

ClickHouse:

❯ clickhouse local
ClickHouse local version 23.11.1.1.

localhost :) create view t as select [1, 2, 3] x;

CREATE VIEW t AS
SELECT [1, 2, 3] AS x

Query id: 8c03f649-7cc9-4619-b141-765b9ab17ef7

Ok.

0 rows in set. Elapsed: 0.000 sec.

localhost :) select arrayMap(x -> x + 1, x) from t

SELECT arrayMap(x -> (x + 1), x)
FROM t

Query id: ca1b3c99-f996-480b-b445-98777383a5ef

┌─arrayMap(lambda(tuple(x), plus(x, 1)), x)─┐
│ [2,3,4]                                   │
└───────────────────────────────────────────┘

1 row in set. Elapsed: 0.001 sec.

Trino:

❯ trino --catalog memory --schema default
trino:default> create view t as select array[1, 2, 3] as x;
CREATE VIEW
trino:default> select transform(x, x -> x + 1) from t;
   _col0
-----------
 [2, 3, 4]
(1 row)

Query 20231218_115821_00236_x3bsy, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.10 [0 rows, 0B] [0 rows/s, 0B/s]

@taniabogatsch
Contributor

We bind lambda functions with our SQL scalar function binder, which has no concept of local variables that are not columns. That is, the binder treats lambda parameters the same way it treats columns. That's also why we recently pushed this PR...

Regarding "and couldn't possibly ever be t.x without the table identifier qualifying it": SQL scalar functions can reference column names without table identifiers, and that should not change for lambda functions.
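
As a small illustration of that point (the extra column y and the shown result are my reading of the behavior described, not output from the thread), an unqualified column can appear inside a lambda body, which is exactly why the binder has to disambiguate parameter names from column names:

create or replace table t as select [1, 2, 3] as x, 10 as y;
-- y is not a lambda parameter here, so it binds to the column t.y of the current row
select list_transform(x, e -> e + y) from t;   -- [11, 12, 13]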

But after looking into this more, I agree that adding scoping for lambda functions, which are already a special case, makes sense and would make them more intuitive to use. I should be able to change the lambda binding to prioritize lambda parameter names over column names, and inner lambda parameter names over outer lambda parameter names, i.e., inner lambda > outer lambda > column.
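
A sketch of how that priority could play out, assuming the rule inner lambda > outer lambda > column (the nested example and the expected result are illustrative, not output from a released build):

create or replace table t as select [[1, 2], [3]] as x;

-- the first argument x sits outside any lambda, so it is the column t.x;
-- the outer parameter x then shadows the column, and the inner parameter x
-- shadows the outer one inside the inner body
select list_transform(x, x -> list_transform(x, x -> x * 10)) from t;
-- expected: [[10, 20], [30]]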

@taniabogatsch
Contributor

PR got merged.
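
For anyone landing here later, a quick check one would expect to pass on a build containing the merged change (assuming the shadowing rule described above is what shipped):

create or replace table t as select [1, 2, 3] as x;
select list_apply(t.x, x -> x + 1) from t;   -- [2, 3, 4] instead of the binder error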

@cpcloud
Contributor Author

cpcloud commented Feb 8, 2024

Much appreciated!
