fix(flink): fix compilation of memtable with nested data (#8751)
## Description of changes

This PR fixes the compilation of memtables with nested data.

### What was broken

[Flink does not support the ``STRUCT(1 AS `a`)`` aliasing syntax to define named STRUCTs](https://issues.apache.org/jira/browse/FLINK-9161). To define a named STRUCT, we must use a workaround based on `CAST`, e.g.,

```sql
SELECT CAST(('a', 1) as ROW<a STRING, b INT>);
```

However, Flink also does not allow you to construct ARRAYs of named STRUCTs directly with the `ARRAY[]` constructor. This is a bug that I identified and filed with the Flink community (JIRA ticket ref: https://issues.apache.org/jira/browse/FLINK-34898). For the time being, we need another `CAST` workaround that casts the entire nested array, e.g.,

```sql
SELECT cast(ARRAY[ROW(1)] as ARRAY<ROW<a INT>>);
-- instead of ARRAY[CAST(ROW(1) AS ROW<a INT>)]
```

### How to fix

To summarize,

- if it’s an array of named structs: `CAST(ARRAY[] AS ARRAY<ROW<>, ROW<>>)`
- if it’s named structs: `CAST(ROW() AS ROW<datatype of each field>)`
- if it’s unnamed structs (but I'm not sure how to write this in Ibis): `ROW()`

I thought of two approaches to this:

1. Rewrite the operator mapping in the Flink backend (i.e., change the `visit_NonNullLiteral()` method)
2. Rewrite the translation rule in Flink's `Generator`

I found both approaches implemented in different scenarios and decided to go with option (2).

## Issues closed

#8516
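The three casting rules in the "How to fix" summary can be illustrated with a small standalone sketch. This is not the code from this commit (which changes the translation rule in Flink's `Generator`) and it does not use Ibis; the function names `flink_type`, `bare_literal`, and `flink_literal`, as well as the tuple-based type description, are hypothetical and exist only for this example. It assumes that a single outer `CAST` over the bare `ROW(...)`/`ARRAY[...]` expression is enough to attach the field names, as in the SQL examples above.

```python
from __future__ import annotations


def flink_type(dtype: tuple) -> str:
    """Render a (hypothetical) nested type description as a Flink SQL type.

    Scalars are 1-tuples such as ("int",); structs are ("struct", {name: dtype});
    arrays are ("array", element_dtype).
    """
    kind = dtype[0]
    if kind == "struct":
        fields = ", ".join(f"{name} {flink_type(t)}" for name, t in dtype[1].items())
        return f"ROW<{fields}>"
    if kind == "array":
        return f"ARRAY<{flink_type(dtype[1])}>"
    return kind.upper()


def bare_literal(value) -> str:
    """Render a Python value with bare ROW(...) / ARRAY[...] constructors, no names."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before falling through to str(int)
        return "TRUE" if value else "FALSE"
    if isinstance(value, dict):
        return "ROW(" + ", ".join(bare_literal(v) for v in value.values()) + ")"
    if isinstance(value, tuple):
        return "ROW(" + ", ".join(bare_literal(v) for v in value) + ")"
    if isinstance(value, list):
        return "ARRAY[" + ", ".join(bare_literal(v) for v in value) + "]"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def flink_literal(value, dtype: tuple | None = None) -> str:
    """Apply the summary rules: cast named structs and arrays as a whole."""
    bare = bare_literal(value)
    if dtype is not None and dtype[0] in ("struct", "array"):
        # One CAST around the entire expression, never around individual
        # array elements (the per-element form is what FLINK-34898 is about).
        return f"CAST({bare} AS {flink_type(dtype)})"
    return bare  # unnamed struct or scalar: no CAST needed


named_struct = ("struct", {"a": ("string",), "b": ("int",)})
struct_array = ("array", ("struct", {"a": ("int",)}))

print(flink_literal({"a": "a", "b": 1}, named_struct))
print(flink_literal([{"a": 1}], struct_array))
print(flink_literal((1, 2)))
```

The array case wraps the whole `ARRAY[...]` expression in one `CAST` and leaves the inner `ROW(...)` calls bare, mirroring the workaround described above.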
Showing 6 changed files with 112 additions and 16 deletions.
@@ -0,0 +1,48 @@

```python
from __future__ import annotations

import pytest
from pyflink.common.types import Row

import ibis
from ibis.backends.tests.errors import Py4JJavaError


@pytest.mark.parametrize(
    "data,schema,expected",
    [
        pytest.param(
            {"value": [{"a": 1}, {"a": 2}]},
            {"value": "!struct<a: !int>"},
            [Row(Row([1])), Row(Row([2]))],
            id="simple_named_struct",
        ),
        pytest.param(
            {"value": [[{"a": 1}, {"a": 2}], [{"a": 3}, {"a": 4}]]},
            {"value": "!array<!struct<a: !int>>"},
            [Row([Row([1]), Row([2])]), Row([Row([3]), Row([4])])],
            id="single_field_named_struct_array",
        ),
        pytest.param(
            {"value": [[{"a": 1, "b": 2}, {"a": 2, "b": 2}]]},
            {"value": "!array<!struct<a: !int, b: !int>>"},
            [Row([Row([1, 2]), Row([2, 2])])],
            id="named_struct_array",
        ),
    ],
)
def test_create_memtable(con, data, schema, expected):
    t = ibis.memtable(data, schema=ibis.schema(schema))
    # cannot use con.execute(t) directly because of some behavioral discrepancy between
    # `TableEnvironment.execute_sql()` and `TableEnvironment.sql_query()`
    result = con.raw_sql(con.compile(t))
    # raw_sql() returns a `TableResult` object and doesn't natively convert to pandas
    assert list(result.collect()) == expected


@pytest.mark.notyet(
    ["flink"],
    raises=Py4JJavaError,
    reason="cannot create an ARRAY of named STRUCTs directly from the ARRAY[] constructor; https://issues.apache.org/jira/browse/FLINK-34898",
)
def test_create_named_struct_array_with_array_constructor(con):
    con.raw_sql("SELECT ARRAY[cast(ROW(1) as ROW<a INT>)];")
```