fix(presto): Handle ROW data stored as string #10456
@@ -653,12 +656,15 @@ def expand_data(  # pylint: disable=too-many-locals
                 # expand columns; we append them to the left so they are added
                 # immediately after the parent
                 expanded = get_children(column)
-                to_process.extendleft((column, level) for column in expanded)
+                to_process.extendleft((column, level) for column in expanded[::-1])
I reversed `expanded` so that sub-columns are added in the right order; otherwise we get:
a ROW(b, c) => a, a.c, a.b
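The ordering issue can be reproduced in isolation. The snippet below is a minimal sketch, not the Superset code: `expansion_order` and the column names are hypothetical, and only the `deque.extendleft` behavior is taken from the diff above. `extendleft` pushes items one at a time onto the left end, so the iterable ends up reversed unless it is reversed first.

```python
from collections import deque


def expansion_order(children, reverse):
    """Return the order in which columns are visited (hypothetical helper).

    Column "a" stands in for a ROW column whose sub-columns are `children`;
    column "z" is an unrelated sibling that must stay after them.
    """
    to_process = deque([("a", 0), ("z", 0)])
    order = []
    while to_process:
        column, level = to_process.popleft()
        order.append(column)
        if column == "a":
            # extendleft inserts items one by one at the left, which reverses
            # them; reversing the input first restores the declared order.
            expanded = children[::-1] if reverse else children
            to_process.extendleft((child, level) for child in expanded)
    return order


before_fix = expansion_order(["a.b", "a.c"], reverse=False)
after_fix = expansion_order(["a.b", "a.c"], reverse=True)
print(before_fix)
print(after_fix)
```

Without the reversal the sub-columns come out as `a, a.c, a.b`, matching the problem described in the comment; with it they come out as `a, a.b, a.c`.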
Codecov Report
@@ Coverage Diff @@
## master #10456 +/- ##
==========================================
+ Coverage 70.60% 72.04% +1.44%
==========================================
Files 601 620 +19
Lines 32329 36465 +4136
Branches 3275 3695 +420
==========================================
+ Hits 22826 26272 +3446
- Misses 9397 10075 +678
- Partials 106 118 +12
* Handle ROW data stored as string
* Use destringify
* Fix mypy
* Fix mypy with cast
* Bypass pylint
fix(presto): Handle ROW data stored as string (apache#10456)
[VIZ-1979] [Backporting] fix(presto): Handle ROW data stored as string (apache#10456)
SUMMARY
When serializing nested types we convert them to strings, which breaks `expand_data` in Presto.
I added a check in `expand_data` so that when the data associated with a nested column (types `ARRAY` and `ROW`) is a string, it gets deserialized back into the original data.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
A Presto query selecting an `ARRAY` and a `ROW` without the `PRESTO_EXPAND_DATA` feature flag:
With the feature enabled, data should be displayed similarly to how BigQuery displays it, with arrays expanded into multiple lines and rows expanded into multiple columns. This is currently broken:
This PR fixes the problem, and the data gets expanded correctly:
TEST PLAN
Tested with a simple query (see above), and added a unit test covering the problem.
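The behavior such a test would exercise can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `destringify_nested`, the column dictionary layout, and the use of JSON as the string encoding are all hypothetical choices made for the example.

```python
import json
from typing import Any, Dict, List

# Assumption: nested Presto types are recognized by their type-name prefix.
NESTED_PREFIXES = ("ARRAY", "ROW")


def destringify_nested(
    columns: List[Dict[str, str]], data: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Deserialize string values found in nested (ARRAY/ROW) columns.

    Hypothetical helper: assumes the strings are JSON-encoded.
    """
    nested = [
        col["name"]
        for col in columns
        if col["type"].upper().startswith(NESTED_PREFIXES)
    ]
    result = []
    for row in data:
        new_row = dict(row)  # copy so the input rows are not mutated
        for name in nested:
            value = new_row.get(name)
            # Only strings need decoding; already-structured values pass through.
            if isinstance(value, str):
                new_row[name] = json.loads(value)
        result.append(new_row)
    return result


columns = [
    {"name": "id", "type": "BIGINT"},
    {"name": "tags", "type": "ARRAY(VARCHAR)"},
    {"name": "point", "type": "ROW(x BIGINT, y BIGINT)"},
]
data = [
    {"id": 1, "tags": '["a", "b"]', "point": "[1, 2]"},
    {"id": 2, "tags": ["c"], "point": [3, 4]},
]
restored = destringify_nested(columns, data)
print(restored)
```

Checking the value type before decoding keeps the step safe to run on data that was never stringified, so it can sit in front of the expansion logic unconditionally.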
ADDITIONAL INFORMATION