Managing Nested Fields #16959

findepi · 2023-04-11T12:01:40Z

Discussed in #16897

Originally posted by ebyhr April 6, 2023

Overview

We need a way to add, drop, rename and change the types of nested columns via SQL in Trino. Most connectors do not support nested data, but for connectors like Iceberg that do, the inability to modify nested data types makes the feature effectively unusable.

Proposed Changes

Grammar

In general, ALTER TABLE commands should use qualifiedName, a dotted path, to refer to a column instead of just an identifier.
We considered the following syntax options for renaming a field:

1. ALTER TABLE ... RENAME COLUMN a.b.c TO d
2. ALTER TABLE ... ALTER COLUMN a RENAME FIELD b.c TO b.d
3. ALTER TABLE ... RENAME COLUMN a.b.c TO a.b.d

Option 1 is a straightforward and natural extension of the existing syntax.
Option 2 lets us reuse code easily (especially around the *ColumnTask classes), but no such option (RENAME FIELD) exists in the SQL standard.
Option 3 allows moving a field to a different level, such as a.b.c → a.c, but we assume existing file formats don't support such a move.

In conclusion, we're going to adopt option 1. The other statements (ADD COLUMN, DROP COLUMN and SET DATA TYPE) will accept dotted paths in the same way.
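To illustrate what option 1 implies, here is a minimal Python sketch (not Trino code; `renamed_path` is a hypothetical helper). It assumes the new name is a single identifier, so a renamed field always stays under its original parent, which is why option 1 cannot express a move such as a.b.c → a.c.

```python
def renamed_path(field_path: str, new_name: str) -> str:
    """Path of a field after RENAME COLUMN <field_path> TO <new_name>.

    Assumption for this sketch: new_name is one identifier, never a path.
    """
    if "." in new_name:
        raise ValueError("new name must be a single identifier, not a path")
    # Everything before the last dot is the parent; it is left unchanged.
    parent, _, _ = field_path.rpartition(".")
    return f"{parent}.{new_name}" if parent else new_name


# RENAME COLUMN a.b.c TO d: the field keeps its parent a.b
print(renamed_path("a.b.c", "d"))  # a.b.d
```

The sketch splits on dots naively and ignores quoted identifiers that themselves contain dots.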

The ADD COLUMN command is a problem because it shares the columnDefinition grammar rule with CREATE TABLE, which does not need dotted paths. We could avoid this by separating the syntax into two alternatives in SqlBase.g4:

    | ALTER TABLE (IF EXISTS)? tableName=qualifiedName
        ADD COLUMN (IF NOT EXISTS)? column=columnDefinition            #addColumn
    | ALTER TABLE (IF EXISTS)? tableName=qualifiedName
        ADD COLUMN (IF NOT EXISTS)? columnName=qualifiedName type      #addField

Column Path

When altering columns, the nested target is selected using a sequence of identifiers separated by dots, according to the following rules:

ROW: has nested fields as normal
MAP: has synthetic key and value fields
ARRAY: has a synthetic element field

For example, if we have column my_col of type ARRAY(ROW(my_map MAP(VARCHAR, ROW(x BIGINT, y BIGINT)))), the nested y field can be targeted with my_col.element.my_map.value.y.
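The path rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the engine's implementation: the `Scalar`, `Row`, `Map` and `Array` classes and the `children` and `resolve_path` functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Scalar:
    name: str  # e.g. "BIGINT"


@dataclass
class Row:
    fields: dict  # field name -> nested type


@dataclass
class Map:
    key: object
    value: object


@dataclass
class Array:
    element: object


def children(data_type):
    """Nested fields addressable by one path segment."""
    if isinstance(data_type, Row):
        return data_type.fields  # ROW: its declared fields
    if isinstance(data_type, Map):
        return {"key": data_type.key, "value": data_type.value}  # synthetic
    if isinstance(data_type, Array):
        return {"element": data_type.element}  # synthetic
    return {}  # scalar types have nothing nested


def resolve_path(columns, dotted_path):
    """Walk a dotted path from a top-level column down to the target type."""
    head, *rest = dotted_path.split(".")
    if head not in columns:
        raise KeyError(f"no column named {head!r}")
    current, walked = columns[head], head
    for part in rest:
        nested = children(current)
        if part not in nested:
            raise KeyError(f"{walked} has no nested field {part!r}")
        current, walked = nested[part], f"{walked}.{part}"
    return current


# The my_col example from the text.
point = Row({"x": Scalar("BIGINT"), "y": Scalar("BIGINT")})
columns = {"my_col": Array(Row({"my_map": Map(Scalar("VARCHAR"), point)}))}

print(resolve_path(columns, "my_col.element.my_map.value.y"))  # Scalar(name='BIGINT')
```

Note that my_col.my_map is rejected in this sketch, because the array level has to be stepped through with the synthetic element segment first. Quoted identifiers and case-insensitive matching are left out for brevity.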

Security

The security checks for adding, dropping, renaming, and changing the type of a column will need to be updated to handle field paths.

Follow-up Work

Multi-part Alter Table
We should also consider adding multi-part ALTER TABLE commands, so that multiple columns can be added, removed, or renamed in one statement. This is important for complex atomic transformations of tables, and is supported by Iceberg.

The syntax has been discussed with @martint @electrum @dain @erichwang @findepi @kasiafi. Thanks for writing the base documentation.

ebyhr mentioned this issue Apr 14, 2023

Support adding a field with ADD COLUMN in Iceberg #16321

Merged

ebyhr mentioned this issue Jul 25, 2023

Support setting a field type with SET DATA TYPE statement in engine and Iceberg #18395

Merged

ebyhr closed this as completed in #18395 Aug 2, 2023

This was referenced Aug 3, 2023

Add Trino 423 release notes #18496

Merged

Add documentation for nested fields #18546

Open

martint added a commit to martint/trino that referenced this issue May 31, 2024

Revert "Allow changing field type in records wrapped in an array"

101a807

This change introduces behavior incompatible with trinodb#16959

martint added a commit to martint/trino that referenced this issue May 31, 2024

Revert "Allow adding and droping fields to records wrapped in array"

157b81c

This change introduces behavior incompatible with trinodb#16959

martint mentioned this issue May 31, 2024

Revert changes to allow updating fields within nested arrays #22219

Merged

martint added a commit that referenced this issue May 31, 2024

Revert "Allow changing field type in records wrapped in an array"

7239574

This change introduces behavior incompatible with #16959

martint added a commit that referenced this issue May 31, 2024

Revert "Allow adding and droping fields to records wrapped in array"

2ffc951

This change introduces behavior incompatible with #16959

losipiuk mentioned this issue Jun 2, 2024

Allow adding and dropping fields to records wrapped in array #22232

Merged

ebyhr mentioned this issue Nov 25, 2024

Allow changing field type in records wrapped in a map #24248

Merged
