-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Managing Nested Fields #16959
Comments
This was referenced Aug 3, 2023
martint
added a commit
to martint/trino
that referenced
this issue
May 31, 2024
This change introduces behavior incompatible with trinodb#16959
martint
added a commit
to martint/trino
that referenced
this issue
May 31, 2024
This change introduces behavior incompatible with trinodb#16959
martint
added a commit
that referenced
this issue
May 31, 2024
This change introduces behavior incompatible with #16959
martint
added a commit
that referenced
this issue
May 31, 2024
This change introduces behavior incompatible with #16959
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Discussed in #16897
Originally posted by ebyhr April 6, 2023
Overview
We need a way to add, drop, rename and change the types of nested columns via SQL in Trino. Most connectors do not support nested data, but for connectors like Iceberg that do, the inability to modify nested data types makes the feature effectively unusable.
Proposed Changes
Grammar
In general ALTER TABLE commands should use qualifiedName, a dotted path, to refer to a column instead of just an identifier.
We considered the following syntax for renaming a field.
1 is a straightforward and natural extension of the existing syntax.
2 makes us reuse code easily (especially around *ColumnTask class), but such option (RENAME FIELD) doesn't exist in SQL standard.
3 allows moving a field to a different layer like a.b.c → a.c, but we assume existing file formats don't support such movement.
In conclusion, we're going to adopt 1. Other syntax (ADD COLUMN, DROP COLUMN & SET DATA TYPE) will support this syntax in the same way.
The ADD COLUMN command will be a problem because it shares columnDefinition grammar with CREATE TABLE which does not need dotted paths. We could avoid this by separating syntax in
SqlBase.g4
Column Path
When altering columns the nested target will be selected using a sequence of identifiers separated by dots using the following rules:
key
andvalue
fieldselement
fieldFor example, if we have column my_col of type ARRAY(ROW(my_map MAP(VARCHAR, ROW(x BIGINT, y BIGINT)))), the nested y field can be target with my_col.element.my_map.value.y.
Security
The checks for column add, drop, rename, and change type will need to be updated for field path.
Follow-up Work
Multi-part Alter Table
We should also consider adding multi part alter table commands, so multiple columns can be added, removed, or renamed in one statement. This is important for complex atomic transformations of tables, and is supported by Iceberg.
The syntax has been discussed with @martint @electrum @dain @erichwang @findepi @kasiafi. Thanks for writing the base documentation.
The text was updated successfully, but these errors were encountered: