Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra_properties to Delta Lake table properties #17595

Closed

Conversation

jkylling
Copy link
Contributor

@jkylling jkylling commented May 22, 2023

Description

Similar to #17172 this adds support for the extra_properties table property to Delta tables, such that additional table properties can be set on Delta tables.

The additional properties will not be exposed in SHOW CREATE.

It's a bit unclear how we should handle table properties which can be set by other features, and through extra_properties. Currently we take the simplest approach and allow setting properties both through Trino specific table properties, and through extra_properties. For instance, it's possible to set delta.enableChangeDataFeed both implicitly by setting change_data_feed_enabled, and explicitly by setting extra_properties = MAP(ARRAY['delta.enableChangeDataFeed'], ARRAY['true']).

Additional context and related issues

Fixes #17428

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Introduce `extra_properties` for adding arbitrary properties to Delta tables. ({issue}`17428`)

@cla-bot cla-bot bot added the cla-signed label May 22, 2023
@github-actions github-actions bot added delta-lake Delta Lake connector docs tests:hive labels May 22, 2023
@jkylling jkylling force-pushed the delta-lake-extra-properties branch 2 times, most recently from 1b60439 to 2033a31 Compare May 22, 2023 19:51
@findinpath findinpath self-requested a review June 26, 2023 12:23
@jkylling jkylling force-pushed the delta-lake-extra-properties branch from 2033a31 to a47e9a8 Compare July 19, 2023 13:45
@jkylling jkylling requested a review from ebyhr July 19, 2023 14:05
@jkylling jkylling force-pushed the delta-lake-extra-properties branch from a47e9a8 to 29f371c Compare August 12, 2023 11:02
.stream()
.map(property -> property.getName())
.collect(Collectors.toSet()),
DeltaLakeTableProperties.PROPERTY_NAMES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'll add/remove eventually a new table property, this test will still be 🟢

What do you intend to test here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add a table property, but forget to update the DeltaLakeTableProperties.PROPERTY_NAMES set, then this test will fail.

It's because the list of table properties are created when DeltaLakeTableProperties is constructed, while validateExtraProperties only uses DeltaLakeTableProperties.PROPERTY_NAMES and can be static. We could inject DeltaLakeTableProperties to DeltaLakeMetadata instead and get rid of the PROPERTY_NAMES?

throw new TrinoException(INVALID_TABLE_PROPERTY, format("Extra table property key cannot be null '%s'", extraProperties));
}
Set<String> illegalExtraProperties = Sets.intersection(
DeltaLakeTableProperties.PROPERTY_NAMES,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure whether extra_properties is a Trino reserved property name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Hive connector we disallow using the key extra_properties in the extra_properties

Disallowing setting table properties through extra_properties and through ordinary properties reduces some possible user confusion, but also limits the use of extra_properties. For Hive this could make sense, as Trino table property names might overlap with Hive table property names with special meanings. For Delta it might not matter, as none of our current table properties have a special meaning as a Delta table property (special Delta table properties all seem to be prefixed with delta.. Perhaps we should get rid of this validation of illegal extra_property keys, and only check for the special delta. keys?

throw new TrinoException(INVALID_TABLE_PROPERTY, format("Extra table property key cannot be null '%s'", extraProperties));
}
Set<String> illegalExtraProperties = Sets.intersection(
DeltaLakeTableProperties.PROPERTY_NAMES,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe inline PROPERTY_NAMES map declaration (e.g. : reservedTrinoPropertyNames) here in case you won't use it anywhere else.

@@ -645,6 +645,10 @@ The following table properties are available for use:
* ``NONE``

Defaults to ``NONE``.
* - ``extra_properties``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the description of the ticket:

It's a bit unclear how we should handle table properties which can be set by other features, and through extra_properties.

To avoid such situations we should disallow table property names handled by Trino.

See

public static final String DELTA_CHECKPOINT_WRITE_STATS_AS_JSON_PROPERTY = "delta.checkpoint.writeStatsAsJson";
public static final String DELTA_CHECKPOINT_WRITE_STATS_AS_STRUCT_PROPERTY = "delta.checkpoint.writeStatsAsStruct";
public static final String DELTA_CHANGE_DATA_FEED_ENABLED_PROPERTY = "delta.enableChangeDataFeed";
private static final String DELTA_CHECKPOINT_INTERVAL_PROPERTY = "delta.checkpointInterval";
- maybe extract them so that we can ensure the validity of what is being done in DeltaLakeTableProperties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collected them in MetadataEntry.RESERVED_TRINO_PROPERTY_NAMES.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also a few keys at

public static final String APPEND_ONLY_CONFIGURATION_KEY = "delta.appendOnly";
public static final String COLUMN_MAPPING_MODE_CONFIGURATION_KEY = "delta.columnMapping.mode";
public static final String COLUMN_MAPPING_PHYSICAL_NAME_CONFIGURATION_KEY = "delta.columnMapping.physicalName";
public static final String MAX_COLUMN_ID_CONFIGURATION_KEY = "delta.columnMapping.maxColumnId";

but it does not look like we support modifying them through Trino table properties.

It seems a bit difficult to enforce by other means than convention that reserved table properties are to be defined in MetadataEntry and added to RESERVED_TRINO_PROPERTY_NAMES.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to decide how to handle:

  • Trino table properties of Delta tables
    public static final String LOCATION_PROPERTY = "location";
    public static final String PARTITIONED_BY_PROPERTY = "partitioned_by";
    public static final String CHECKPOINT_INTERVAL_PROPERTY = "checkpoint_interval";
    public static final String CHANGE_DATA_FEED_ENABLED_PROPERTY = "change_data_feed_enabled";
    public static final String COLUMN_MAPPING_MODE_PROPERTY = "column_mapping_mode";
    public static final String EXTRA_PROPERTIES = "extra_properties";
  • Special Delta table properties which are modified when setting Trino table properties.
  • Special Delta table properties which influences the way Trino reads or writes a table, but cannot be modified through Trino (e.g., delta.appendOnly).

@jkylling jkylling force-pushed the delta-lake-extra-properties branch from 29f371c to 7b2df4a Compare September 5, 2023 17:14
@jkylling jkylling requested a review from findinpath September 5, 2023 17:22
@ebyhr ebyhr removed their request for review November 27, 2023 23:50
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 15, 2024
@mosabua
Copy link
Member

mosabua commented Jan 15, 2024

@findinpath and @kasiafi could you work with @jkylling on next steps here .. assuming this is still in progress and a desired improvement.

@findinpath
Copy link
Contributor

@jkylling was saying in a Slack discussion that he will address the PR comments as soon as he finds free time.

Copy link

github-actions bot commented Feb 7, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 7, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Feb 29, 2024
@mosabua
Copy link
Member

mosabua commented Feb 29, 2024

I am reopening this since it is still in progress.

@mosabua mosabua reopened this Feb 29, 2024
@github-actions github-actions bot removed the stale label Mar 4, 2024
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Mar 26, 2024
@mosabua
Copy link
Member

mosabua commented Mar 27, 2024

Here is another one @jkylling that might have gone under your radar. Can you work with @findinpath to get this updated and ready.

@github-actions github-actions bot removed the stale label Mar 28, 2024
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 10, 2024
@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jun 3, 2024
@ebyhr ebyhr removed the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Oct 30, 2024
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Nov 20, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Add extra_properties to Delta Lake table properties
5 participants