Implement rename table support for Delta connector #11400

Closed
mdesmet opened this issue Mar 9, 2022 · 8 comments · Fixed by #11401


mdesmet commented Mar 9, 2022

Currently the Delta connector does not support table renames:

ALTER TABLE mytable RENAME TO mytable2

Supporting table rename would allow us to use the Delta connector in a standard dbt setup, which would be very beneficial as Delta is currently the only table format that supports row-level DELETE and INSERT and works with the standalone Hive metastore. Hive transactional tables only work with specific Hive versions (see this issue), and Iceberg also doesn't support row-level DELETE and INSERT yet. Enabling table rename in Delta is consequently low-hanging fruit.

Delta supports both external and managed tables. Table renames should ideally support both cases and work as expected, for example as they do within Spark/Hive metastore. Managed tables are more suitable for a dbt setup because DROP TABLE statements remove the data on the underlying storage, which makes running the dbt project idempotent and frictionless: rerunning the project won't require manually removing the data; otherwise table creation would fail because data already exists in the relevant directories.

dbt works as follows:

initial load

It creates a working table orders_tmp and does its work there.

When done, it renames the existing production table orders out of the way and then renames orders_tmp to the production table name (in Snowflake this is done atomically using CREATE OR REPLACE TABLE).

Then it drops the old production table.
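For illustration, a minimal sketch of the statements dbt effectively issues for the initial load, assuming the production table is orders and using a hypothetical backup name orders_old:

CREATE TABLE orders_tmp AS SELECT ... ;    -- build the new data in a working table
ALTER TABLE orders RENAME TO orders_old;   -- move the current production table out of the way
ALTER TABLE orders_tmp RENAME TO orders;   -- promote the working table to production
DROP TABLE orders_old;                     -- drop the old production table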

incremental loads

Out of the box dbt has some support for MERGE statements, but MERGE is not yet supported in Trino. We could however apply INSERT, UPDATE and DELETE statements, knowing that this wouldn't offer the same atomicity as a MERGE statement.
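For illustration, a sketch of emulating a MERGE-style upsert with separate statements (the orderkey column and the orders_staging table are hypothetical); without MERGE, readers may observe the intermediate state between the two statements:

DELETE FROM orders WHERE orderkey IN (SELECT orderkey FROM orders_staging);  -- drop rows that will be replaced
INSERT INTO orders SELECT * FROM orders_staging;                             -- insert new and updated rows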


findepi commented May 24, 2022

There are two kinds of Delta Lake tables:

  • tables with location set explicitly when creating the table (marked as "external" in metastore)
  • tables with implicit location derived from schema's location (marked as "managed" in metastore)

// excerpt from table creation: when no explicit location is given,
// derive it from the schema location and mark the table as managed
String location = getLocation(tableMetadata.getProperties());
if (location == null) {
    Optional<String> schemaLocation = getSchemaLocation(schema);
    if (schemaLocation.isEmpty()) {
        throw new TrinoException(NOT_SUPPORTED, "The 'location' property must be specified either for the table or the schema");
    }
    location = new Path(schemaLocation.get(), tableName).toString();
    checkPathContainsNoFiles(session, new Path(location));
    external = false;
}

The behavior of ALTER TABLE RENAME TO for "external" tables is clear: the location is user-provided and should be retained as-is.

For "managed" tables, ones with implicit locations, we have few options

Option AS_IS: we leave the table location as-is when renaming the table.

Table rename is then O(1) (as it should be).
There is a caveat that it's not possible to do:

CREATE TABLE tmp;
ALTER TABLE tmp RENAME TO desired_name;
CREATE TABLE tmp; -- fails because <storage>/tmp location already exists

Also, even without the user re-using table names as above:

CREATE TABLE tmp_123;
ALTER TABLE tmp_123 RENAME TO desired_name;

the files of the desired_name table end up under the tmp_123 directory, which may be eternally confusing. Someone well-intentioned may delete them ("some old tmp data").

Option RENAME_LOCATION: rename the table location in the same manner as table name

This will be O(1) on some storages and O(data size) on e.g. S3.

There is a caveat that the feature is probably unusable on S3, while S3 is probably the most popular storage these days.

Option EXTONLY: limit RENAME TO to "non-managed" tables, i.e. ones with an explicit location

This avoids the problem. If the user provided the table location explicitly, we shouldn't change it.
No location rename happens, so table rename is an O(1) operation.
The user is not confused.

There is a caveat that this requires users to think in terms of paths. A SQL-centric person cannot just do

CREATE TABLE tmp_123;
ALTER TABLE tmp_123 RENAME TO desired_name;

they need to add a WITH (location = '......') option to CREATE TABLE, which isn't nice.
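For example, under EXTONLY the working table would need an explicit location up front before it can be renamed (the bucket and path below are hypothetical):

CREATE TABLE tmp_123 (...)
WITH (location = 's3://some-bucket/some/path/tmp_123');  -- hypothetical explicit location
ALTER TABLE tmp_123 RENAME TO desired_name;              -- allowed: the explicit location is kept as-is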

Option RANDOMIZE: randomize table locations and keep them as-is when renaming

When creating the implicit table location for a new table, append a randomized suffix to the table name.
This is already supported in Iceberg when enabled with iceberg.unique-table-location (#6063, #12980).

This way, the location is either randomized, or provided explicitly by the user.
In either case, the RENAME TO can keep the current location as-is.
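For illustration, with randomized locations the problematic sequence from option AS_IS would work (the suffixes shown are hypothetical):

CREATE TABLE tmp;                         -- location e.g. <storage>/tmp-58f4a2c1
ALTER TABLE tmp RENAME TO desired_name;   -- location stays <storage>/tmp-58f4a2c1
CREATE TABLE tmp;                         -- succeeds: gets a fresh location such as <storage>/tmp-9b0d77e3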

There is a caveat that RENAME TO's behavior depends on the configuration at the time of CREATE TABLE.
We would probably allow users to rename tables while keeping the location as-is, but this would be sane behavior for some tables and not for others. It also requires opt-in (a new config toggle).

Option SHRUG: delegate the problem to the metastore and 🤷‍♂️

Delegate the problem to the metastore. HMS will do renames on HDFS, and I don't currently know what it will do with paths on S3. Glue will not do renames.

There is a caveat that the behavior will be metastore-dependent, and may further depend on metastore configuration. This will be confusing and frustrating.

cc @claudiusli @alexjo2144 @findinpath @hashhar @losipiuk


findepi commented May 24, 2022

My favorite option is RANDOMIZE, even though it requires opt-in to work properly

cc also @electrum @phd3


mdesmet commented May 24, 2022

@findepi: I agree that not moving the data is best. Note that in the current code, tables with an implicit location are created as MANAGED tables in the Hive metastore. The PR had been designed around this limitation.

We could set the table type to EXTERNAL in the Hive metastore when creating new tables, just as in the Iceberg connector, and reject renames of MANAGED tables. That way RANDOMIZE can easily be implemented.


mdesmet commented May 24, 2022

Also, changing the table type to EXTERNAL means the metastore will NOT drop the data on DROP TABLE; we would have to do that ourselves, similar to what's done in the Iceberg connector.


findepi commented May 24, 2022

@mdesmet the DROP behavior is exactly why we differentiate tables with explicit and implicit locations.
cc @alexjo2144


mdesmet commented May 24, 2022

I know that, but I'm really talking about the Hive Metastore's table type. That can be either MANAGED_TABLE or EXTERNAL_TABLE.

In Delta Lake we currently create tables without an explicit location as MANAGED_TABLE and simply relay the delete to the metastore, while in Iceberg all tables are created as EXTERNAL_TABLE so that the Hive metastore doesn't meddle with them. Because of that, we have to delete the data ourselves for tables without an explicit location, as Ryan Blue suggested in #5616.


findepi commented Jun 10, 2022

Relates to #5632


phd3 commented Jun 10, 2022

sane behavior for some tables and not for others.

@findepi thanks for drafting the alternatives. I like the randomize idea too. Where would it not be sane?
