Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add register_table procedure support for delta table #14779

Merged
merged 1 commit into from
Nov 24, 2022

Conversation

krvikash
Copy link
Contributor

@krvikash krvikash commented Oct 26, 2022

Description

Fixes #13568

  1. Procedure call -> delta.system.register_table(shcema_name => 'testdb', table_name => 'table1', table_location => 's3://my-bukcet/a/path/')
  2. By default CREATE TABLE with(location='***') will not allow to register table using existing location.
  3. Enable via delta.create-table-with-existing-location.enabled config property or create_table_with_existing_location_enabled session property to allow user to register table using CREATE TABLE statement (This support will be removed permanently after some release)
  4. By default register_table procedure is disabled. Enable it via delta.allow-register-table-procedure config property

Non-technical explanation

NA

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Delta Lake
* Allow registering existing table files in the metastore with the new
  [`register_table` procedure](delta-lake-register-table). ({issue}`13568`)
* Deprecate creating a new table with existing table content.
  This can be enabled using the `delta.legacy-create-table-with-existing-location.enabled`
  config property or `legacy_create_table_with_existing_location_enabled` session property. ({issue}`13568`)

@cla-bot cla-bot bot added the cla-signed label Oct 26, 2022
@krvikash krvikash force-pushed the delta-register_table-support branch 2 times, most recently from 05c7aaa to 5df0865 Compare October 27, 2022 10:20
@krvikash krvikash self-assigned this Oct 27, 2022
@krvikash krvikash force-pushed the delta-register_table-support branch 6 times, most recently from ce19fb2 to 81456e5 Compare October 28, 2022 21:09
@krvikash krvikash marked this pull request as ready for review October 28, 2022 21:39
@krvikash krvikash force-pushed the delta-register_table-support branch 8 times, most recently from ae2fba1 to 68cce18 Compare October 31, 2022 17:44
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a test that shows the legacy syntax can be re-enabled?

String tableNameNew = "test_register_table_with_no_transaction_log_new_" + randomTableSuffix();

// Delete files under transaction log directory to verify register_table call fails
deleteDirectoryContents(Path.of(getTransactionLogDir(new org.apache.hadoop.fs.Path(tableLocation)).toString()), ALLOW_INSECURE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also just create the table using the Hive or Iceberg connectors, if you have one of those catalogs configured for the test

CREATE TABLE hive.schema.table_name ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm that I understand correctly, We can create a table using Hive or Iceberg connectors to verify the delta test cases where delta logs are missing?

@krvikash
Copy link
Contributor Author

krvikash commented Nov 1, 2022

Do you have a test that shows the legacy syntax can be re-enabled?

No, Not yet. I will add the test case for this.

@krvikash krvikash force-pushed the delta-register_table-support branch 3 times, most recently from d99b7d9 to bb666ee Compare November 2, 2022 05:57
@krvikash krvikash force-pushed the delta-register_table-support branch from b7aa101 to 9f60dad Compare November 16, 2022 08:43
@krvikash krvikash force-pushed the delta-register_table-support branch 2 times, most recently from c4ad97c to 55bdd02 Compare November 17, 2022 10:22
@krvikash
Copy link
Contributor Author

rebased and added BaseDeltaLakeConnectorSmokeTest#testRegisterTableWithTrailingSlashLocation

@krvikash krvikash force-pushed the delta-register_table-support branch 4 times, most recently from b94091b to aceba06 Compare November 18, 2022 16:14
@ebyhr ebyhr force-pushed the delta-register_table-support branch 2 times, most recently from 3fb76ff to f0f9499 Compare November 21, 2022 06:41
@findinpath
Copy link
Contributor

LGTM % some minor test related improvements.

@ebyhr ebyhr force-pushed the delta-register_table-support branch from f0f9499 to 91c8885 Compare November 23, 2022 22:53
@ebyhr ebyhr force-pushed the delta-register_table-support branch from 91c8885 to 71dedbc Compare November 24, 2022 03:48
@findinpath findinpath self-requested a review November 24, 2022 04:43
@ebyhr
Copy link
Member

ebyhr commented Nov 24, 2022

CI hit #15173

@ebyhr ebyhr merged commit c247fc8 into trinodb:master Nov 24, 2022
@github-actions github-actions bot added this to the 404 milestone Nov 24, 2022
@ebyhr ebyhr mentioned this pull request Nov 24, 2022
@krvikash krvikash deleted the delta-register_table-support branch December 21, 2022 20:10
@hangc0276
Copy link

When I registered a delta table with an existing path and the new data kept writing to the path, I used SQL to query the data from the table and found it couldn't load new data.

I register a new table with the same path, and new data can be queried out.

Is it the expected behavior? @ebyhr @krvikash

@findinpath
Copy link
Contributor

@hangc0276 pls sketch your scenario in a new github issue - your question is not related to this PR.

To be taken into account when writing the new issue

... and the new data kept writing to the path

Was writing in the newly registered table happening through INSERT / UPDATE / MERGE operations via Trino/Spark or rather at the file level ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Distinguish creation of new table and registering existing table operations in Delta Lake connector
6 participants