Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support table redirections from Hive to Delta Lake #11550

Merged
merged 2 commits into from
May 5, 2022

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Mar 17, 2022

The Hive connector can make use of the hive.delta-lake-catalog-name
configuration property for enable table redirects towards the
Delta Lake tables.

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Hive connector

How would you describe this change to a non-technical end user or system administrator?

This functionality allows redirecting behind the scenes the a table task used for reading / updating the table from the Hive connector to the Delta Lake connector in case that the table to which the SQL command points to is a Delta Lake table.

Related issues, pull requests, and links

Depends on #11565

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@findepi
Copy link
Member

findepi commented Mar 21, 2022

the CI is red.

btw is this the reason this is a Draft?

@findinpath
Copy link
Contributor Author

btw is this the reason this is a Draft?

I intend to add the product tests for ensuring that the functionality works as expected before setting the PR ready for review.

@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch from 5a062f9 to c245456 Compare March 21, 2022 13:30
@findinpath findinpath marked this pull request as ready for review March 21, 2022 13:30
@findinpath
Copy link
Contributor Author

the CI is red.

There were missing some config coverage tests for Hive before.

@findepi
Copy link
Member

findepi commented Mar 21, 2022

Per offline discussion, this is blocked by #11565 so that we can add tests.

@findepi findepi marked this pull request as draft March 21, 2022 21:04
@findinpath findinpath marked this pull request as ready for review March 22, 2022 05:17
@findinpath
Copy link
Contributor Author

findinpath commented Mar 23, 2022

TODO: Add TestHiveAndDatabricksRedirect and add to trino/testing/trino-product-tests-launcher/src/main/resources/docker/presto-product-tests/conf/environment/singlenode-delta-lake-databricks/hive.properties the following line:

hive.non-managed-table-writes-enabled=true

@findepi
Copy link
Member

findepi commented Mar 25, 2022

Tests require #11565

@findepi findepi marked this pull request as draft March 25, 2022 12:14
@findinpath findinpath marked this pull request as ready for review March 25, 2022 13:00
@findepi findepi marked this pull request as draft April 1, 2022 15:23
@Dearkano
Copy link

Dearkano commented Apr 6, 2022

Hi, I'm wondering if there's any timeline for this feature's release? Thank you!

@findepi findepi marked this pull request as ready for review April 6, 2022 20:39
@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch from e7f1dce to e5a20dd Compare April 11, 2022 14:25
@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch from e5a20dd to 6c5ae97 Compare April 19, 2022 15:37
@findinpath
Copy link
Contributor Author

Rebased on master and added integration tests for the redirection functionality.

@findinpath findinpath requested a review from findepi April 19, 2022 15:38
@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch 2 times, most recently from bd5f4be to 6776fd2 Compare April 19, 2022 15:43
@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch from 6776fd2 to af723a6 Compare April 20, 2022 05:11
@findinpath
Copy link
Contributor Author

@findepi the compilation of the trino-delta-lake project is now failing.
I have no better option at hand than the exclusion of the aws-sdk-core depdency from the analyze-only goal https://maven.apache.org/plugins/maven-dependency-plugin/examples/exclude-dependencies-from-dependency-analysis.html

@findepi

This comment was marked as duplicate.

@findepi findepi force-pushed the add-delta-to-hive-redirects branch from dcaca0a to 22f227b Compare April 21, 2022 10:41
@findepi
Copy link
Member

findepi commented Apr 22, 2022

https://github.com/trinodb/trino/runs/6110999392?check_suite_focus=true

Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project trino-delta-lake: There are test failures.
Error:  
Error:  Please refer to /home/runner/work/trino/trino/plugin/trino-delta-lake/target/surefire-reports for the individual test results.
Error:  Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Error:  The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error:  Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-delta-lake && /opt/hostedtoolcache/Java_Zulu_jdk/11.0.15-10/x64/bin/java -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-delta-lake/target/surefire/surefirebooter5870515897782731080.jar /home/runner/work/trino/trino/plugin/trino-delta-lake/target/surefire 2022-04-21T13-15-20_232-jvmRun1 surefire15439557697247218053tmp surefire_011486807429141444913tmp
Error:  Error occurred in starting fork, check output in log
Error:  Process Exit Code: 137
Error:  Crashed tests:
Error:  io.trino.plugin.deltalake.TestDeltaLakeOptimizedWriterConnectorSmokeTest

cc @findinpath @hashhar

@@ -426,6 +432,18 @@
</ignoredResourcePatterns>
</configuration>
</plugin>

<plugin>
<artifactId>maven-dependency-plugin</artifactId>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch 2 times, most recently from d5c9463 to 7e20a59 Compare April 26, 2022 04:14
@findinpath
Copy link
Contributor Author

Rebased on master in an attempt to avoid build runner related issue related to the Github runner while trying to build trino-delta-lake project.

@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch 10 times, most recently from 6e980e8 to 21ea76e Compare April 28, 2022 18:25
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small things

plugin/trino-delta-lake/pom.xml Outdated Show resolved Hide resolved
@@ -116,6 +116,7 @@
public static final String SIZE_BASED_SPLIT_WEIGHTS_ENABLED = "size_based_split_weights_enabled";
public static final String MINIMUM_ASSIGNED_SPLIT_WEIGHT = "minimum_assigned_split_weight";
public static final String NON_TRANSACTIONAL_OPTIMIZE_ENABLED = "non_transactional_optimize_enabled";
public static final String DELTA_LAKE_CATALOG_NAME = "delta_lake_catalog_name";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, put the redirect catalog name fields together

@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch from 21ea76e to b37089a Compare April 29, 2022 04:36
The Hive connector can make use of the `hive.delta-lake-catalog-name`
configuration property for enable table redirects towards the
Delta Lake tables.
@findinpath findinpath force-pushed the add-delta-to-hive-redirects branch from b37089a to cafe666 Compare April 29, 2022 14:03
@findinpath
Copy link
Contributor Author

Rebased on master to make use of 49381ba

@findepi findepi merged commit 74b324c into trinodb:master May 5, 2022
@findepi findepi mentioned this pull request May 5, 2022
@github-actions github-actions bot added this to the 380 milestone May 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants