Add Delta Lake product tests #11565
Does Minio need its own module, or can it go into one of the existing shared ones, like
@alexjo2144 Addressed the issue by creating the MinIO bucket directory on the fly while configuring MinIO via testcontainers.
I am intentionally not fixing the Git conflicts for now, in order to keep the base of the current PR, which is used in a private downstream project.
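The approach described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `BucketDirectories`, the method `createBucketDirectory`, and the use of a temporary directory are assumptions. It only shows how an empty host directory could be created on the fly, with the checked `IOException` wrapped in an `UncheckedIOException`, so that it can later be copied into the MinIO container.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the PR
final class BucketDirectories
{
    private BucketDirectories() {}

    // Creates an empty directory on the host that can stand in for a MinIO bucket.
    // Assumption: a temporary directory is good enough for a test environment.
    static Path createBucketDirectory(String prefix)
    {
        try {
            Path directory = Files.createTempDirectory(prefix);
            // Ask the JVM to remove the (empty) directory on normal exit
            directory.toFile().deleteOnExit();
            return directory;
        }
        catch (IOException e) {
            // Environment configuration code usually cannot declare checked exceptions
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path would then be handed to the container configuration, for example through `forHostPath(...)` as in the reviewed diff.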
throw new UncheckedIOException(e);
}
builder.configureContainer(MINIO_CONTAINER_NAME, container ->
        container.withCopyFileToContainer(forHostPath(minioBucketDirectory), "/data/" + S3_BUCKET_NAME));
This should go in the common/Minio file. It's not clear here what's special about the /data/ directory.
Isn't /data where the MinIO Docker container stores buckets?
/data is the argument given to minio server that points to the directory where the buckets are physically stored.
The Docker container process is started with:
.withCommand("server", "--address", format("0.0.0.0:%d", MINIO_PORT), "/data")
I think that this customisation is specific to EnvSinglenodeDeltaLakeOss and is not a common setting for MinioContainer.
We should at least have /data/ stored in a static class variable on Minio then.
Let's follow up on this.
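One possible shape for the suggested follow-up is sketched below. The class `MinioLayout`, the constant `MINIO_DATA_DIR`, and both methods are hypothetical names chosen for illustration; the actual Minio environment class in Trino may look different. The idea is simply that the server command and the bucket copy target derive the data directory from a single constant instead of repeating the `/data` literal.

```java
import java.util.List;

// Hypothetical sketch, not the actual Minio class from the product tests
final class MinioLayout
{
    // Directory passed to `minio server`; buckets are stored as its subdirectories
    static final String MINIO_DATA_DIR = "/data";

    private MinioLayout() {}

    // Builds the server arguments shown in the review thread, using the shared constant
    static List<String> serverCommand(int port)
    {
        return List.of("server", "--address", String.format("0.0.0.0:%d", port), MINIO_DATA_DIR);
    }

    // Path inside the container where a bucket directory should be copied
    static String bucketPath(String bucketName)
    {
        return MINIO_DATA_DIR + "/" + bucketName;
    }
}
```

With such a constant in place, the environment-specific code could call something like `bucketPath(S3_BUCKET_NAME)` rather than concatenating `"/data/"` inline, which would address the concern that the meaning of the directory is unclear at the call site.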
Change of plan. Fixing the conflicts.
Good question.
Allow the Delta Lake product tests to make use of the Delta Lake testing resources.
Rebased on
@alexjo2144 ptal
The Delta Lake product tests can all be executed with the SuiteDeltaLake suite class.

The following product test environments are exposed:
- single-node-delta-lake-oss: used to test the compatibility of the Trino Delta Lake connector with Apache Spark with Delta OSS
- single-node-delta-lake-databricks: used to test the compatibility of the Trino Delta Lake connector with Delta Lake Databricks
- single-node-delta-lake-kerberized-hdfs: used to test the Delta Lake connector on top of a kerberized Hadoop environment
- single-node-minio-data-lake: lightweight environment that can be used to test the Lakehouse connectors with HMS & MinIO

The aim of the Delta Lake product tests is to ensure compatibility with both implementations of Delta Lake:
- Delta OSS
- Databricks Delta

These product tests were originally written for the Starburst Enterprise Delta Lake connector.

Co-authored by various engineers at Starburst Data:

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Alex Jo <alex.jo@starburstdata.com>
Co-authored-by: Łukasz Osipiuk <lukasz@osipiuk.net>
Co-authored-by: Konrad Dziedzic <konraddziedzic@gmail.com>
Co-authored-by: Adam J. Shook <shook@datacatessen.com>
Co-authored-by: Mateusz Gajewski <mateusz.gajewski@gmail.com>
Co-authored-by: Gaurav Sehgal <gaurav.sehgal8297@gmail.com>
Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
Co-authored-by: Ashhar Hasan <ashhar.hasan@starburstdata.com>
Co-authored-by: Michał Ślizak <michal.slizak+github@gmail.com>
Co-authored-by: Grzegorz Kokosiński <grzegorz@starburstdata.com>
Co-authored-by: Arkadiusz Czajkowski <arek@starburstdata.com>
Co-authored-by: Jacob I. Komissar <jacob.komissar@starburstdata.com>
Co-authored-by: Krzysztof Sobolewski <krzysztof.sobolewski@starburstdata.com>
Co-authored-by: Krzysztof Skrzypczynski <krzysztof.skrzypczynski@starburstdata.com>
Co-authored-by: Yuya Ebihara <yuya.ebihara@starburstdata.com>
Co-authored-by: Praveen Krishna <praveenkrishna@tutanota.com>
Co-authored-by: Karol Sobczak <napewnotrafi@gmail.com>
Co-authored-by: Sasha Sheikin <myminitrue@gmail.com>
Co-authored-by: Szymon Homa <szymon.homa@starburstdata.com>
Description
Expose Delta Lake product tests
TODOs:
Figure out how to set up a Databricks environment to be used by the Delta Lake connector tests.
Open issues:
- no auto-restart when querying a terminated cluster - can be solved by creating a new cluster via Databricks Clusters API v2
- no Instance profiles functionality available on community clusters - this is a serious limitation because the Delta Lake connector tests create tables backed by AWS S3 buckets
- per Community account only one cluster can be created - can be solved by creating multiple accounts in order to test Databricks 9.1 LTS, 7.3 LTS
Tests
Delta Lake connector
This change contributes to ensuring accuracy of the functionality exposed by the Delta Lake connector.
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: