Support Nessie Catalog in Iceberg connector #11701

Merged
ebyhr merged 1 commit into trinodb:master from the add-nessie-support branch on May 29, 2023


nastra (Contributor) commented Mar 29, 2022

This PR integrates the Nessie catalog functionality (https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie) into the Iceberg connector. It adds the following:

  • a new CatalogType called NESSIE
  • an IcebergNessieCatalogModule that sets up all necessary dependencies (including a client to connect to the Nessie server)
  • a NessieConfig that includes configuration settings required for Nessie
  • TrinoNessieCatalog + NessieIcebergTableOperations that implement the main behavior of the catalog
  • some unit and integration tests that verify that the Iceberg connector works with Nessie (note that the integration test requires a running Nessie server, which is started via the nessie-apprunner-maven-plugin before the integration-test phase)

@cla-bot cla-bot bot added the cla-signed label Mar 29, 2022
@nastra nastra requested a review from findepi March 29, 2022 08:49
@nastra nastra force-pushed the add-nessie-support branch from 66fed2e to 191715e Compare March 29, 2022 11:39
nastra (Contributor Author) commented Mar 29, 2022

@findepi Looks like one of the CI jobs timed out. Would it be possible to restart just this one job?

@nastra nastra closed this Mar 30, 2022
@nastra nastra reopened this Mar 30, 2022
ebyhr (Member) commented Mar 30, 2022

@nastra GitHub Actions recently introduced "Re-run failed jobs", though the feature might not be stable yet. If necessary, we will ask you to push an empty commit to trigger CI.

nastra (Contributor Author) commented Mar 31, 2022

@ebyhr Unfortunately, I don't have permission to re-run the failed job myself. That's why I closed and reopened the PR, which triggers a full CI run again (same as pushing).

findinpath (Contributor) commented

Warning:  Invalid bytecodeVersion for org.glassfish.jersey.core:jersey-common:jar:2.30:compile : META-INF/versions/11/org/glassfish/jersey/internal/jsr166/SubmissionPublisher$1.class: expected 55, but was 56

This library seems to be compiled with Java 12 (class file version 56), but Trino works with Java 11 (class file version 55).

@findinpath findinpath self-requested a review March 31, 2022 13:54
nastra (Contributor Author) commented Mar 31, 2022

@findinpath I can see the same warning in CI runs on other branches. Here's an example from master: https://github.com/trinodb/trino/runs/5763019856?check_suite_focus=true

By the way, Nessie itself is also compiled with Java 11.

@nastra nastra force-pushed the add-nessie-support branch from 191715e to 6beef07 Compare March 31, 2022 14:43
electrum (Member) commented

The tests should be normal unit tests, not ITs. You can probably use Testcontainers for Nessie.

findinpath (Contributor) commented

It would be nice to see some tests that run SQL statements through the query runner, to actually test how the connector works with the new catalog type.

Ideally, at a later point, there would also be a product test environment (similar to singlenode-spark-iceberg).
See for reference:

testing/bin/ptl env describe  --environment EnvSinglenodeSparkIceberg  --config config-default

findinpath (Contributor) commented

Thank you @nastra for your contribution. I am looking forward to helping out with this PR to add support for Nessie in the Trino Iceberg connector.

@nastra nastra force-pushed the add-nessie-support branch from 585fa60 to be8ec7c Compare April 1, 2022 09:43
nastra (Contributor Author) commented Apr 1, 2022

@findinpath Thanks for the review. I updated the code based on your feedback and pushed a new commit. I'll now look into some additional testing via SQL statements.

findinpath (Contributor) commented

@nastra You can use the Glue-related tests in the connector as a rough template for the kinds of tests you need to add.

@nastra nastra force-pushed the add-nessie-support branch 2 times, most recently from 247f4ee to 9466225 Compare April 1, 2022 14:33
{
this.nessieApi = requireNonNull(nessieApi, "nessieApi is null");
this.config = requireNonNull(config, "nessieConfig is null");
this.reference = () -> loadReference(config.getDefaultReferenceName(), null);
Contributor

Does the client need to know the reference configured for the connector?

Contributor Author

Yes, as it needs to operate on the correct branch/tag.

Contributor

I meant: does the client need to know this detail when it is created?
In other words, should the catalog pass the branch/tag name to the client on each call?

Contributor Author

Eventually this is what we might end up with (the catalog providing the branch/tag to the client), but I didn't want to overcomplicate the current implementation. That said, I would prefer to adjust this part of the code once the catalog knows which branch/tag it is being used with when executing a particular SQL statement.
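The alternative raised in this thread, where the catalog passes the branch/tag on each call instead of the client fixing it at construction, can be sketched in plain Java. This is a hypothetical illustration, not the code in this PR: `Reference`, `ReferenceAwareClient`, `tableLocation`, and `memoize` are names invented here, and an in-memory map stands in for the Nessie server. The default reference is still resolved lazily on first use, similar in spirit to the `() -> loadReference(...)` supplier in the snippet above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are part of Trino or the Nessie client API.
public final class NessieReferenceSketch
{
    private NessieReferenceSketch() {}

    // A branch or tag name resolved to the commit hash it currently points at.
    public record Reference(String name, String hash) {}

    // Client that receives the reference on every call instead of storing one at construction.
    public static final class ReferenceAwareClient
    {
        private final Map<String, String> headsByName; // in-memory stand-in for the Nessie server
        private final AtomicInteger lookups = new AtomicInteger();

        public ReferenceAwareClient(Map<String, String> headsByName)
        {
            this.headsByName = Map.copyOf(Objects.requireNonNull(headsByName, "headsByName is null"));
        }

        public Reference loadReference(String name)
        {
            lookups.incrementAndGet();
            String hash = headsByName.get(name);
            if (hash == null) {
                throw new IllegalArgumentException("Unknown reference: " + name);
            }
            return new Reference(name, hash);
        }

        // The reference is an explicit argument, so one client can serve several branches/tags.
        public String tableLocation(String table, Reference reference)
        {
            return table + "@" + reference.name() + ":" + reference.hash();
        }

        public int lookupCount()
        {
            return lookups.get();
        }
    }

    // Thread-safe memoizing supplier: the delegate runs at most once, on the first get().
    public static <T> Supplier<T> memoize(Supplier<T> delegate)
    {
        Objects.requireNonNull(delegate, "delegate is null");
        return new Supplier<>()
        {
            private T value;

            @Override
            public synchronized T get()
            {
                if (value == null) {
                    value = Objects.requireNonNull(delegate.get(), "delegate returned null");
                }
                return value;
            }
        };
    }

    public static void main(String[] args)
    {
        ReferenceAwareClient client = new ReferenceAwareClient(Map.of("main", "a1b2c3", "dev", "d4e5f6"));

        // The catalog owns the default reference and resolves it lazily (one lookup total).
        Supplier<Reference> defaultReference = memoize(() -> client.loadReference("main"));
        System.out.println(client.tableLocation("orders", defaultReference.get()));
        System.out.println(client.tableLocation("orders", defaultReference.get()));

        // A different branch can be passed per call without building a new client.
        System.out.println(client.tableLocation("orders", client.loadReference("dev")));
        System.out.println("lookups = " + client.lookupCount());
    }
}
```

Running `main` prints `orders@main:a1b2c3` twice, then `orders@dev:d4e5f6`, then `lookups = 2`. One design caveat of this sketch: memoizing pins the default reference to the hash seen on first use, while a Nessie branch head moves with every commit, so a real catalog would have to decide when to refresh it.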

@nastra nastra force-pushed the add-nessie-support branch from 9466225 to 86ce1dc Compare April 6, 2022 10:31
findinpath (Contributor) commented

Could you please add documentation on how to set up the Nessie catalog in the Iceberg connector?

You can use https://github.com/trinodb/trino/pull/11772/files#diff-e1aabf1cfd8bd8aa7d1b75e70089b57413b2e620a5eebeb36cc76fd3f2ac60db as a template for getting started.

Review thread on plugin/trino-iceberg/pom.xml (outdated, resolved)

ajantha-bhat (Member) commented May 22, 2023

@ebyhr: Thanks for the review.

I addressed all the comments. The PR is ready for review.

ebyhr (Member) left a comment

Please squash the commits into one and remove the commit body.

Review thread on plugin/trino-iceberg/pom.xml (resolved)
Review thread on plugin/trino-iceberg/pom.xml (outdated, resolved)
@@ -124,7 +125,7 @@ public void setUp()
         onTrino().executeQuery(format("CREATE SCHEMA IF NOT EXISTS %s.%s", TRINO_CATALOG, TEST_SCHEMA_NAME));
     }
 
-    @Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS, ICEBERG_REST, ICEBERG_JDBC}, dataProvider = "storageFormatsWithSpecVersion")
+    @Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS, ICEBERG_REST, ICEBERG_JDBC, ICEBERG_NESSIE}, dataProvider = "storageFormatsWithSpecVersion")
Member

Some tests are missing the ICEBERG_NESSIE group.

nastra (Contributor Author) commented May 24, 2023

The idea was to run only a few tests with Nessie, to keep CI times low while still showing that Trino + Spark + Nessie work together.

@nastra nastra force-pushed the add-nessie-support branch from 9e47932 to a3d8a20 Compare May 25, 2023 13:01
Co-authored-by: Ajantha Bhat <ajanthabhat@gmail.com>
@nastra nastra force-pushed the add-nessie-support branch from a3d8a20 to aee6900 Compare May 25, 2023 23:04
ajantha-bhat (Member) commented

The failed test TestHiveStorageFormats > testSelectWithNullFormat(0: StorageFormat{name=SEQUENCEFILE, properties={}, sessionProperties={}}) [groups: storage_formats_detailed] is unrelated to the changes in this PR and may be flaky.

Re-triggering the build.

ebyhr (Member) commented May 26, 2023

@ajantha-bhat Please note that you don't need to retrigger CI when the failed test is unrelated to the change. Also, we recommend pushing an empty commit instead of reopening the PR to retrigger CI.

ajantha-bhat (Member) commented

> @ajantha-bhat Please note that you don't need to retrigger CI when the failed test is unrelated to the change. Also, we recommend pushing an empty commit instead of reopening PR when retriggering CI.

Thanks. I am new to Trino contributions. Good to know.

ajantha-bhat (Member) commented

The PR is ready. Thanks.

@ebyhr ebyhr merged commit 9dc8375 into trinodb:master May 29, 2023
@github-actions github-actions bot added this to the 419 milestone May 29, 2023
@ebyhr ebyhr mentioned this pull request May 29, 2023
@nastra nastra deleted the add-nessie-support branch May 30, 2023 05:30
ajantha-bhat added a commit to ajantha-bhat/trino that referenced this pull request Jun 1, 2023
…ctor

To reduce the review effort, only basic Nessie configuration was supported in trinodb#11701.
The Nessie server can be deployed with an authentication mode such as Keycloak, so the Nessie client configuration needs to be exposed to handle authentication.
In addition, some common Nessie configuration properties, such as read-timeout-ms, connect-timeout-ms, and compression-enabled,
are exposed to give finer control over Nessie commits.
ajantha-bhat added a commit to ajantha-bhat/trino that referenced this pull request Jun 5, 2023
…ctor

To reduce the review effort, only the basic Nessie configurations were supported in trinodb#11701.
Nessie server can be deployed with Auth mode like keycloak. So, need to expose the Nessie client configurations to handle the Auth.
Along with that, some common Nessie server configurations like read-timeout-ms, connect-timeout-ms and compression-enabled properties
are exposed to have finer control over the Nessie commits.