
Support JDBC catalog in Iceberg connector #11772

Merged
merged 1 commit on Jan 19, 2023

Conversation

ebyhr
Member

@ebyhr ebyhr commented Apr 4, 2022

Description

Support JDBC catalog in Iceberg connector.
Fixes #9968

Release notes

[x] Release notes entries required with the following suggested text:

# Iceberg
* Add support for JDBC catalog. ({issue}`9968`)

@sungwy-backup

Very excited to see this getting traction. Just wanted to make a note here that we may expect to see a similar issue with #6850 if not accounted for.

The Iceberg catalog in Spark SQL preserves upper-case characters in schema and table names through its JDBC connection, which could break the Trino iceberg-jdbc catalog if not accounted for. We've been testing our own version of a JDBC catalog in our Iceberg plugin, and we've run into this issue, which we have yet to resolve.

Query: show tables from test
Result: exampletable1

In the database, the upper case is preserved as "exampleTable1".

Query: select * from test."exampleTable1" limit 5
Error: Table 'iceberg_jdbc.test.exampleTable1' does not exist

We do not see the same issue with lower-cased table names.
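The failure mode above can be illustrated with a small sketch. This is hypothetical illustration code, not Trino's implementation: it assumes the engine lowercases identifiers before an exact-match lookup against a catalog that preserved the case Spark wrote.

```python
# Hypothetical sketch (not Trino code) of the case-sensitivity mismatch.

# Backing JDBC catalog rows, keyed exactly as Spark stored them.
catalog_tables = {("test", "exampleTable1"): "s3://bucket/test/exampleTable1"}

def resolve_table(schema, table):
    """Exact-match lookup after lowercasing, as a case-folding engine does."""
    return catalog_tables.get((schema.lower(), table.lower()))

def resolve_case_insensitive(schema, table):
    """Case-insensitive scan: roughly what a later fix would need to do."""
    for (s, t), location in catalog_tables.items():
        if (s.lower(), t.lower()) == (schema.lower(), table.lower()):
            return location
    return None

# The lowercased key never matches the stored "exampleTable1", reproducing
# the "Table 'iceberg_jdbc.test.exampleTable1' does not exist" error.
assert resolve_table("test", "exampleTable1") is None
# A case-insensitive lookup finds it regardless of how the query spells it.
assert resolve_case_insensitive("test", "exampletable1") is not None
```

A fully lower-cased table name resolves in both paths, which matches the observation that only mixed-case names break.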

@ebyhr
Member Author

ebyhr commented Apr 25, 2022

@sungwy The root cause relates to #17. We prefer to start small when adding a new enhancement, so I won't handle case-insensitive matching in this PR. We can improve such things later.

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from d2eb975 to 5a6cd1f Compare May 2, 2022 08:00
@ebyhr ebyhr marked this pull request as ready for review May 2, 2022 08:37
@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 5a6cd1f to 95ce171 Compare May 2, 2022 09:04
@findepi
Member

findepi commented May 4, 2022

Do we have / want to have some compatibility tests with Spark?

@meetsitaram

Can you provide some details on how to configure S3 access with the JDBC catalog? For the Hive catalog, we add the config below in the iceberg.properties file.

hive.s3.endpoint=S3_ENDPOINT
hive.s3.aws-access-key=S3_ACCESS_KEY
hive.s3.aws-secret-key=S3_SECRET_KEY

Wondering if a similar config is required with JDBC as well, or if some alternative exists that can read the ~/.aws/config and ~/.aws/credentials files.

@samredai

samredai commented May 5, 2022

> Can you provide some details on how to configure s3 access with jdbc catalog? For hive catalog, we add below config in iceberg.properties file.
>
> hive.s3.endpoint=S3_ENDPOINT
> hive.s3.aws-access-key=S3_ACCESS_KEY
> hive.s3.aws-secret-key=S3_SECRET_KEY
>
> Wondering if similar config is required with jdbc as well or some alternative exists that can access ~/.aws/config and ~/.aws/credentials files.

My understanding is that it would be the same config items you've listed (and testing with the IcebergQueryRunner seems to confirm this). The Iceberg catalog implementations eventually pass these to the TrinoS3ConfigurationInitializer. Admittedly, the hive prefix on the property names feels odd when you're using any of the other catalog types (such as JDBC here), but IMO changing that requires too much work to bundle into this PR.
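Assuming the hive.s3.* properties do pass through unchanged, a JDBC-catalog configuration might look like this sketch (all values are placeholders):

```properties
connector.name=iceberg
iceberg.catalog.type=jdbc
# Placeholder values; per the discussion above, these hive.s3.* keys are
# passed through to TrinoS3ConfigurationInitializer regardless of catalog type.
hive.s3.endpoint=S3_ENDPOINT
hive.s3.aws-access-key=S3_ACCESS_KEY
hive.s3.aws-secret-key=S3_SECRET_KEY
```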

@meetsitaram

> Can you provide some details on how to configure s3 access with jdbc catalog? For hive catalog, we add below config in iceberg.properties file.
>
> hive.s3.endpoint=S3_ENDPOINT
> hive.s3.aws-access-key=S3_ACCESS_KEY
> hive.s3.aws-secret-key=S3_SECRET_KEY
>
> Wondering if similar config is required with jdbc as well or some alternative exists that can access ~/.aws/config and ~/.aws/credentials files.

> My understanding is that it would be the same config items you've listed (and testing with the IcebergQueryRunner seems to confirm this). The Iceberg catalog implementations just eventually pass these to the TrinoS3ConfigurationInitializer. Admittedly, the hive prefix for the parameter names feels odd when you're using any of the other catalog types (such as JDBC here) but IMO changing that probably requires too much to bundle with this PR.

The same configs are working with the JDBC catalog as well. Thanks!

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 95ce171 to 6012aeb Compare May 11, 2022 12:41
@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch 2 times, most recently from 7a6f67d to 298e8c3 Compare May 12, 2022 00:42
@ebyhr
Member Author

ebyhr commented May 12, 2022

Added support for product tests with Spark.

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 298e8c3 to 1386ad5 Compare May 28, 2022 01:12
@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 1386ad5 to 9c4cda8 Compare June 6, 2022 01:46
@sungwy
Member

sungwy commented Jun 13, 2022

Hi @ebyhr: for reference, here's the recently merged PR in Apache Iceberg that resolves the JDBC catalog reserved-keyword conflict on NAMESPACE_PROPERTY_KEY:

apache/iceberg#5017

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 9c4cda8 to a14c771 Compare June 13, 2022 13:56
@@ -0,0 +1,20 @@
/**
Table definition in Iceberg repository:
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java
Member

How do we make sure this file stays in sync with the referenced one?

Or maybe we can avoid having a copy here?

Member

It looks like the namespace_properties_table changes, along with the others we were waiting for, just got released in 0.14.0 a few days ago.

Maybe we could now reference the static variable JdbcUtil.CREATE_NAMESPACE_PROPERTIES_TABLE here going forward.

Member

> It looks like the namespace_properties_table changes along with others

What are the changes?
Does it mean we should test this catalog against Iceberg 13 and Iceberg 14 as well?

Member Author

They changed column names in apache/iceberg#5017. The PR doesn't contain logic to handle the old table definition, so I suppose they just broke backward compatibility.

> does it mean we should test this catalog against Iceberg 13 and Iceberg 14 as well?

I think so. (Or clarify the supported version in the documentation.)

Member

> They changed column names in apache/iceberg#5017. The PR doesn't contain logic to handle the old table definition, so I suppose they just broke backward compatibility.

Concerning


According to this issue on the Iceberg repo, no backward compatibility was broken, because the previous version was never released: apache/iceberg#4952


In fact, I can confirm the Iceberg maintainers didn't release a breaking change. See the file in question at 0.13.2 (https://github.com/apache/iceberg/blob/apache-iceberg-0.13.2/core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java), where NAMESPACE_PROPERTY_KEY didn't exist, and the same file at 0.14.0, where it does exist with the new value (https://github.com/apache/iceberg/blob/apache-iceberg-0.14.0/core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java#L83).

So it sounds like the old value was never released.

Member Author

@nfcampos Your confirmation looks correct. Do you know if they have a compatibility test in the repository?


Not that I know of, no

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from a14c771 to 20f4e93 Compare July 21, 2022 02:09
@ebyhr
Member Author

ebyhr commented Jul 21, 2022

Just rebased on upstream to resolve logical conflicts.

@martint
Member

martint commented Jan 13, 2023

cc @mosabua @bitsondatadev for how to best provide those warnings in the docs.

@bitsondatadev
Member

bitsondatadev commented Jan 13, 2023

@ebyhr, let's discuss some of the documentation phrasing on Slack.

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 94de1c3 to 8b536f3 Compare January 17, 2023 04:13
@ebyhr
Member Author

ebyhr commented Jan 17, 2023

Rebased on upstream to resolve conflicts.

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from 8b536f3 to e203020 Compare January 17, 2023 09:34
@ebyhr
Member Author

ebyhr commented Jan 17, 2023

Addressed comments.

@findepi findepi (Member) left a comment

Please give @alexjo2144 or @findinpath a chance to read too

@alexjo2144 alexjo2144 (Member) left a comment

Are we respecting the jdbc.strict-mode table property?

Do we have follow up issues for Views and MVs?

docs/src/main/sphinx/connector/iceberg.rst
public Connection openConnection()
throws SQLException
{
Connection connection = DriverManager.getConnection(connectionUrl);
Member

Should we have some connection pool here to draw from?

Member Author

I would skip supporting connection pools in this PR. We can add it later when needed.
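As a rough illustration of what the deferred pooling could look like, here is a minimal sketch in Python, with sqlite3 standing in for a JDBC driver. SimpleConnectionPool and its method names are invented for this example; they are not Trino or Iceberg APIs.

```python
import queue
import sqlite3

class SimpleConnectionPool:
    """Minimal fixed-size pool; illustrative only, not Trino's design."""

    def __init__(self, connect, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free, instead of opening a new one
        # per call as DriverManager.getConnection(connectionUrl) does.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._pool.put(conn)

# Usage with sqlite3 as a stand-in driver:
pool = SimpleConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
assert conn.execute("SELECT 1").fetchone() == (1,)
pool.release(conn)
```

A production pool would also need liveness checks and eviction of broken connections, which is part of why deferring it to a follow-up is reasonable.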

@ebyhr ebyhr force-pushed the ebi/iceberg-jdbc-catalog branch from e203020 to 5dd2a4a Compare January 18, 2023 08:09
@ebyhr
Copy link
Member Author

ebyhr commented Jan 18, 2023

> Are we respecting the jdbc.strict-mode table property?

We shouldn't create tables in non-existing namespaces regardless of the property, and that's tested by BaseConnectorTest#testCreateTableSchemaNotFound. The test doesn't cover concurrent modification, though.

> Do we have follow up issues for Views and MVs?

No, because there's no plan to support them at this time.

@ebyhr
Copy link
Member Author

ebyhr commented Jan 18, 2023

(Rebased on upstream to resolve conflicts)

@alexjo2144 alexjo2144 (Member) left a comment

Looks good to me

@bitsondatadev bitsondatadev (Member) left a comment

Just a docs review here.

A few small changes to the docs, but they look good!

docs/src/main/sphinx/connector/iceberg.rst
@bjackson-ep

Greetings @findepi and @bitsondatadev, thank you for your diligent work on JDBC and other connectors. Is there anything you need from the community to get this one across the line? I am happy to connect you with Dremio PMs or others who can provide clarifications or support. I see a lot of value in Trino and would love to see the query-where-it-lives model grow to this catalog. We are believers in data as code and want to help evolve that space through technologies like Trino and Nessie. Thank you.

@alexjo2144
Member

> Is there anything you need from the community to get this one across the line?

The JDBC catalog was included in the 406 release a few weeks ago, so I would call it across the line: https://trino.io/docs/current/release/release-406.html#iceberg-connector

Let us know if you have a chance to try it out, feedback is always welcome.

@bjackson-ep

bjackson-ep commented Feb 9, 2023 via email

@penggewudi

penggewudi commented Jun 25, 2023

> @sungwy The root cause relates to #17. We prefer small start when adding a new enhancement, so I won't handle case insensitive matching in this PR. We can improve such things later.

Excuse me, may I ask about the progress of this issue? We connect to Iceberg through the PostgreSQL JDBC catalog; the following is the catalog properties configuration.

connector.name=iceberg
hive.s3.endpoint=xxx
hive.s3.aws-access-key=xxx
hive.s3.aws-secret-key=xxx
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
iceberg.catalog.type=jdbc
iceberg.jdbc-catalog.catalog-name=jdbc
iceberg.jdbc-catalog.driver-class=org.postgresql.Driver
iceberg.jdbc-catalog.connection-url=jdbc:postgresql://xxx/jdbc
iceberg.jdbc-catalog.connection-user=xxx
iceberg.jdbc-catalog.connection-password=xxx
iceberg.jdbc-catalog.default-warehouse-dir=s3://xxx/

We use Spark SQL to write a table named Test_Table into Iceberg. When querying with the following statements through the Trino CLI, the error Table 'xxx.test_table' does not exist appears. I tried adding a parameter like iceberg.jdbc-catalog.case-insensitive-name-matching=true, but then the Trino service reports an error and cannot start.

SELECT * FROM Test_Table / SELECT * FROM test_table

In addition, when we write to Iceberg in the same way but name the table test_table, I can query it and get the results I want. If I want to keep table names consistent between writing and reading, how can I do that?

Development

Successfully merging this pull request may close these issues.

Feature request: Support for jdbc catalog in Iceberg Connector