Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generic jdbc datasource connector #2910

Closed
tooptoop4 opened this issue Feb 22, 2020 · 11 comments
Closed

Add generic jdbc datasource connector #2910

tooptoop4 opened this issue Feb 22, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@tooptoop4
Copy link
Contributor

tooptoop4 commented Feb 22, 2020

I'm not talking about client->jdbc driver->presto server. I'm talking about presto server connector to any jdbc source (ie sybase, oracle.etc).

List of columns can be derived by running where 1=0 query like spark does:
apache/spark@64c29af

https://docs.starburstdata.com/latest/connector/generic-jdbc.html

https://www.simba.com/drivers/ , https://www.dbvis.com/features/database-drivers/ , https://en.wikipedia.org/wiki/List_of_relational_database_management_systems show a list of many different data sources this would open up connectivity to.

some examples:

Unidata: com.ibm.u2.jdbc.UniJDBCDriver
kdb: https://github.com/bjconlan/kdb-jdbc
MS Access: sun.jdbc.odbc.JdbcOdbcDriver
SQLLite: org.sqlite.JDBC
SAS: com.sas.net.sharenet.ShareNetDriver
SAP: com.sap.db.jdbc.Driver
Greenplum: com.pivotal.jdbc.GreenplumDriver
DynamoDB: com.simba.dynamodb.jdbc.Driver
Netezza: org.netezza.Driver
Vertica: com.vertica.jdbc.Driver
Clickhouse: ru.yandex.clickhouse.ClickHouseDriver
Sybase: com.sybase.jdbc4.jdbc.SybDriver
Impala: com.cloudera.impala.jdbc41.Driver
SparkSQL ThriftServer: com.simba.spark.jdbc41.Driver
H2: org.h2.Driver
Derby: org.apache.derby.jdbc.ClientDriver
HiveServer2: org.apache.hive.jdbc.HiveDriver
Drill: org.apache.drill.jdbc.Driver
Teradata: com.ncr.teradata.TeraDriver
Snowflake: net.snowflake.client.jdbc.SnowflakeDriver
DB2: com.ibm.db2.jcc.DB2Driver
Salesforce: com.ddtek.jdbc.sforce.SForceDriver
Azure CosmosDB: cdata.jdbc.cosmosdb.CosmosDBDriver
Athena: com.simba.athena.jdbc.Driver
Aurora: org.mariadb.jdbc.Driver
Informix: com.informix.jdbc.IfxDriver
Firebird: org.firebirdsql.jdbc.FBDriver
...and many more

Cons:

  1. Some tables may not be listed in 'show tables' due to different getSchemas() implementations but direct query on the table still working ie https://issues.apache.org/jira/browse/SPARK-18524
  2. Performance will be worse than using native connector, potentially some detractors might claim 'aww presto is slow with x source' but slow is better than nothing right?
  3. Not all data types can be natively mapped, but at least can be cast to VARCHAR. Starburst generic jdbc connector might have a bug as oracle columns of NUMBER were not visible at all in presto.
  4. Some connections may not work at all, but then again they don't now, so nothing to lose

Pros:

  1. Can POC connectivity to legacy sources - ANY source with JDBC interface (point 4 from cons aside)
  2. Can retrieve small lookup datasets from legacy sources for initial stages of data migration
  3. Can use presto as a reconciliation engine to compare data across DBMS
  4. Native connectors for paid (ie non open source) JDBC drivers will never make it into Presto open source release, but users that have paid for those drivers will be able to access those data sources within presto and do joins to other DBMS using presto
  5. Aligns with the title of the white paper 'SQL on Everything' - https://prestosql.io/paper.html.
  6. This feature would attract a lot of users to presto community, further snowballing adoption, they would then contribute other improvements.etc
@ebyhr ebyhr added the enhancement New feature or request label Feb 22, 2020
@dprophet
Copy link
Contributor

dprophet commented Mar 4, 2020

OH, very very interesting. I might be willing to take this one on. It would help so many people, myself included.

@dprophet
Copy link
Contributor

dprophet commented Mar 4, 2020

Anyone have doc's for a skeleton connector so I can fill in the blanks? I assume there is a test harness for local development (thus not having to develop within the Presto server itself). Unit testing framework? Maybe containerized tests for each connector?

@dprophet
Copy link
Contributor

Would you recommend adding test for

https://github.com/bloomberg/comdb2

To this JDBC connector?

@tooptoop4
Copy link
Contributor Author

@dprophet did it work? pls write one :-)

@flypenguin
Copy link

what happened here? I am looking for an informix connector, and it seems that would be the solution ... any updates?

@tooptoop4
Copy link
Contributor Author

tooptoop4 commented Jan 19, 2021

@flypenguin see the linked PR (also use the 4 updated .java files in the root of https://github.com/tooptoop4/presto-1/tree/lightgenjdbc for 348-SNAPSHOT with fixes). you can build from source and use it.

it has been successfully tested against:
sqlite, sybase, oracle, impala, presto, dremio, sparksqlthrift, hiveserver2, postgres, mysql, db2, sqlserver, cockroachdb, derby, h2, hsqldb (ie hypersql), firebird

some recommended settings in .properties:
case-insensitive-name-matching=true
unsupported-type-handling=CONVERT_TO_VARCHAR
jdbc-types-mapped-to-varchar=timestamp

@ebyhr
Copy link
Member

ebyhr commented Jan 20, 2021

@flypenguin Implementing JDBC based connector is easy comparing to other connectors. Please feel free to #dev channel in the community Slack. https://trino.io/slack.html

@findepi
Copy link
Member

findepi commented Jan 20, 2021

@flypenguin adding to @ebyhr 's comment -- you can see ongoing PR for inspiration, like #6134 or #4909

@reisepass
Copy link

Looks like starburst build this. It would be nice if they could commit something to upstream trino https://docs.starburst.io/latest/connector/starburst-generic-jdbc.html

@gesundes
Copy link

JDBC connector can provide the ability to connect to Hive via Knox. So, any progress here, guys?

@hashhar
Copy link
Member

hashhar commented Sep 3, 2024

There are no plans to implement such a connector for now.

Instead there is a example JDBC connector which can be used to bootstrap a proper connector - #15004

Also in practice for good performance pushdowns and type mappings need to be implemented on a per-connector basis anyway.

@hashhar hashhar closed this as not planned Won't fix, can't repro, duplicate, stale Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging a pull request may close this issue.

8 participants