Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ClickHouse connector #4909

Closed
wants to merge 1 commit into from
Closed

Conversation

wgzhao
Copy link
Member

@wgzhao wgzhao commented Aug 21, 2020

I deleted former PR( #4672) in my stupid operation.

The ClickHouse connector allows querying data stored in
Yandex ClickHouse

The ClickHouse connector is compatible with all ClickHouse versions starting from 20.3.5.21.

to view more details in presto-docs/clichouse.rst

Fixes #4500

@cla-bot cla-bot bot added the cla-signed label Aug 21, 2020
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
@findepi
Copy link
Member

findepi commented Aug 24, 2020

@wgzhao could you possibly squash your commits and make sure the build is green?

@Praveen2112
Copy link
Member

Can we have this on top of #5012 so that we can push table properties without extending JDBC metadata

@rosewms rosewms added the docs label Sep 2, 2020
@electrum
Copy link
Member

electrum commented Sep 3, 2020

@wgzhao It looks like this is close. Can you rebase as @Praveen2112 suggested and squash the commits?

@wgzhao
Copy link
Member Author

wgzhao commented Sep 4, 2020

@wgzhao It looks like this is close. Can you rebase as @Praveen2112 suggested and squash the commits?

#5012 has not merge, can I use it directly?

@Praveen2112
Copy link
Member

@wgzhao #5012 is merged.. So now we could rebase your PR

@findepi
Copy link
Member

findepi commented Sep 13, 2020

@wgzhao can you please rebase and squash all commits into one?

presto-clickhouse/pom.xml Outdated Show resolved Hide resolved
@findepi
Copy link
Member

findepi commented Sep 14, 2020

can you please rebase and squash all commits into one?

@wgzhao Please see https://docs.github.com/en/github/using-git/about-git-rebase
When you do this, the "Commits" tab of the PR (https://github.com/prestosql/presto/pull/4909/commits) should show just 1 commit.

BTW thanks for making sure the build is green!
@Praveen2112 can you take another look?

@findepi findepi requested a review from Praveen2112 September 14, 2020 07:34
@wgzhao wgzhao force-pushed the clickhouse_connector branch from dc91581 to 4c10a90 Compare September 14, 2020 09:45
@findepi
Copy link
Member

findepi commented Sep 14, 2020

@wgzhao note there is a compilation error in ClickHouseQueryBuilder

2020-09-14T10:18:30.9881571Z [ERROR] /home/runner/work/presto/presto/presto-clickhouse/src/main/java/io/prestosql/plugin/clickhouse/ClickHouseQueryBuilder.java:[99,15] incompatible types: java.lang.String cannot be converted to io.prestosql.plugin.jdbc.JdbcClient

@Praveen2112
Copy link
Member

@wgzhao I guess there is a small issue in merging. I'm not sure if InMemoryOrcDataSource and other changes are a part of this PR

@wgzhao wgzhao force-pushed the clickhouse_connector branch from 4c10a90 to 50ee7ea Compare September 14, 2020 13:10
@wgzhao
Copy link
Member Author

wgzhao commented Sep 14, 2020

@wgzhao I guess there is a small issue in merging. I'm not sure if InMemoryOrcDataSource and other changes are a part of this PR

Yes, I remove all mistaken merging and re-commit

@wgzhao
Copy link
Member Author

wgzhao commented Sep 14, 2020

can you please rebase and squash all commits into one?

@wgzhao Please see https://docs.github.com/en/github/using-git/about-git-rebase
When you do this, the "Commits" tab of the PR (https://github.com/prestosql/presto/pull/4909/commits) should show just 1 commit.

BTW thanks for making sure the build is green!
@Praveen2112 can you take another look?

Thank you , I have squash all commits into one commit

@wgzhao wgzhao requested review from rosewms and findepi September 14, 2020 13:14
@findepi findepi removed their request for review September 14, 2020 13:30
@rosewms
Copy link
Contributor

rosewms commented Sep 14, 2020

Looks good to me, from a docs perspective.

@wgzhao wgzhao force-pushed the clickhouse_connector branch 3 times, most recently from 8213d89 to dae1fb7 Compare September 27, 2020 14:14
@wgzhao wgzhao force-pushed the clickhouse_connector branch from 526c37d to 67a9c5d Compare January 27, 2021 12:18
@findepi
Copy link
Member

findepi commented Jan 28, 2021

@wgzhao please let me know when you address the comments. Also just FYI, the build is red currently.

@wgzhao
Copy link
Member Author

wgzhao commented Jan 28, 2021

@wgzhao please let me know when you address the comments. Also just FYI, the build is red currently.

I try my best to address above comments on next week, thanks

@wgzhao wgzhao force-pushed the clickhouse_connector branch 5 times, most recently from 9e44c45 to 905c17c Compare January 30, 2021 15:37
@findepi
Copy link
Member

findepi commented Feb 19, 2021

@wgzhao are you working on any updates or is now waiting for my review?

@wgzhao
Copy link
Member Author

wgzhao commented Feb 19, 2021

@wgzhao are you working on any updates or is now waiting for my review?

If it's convenient for you now, you can review first, we are still in the Chinese New Year holiday

@wgzhao wgzhao force-pushed the clickhouse_connector branch 2 times, most recently from cdd1d7c to 853f1b1 Compare February 27, 2021 05:46
@findepi
Copy link
Member

findepi commented Mar 1, 2021

@wgzhao i am taking a closer look at type mappings and was trying to tie up some loose ends here

However i realized i do not fully understand the idea behind String and FixedString in Clickhouse
Can you help me understand this more?

From the documentation that i found for String https://clickhouse.tech/docs/en/sql-reference/data-types/string/

Strings of an arbitrary length. The length is not limited. The value can contain an arbitrary set of bytes, including null bytes.
The String type replaces the types VARCHAR, BLOB, CLOB, and others from other DBMSs.

And from FixedString docs https://clickhouse.tech/docs/en/sql-reference/data-types/fixedstring/ :

A fixed-length string of N bytes (neither characters nor code points).

To me it sounds like we should map the both types to varbinary, and not varchar or char on the Trino side, otherwise some values (those which happen not to be UTF-8 encoded) may not be accessible.
However, I can imagine docs saying one thing, and everyone using these types to store UTF-8 data only. Is it the case?

@wgzhao
Copy link
Member Author

wgzhao commented Mar 2, 2021

FixedString DOES have a confusing behavior, from the official documentation, the usage scenario is mainly those data stored in fixed length with ASCII , such as IP addresses, hashes for MD5, etc.
The underlying is not stored with UTF8 encoding. But in fact, it can store UTF-8 data, and and be queried, such as the following example:

:) create table t (col FixedString(5)) ENGINE=TinyLog;
:) insert into t values('hell'),('');

:) select * from t;

SELECT *
FROM t

┌─col──┐
│ hell │
│ 中   │
└──────┘

:) select * from t where col = '\0\0';

SELECT *
FROM t
WHERE col = '\0\0'

┌─col─┐
│ 中  │
└─────┘

My suggestion is for FixedString only for one-sided mapping, that is, when querying ClickHouse table, mapping FixedString(N) type to CHAR(N) on Trino,
but when querying, for string whose length is less than N, it needs to be padded with \0 (see the above example), which may confuse user.
When creating the ClickHouse table on the Trino side, map the CHAR type to the String type on clickhouse.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

You also need to add an entry to https://github.com/trinodb/trino/blob/master/core/trino-server/src/main/provisio/presto.xml to make sure the connector gets packaged into the final JAR.

docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/clickhouse.rst Show resolved Hide resolved
docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/clickhouse.rst Show resolved Hide resolved
docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
plugin/trino-clickhouse/pom.xml Outdated Show resolved Hide resolved
import static java.util.Locale.ENGLISH;

/**
* The problem of incorrect catalog names kept tormenting me, I kept modifying it, overriding methods,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can solve this by providing your own implementation of getTableHandle.
The JdbcTableHandle created in getTableHandle takes the catalog name as input.

Take a look at MySqlClient for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also allow you to remove most of the overrides here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it, But it brings up more errors

}

@Override
protected void renameTable(ConnectorSession session, String catalogName, String schemaName, String tableName, SchemaTableName newTable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Override is not needed. If your JdbcTableHandle sets catalogname to empty optional then BaseJdbcClient#renameTable automatically constructs the proper SQL (including quoting).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseJdbcClient#renameTable use ALTER TABLE RENAME TO statement, But ClickHouse uses a different statment RENAME TABLE TO. So, I had to override it.

@findepi
Copy link
Member

findepi commented Mar 2, 2021

@wgzhao I applied most of @hashhar's comments and tweaked the type mapping tests a bit.
I wanted to understand why some of the tests were not using SqlDataTypeTest.

Also, i adapted the connector tests to the new base test class i recently introduced -- i didn't want to add more work on you because of changes done in the meantime.

I also took liberty to swap the String, FixedString mapping into varbinary and added configuration toggle clickhouse.map-string-as-varchar to restore varchar mapping behavior.
This is an interm solution and we should reconsider the mapping, see #7102

I posted #7081 and will merge when green.

As this is the first version of the connector, there naturally is a bunch of things to take care of as follow ups.
I tried to capture them in #7103. If I missed something, let me know.

Thank you for your contribution, you did great job here and I am excited to see ClickHouse connector shipping soon!

@wgzhao wgzhao force-pushed the clickhouse_connector branch from 853f1b1 to 792abce Compare March 2, 2021 13:08
@findepi
Copy link
Member

findepi commented Mar 2, 2021

Thanks @wgzhao for updates here. I am copying them into #7081 as well.

@findepi findepi mentioned this pull request Mar 2, 2021
10 tasks
@findepi
Copy link
Member

findepi commented Mar 2, 2021

Merged as 174d429.

@wgzhao many thanks for the contribution and it so cool to see it being released soon!

@findepi findepi closed this Mar 2, 2021
@findepi findepi added this to the 353 milestone Mar 2, 2021
@wgzhao
Copy link
Member Author

wgzhao commented Mar 2, 2021

Thanks

@wgzhao wgzhao deleted the clickhouse_connector branch March 2, 2021 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Create connector for ClickHouse
9 participants