Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for a new database #48

Closed
chhetripradeep opened this issue Nov 16, 2019 · 10 comments
Closed

Adding support for a new database #48

chhetripradeep opened this issue Nov 16, 2019 · 10 comments

Comments

@chhetripradeep
Copy link

Hi Takashi,

Thank you for this awesome project. I was interested to know how difficult is it to add support for a completely new database. I was looking for clickhouse (https://github.com/ClickHouse/ClickHouse) support. I would like to know what all components need to be updated for adding a new database support.

@chhetripradeep
Copy link
Author

Hi @k0kubun I would need some help with this. The code which i am uncomfortable changing is the one taken from vitess. It will be great to know your thoughts on this.

@yudppp
Copy link
Contributor

yudppp commented Nov 29, 2019

Hi @chhetripradeep I don't know clickhouse at all.
Does Clickhouse have its own syntax that neither mysql nor postgresql doesn't have?
If not, I think you should update following only.

I'm not a committer, so I'm sorry if I made a mistake.

@k0kubun
Copy link
Collaborator

k0kubun commented Nov 29, 2019

Thanks @yudppp for your help! As he suggested, if ClickHouse is compatible with either MySQL or PostgreSQL, please just use mysqldef or psqldef as is.

In case it's not compatible MySQL or PostgreSQL so much, the amount of effort much depends on how ClickHouse is different from these supported databases for each of sqldef implementation layers. While I'm not familiar with ClickHouse and it's hard to exactly estimate it for me, let me roughly explain how it's difficult and what you need to expect for each layer.

CLI

The easiest part.

If clickhouse-client is CLI like mysql and psql, I'd name the CLI as clickhouse-clientdef for consistency with others (just removing def works when there's no schema yet), clickhouse-sqldef to clarify it's a sqldef project, or clickhousedef to make it as short as possible.

Then, the file you need to put would be cmd/clickhouse-clientdef/clickhouse.go, cmd/clickhouse-sqldef/clickhouse.go, or cmd/clickhousedef/clickhousedef.go.
Please refer to cmd/mysqldef/mysqldef.go and cmd/psqldef/psqldef.go for how it can be implemented. I recommend you to make the CLI compatible with clickhouse-client.

Database adapter

Do you know Go database driver for ClickHouse?
If there's already such a thing, we can just use that to connect sqldef to ClickHouse. Otherwise we'd need to prepare it first. Also, you need to have a way to dump the current table schema through the database driver.

See adapter/mysql/mysql.go and adapter/postgres/postgres.go, add adapter/clickhouse/clickhouse.go, and use it from the CLI implementation.

SQL parser

The code which i am uncomfortable changing is the one taken from vitess

So I guess you're worried mostly about this?

If its DDL syntax is compatible with either of MySQL / PostgreSQL, we just need to use ParserModePostgres or ParserModeMysql for ClickHouse too.

However, if it's so much different from both of them, then... welcome to the SQL parser maintenance world! The current implementation is completely forked from vitess, so you don't need to care about compatibility with vitess (but please care MySQL and PostgreSQL).

If you're new to yacc, you need to learn how to write a parser using yacc first. Then you'd be able to easily understand sqlparser/parser.y.

Testing

Is ClickHouse something we can easily run on Travis for third-party projects?
This part is really up to how to run ClickHouse.

This part is about .travis.yml, which will run **/*_test.go you'll need to add. cmd/*/*_test.go files are the integration test which is the most important piece for sustainable maintenance. Please make sure to add the equivalent for ClickHouse.

@chhetripradeep
Copy link
Author

Hi @k0kubun @yudppp Thank you for your detailed response. There are some sql syntax which is new in clickhouse but I think that won't be involved as part of schema migration.

I am also new to clickhouse too but i will try to make a PR as per your suggestions and try to compare with the migration which we have written manually and see if sqldef also generates identical migrations. Thanks a lot once again for your inputs.

@chhetripradeep
Copy link
Author

Hi @k0kubun @yudppp Thank you for your help earlier.

I have a question: ClickHouse sql syntax overlaps with MySQL/PostgreSQL but there are lot of differences, one such example is the data types (clickhouse uses Uint for unsigned int but mysql uses UNSIGNED keyword). I was able to write the clickhouse adaptor and cmd part but now i am stuck because neither of ParserModePostgres and ParserModeMysql works. I guess now i need to learn how to write yacc for clickhouse sql syntax. I need your suggestion.

✔︎ ./clickhousedef -h 172.16.45.165 r0  --export
CREATE TABLE r0.foobar (`datetime` DateTime, `Id` UInt64, `ownerId` UInt32) ENGINE = Null;
✔︎ ./clickhousedef -h 172.16.45.165 r0  --export > /tmp/schema.sql
✔︎ ./clickhousedef -h 172.16.45.165 r0  --dry-run < /tmp/schema.sql
found syntax error when parsing DDL "CREATE TABLE r0.foobar (`datetime` DateTime, `Id` UInt64, `ownerId` UInt32) ENGINE = Null": syntax error at position 57 near 'UInt64'

I would need your suggestion here to know how to proceed further. Thank you.

@k0kubun
Copy link
Collaborator

k0kubun commented Dec 4, 2019

There are roughly two approaches to implement a new DDL syntax:

  1. If syntax is close enough, extend the existing parser and switch differences by ParserMode
  2. If it has too many differences or some syntax-level inconsistency with existing databases, just fork the entire parser and maintain the new parser separately.
    • This option also includes implementation like templating parser.y depending on a target database.

and

ClickHouse sql syntax overlaps with MySQL/PostgreSQL but there are lot of differences, one such example is the data types (clickhouse uses Uint for unsigned int but mysql uses UNSIGNED keyword)

This part can be approached with 1 if differences are not significant, because MySQL and PostgreSQL have difference in data types too and we need to maintain data type switching by ParserMode anyway. However, 2 can be an easier option two, depending on how many data types which we need to newly support.

Could you summarize (1) how many data types you want to add for the initial support and (2) any other differences other than data type names?

@chhetripradeep
Copy link
Author

Hi @k0kubun Thank you for your detailed response on the possible paths.

  1. I would like to add the support for following datatypes: UInt8, UInt16, UInt32, UInt64, FixedString, Array of these (https://clickhouse.yandex/docs/en/data_types/array/) , Tuple of these (https://clickhouse.yandex/docs/en/data_types/tuple/), Enum(https://clickhouse.yandex/docs/en/data_types/enum/) and some nested datatypes too (https://clickhouse.yandex/docs/en/data_types/nested_data_structures/nested/)

  2. Clickhouse supports various kinds of engines: https://clickhouse.yandex/docs/en/operations/table_engines/
    We will need to add support for these engines.

  3. Few of the queries like ATTACH MATERIALIZED VIEW ... DETACH TABLE ....

These are the ones i can see right now. As we dig deeper into the clickhouse syntax, I am pretty sure this list will grow.

@k0kubun
Copy link
Collaborator

k0kubun commented Dec 5, 2019

Thank you! Because we have 2 and 3, I think we have some uncertainty at this moment. So let me propose the following development strategy:

  1. Start building the clickhousedef PoC at https://github.com/sqldef/clickhousedef.
    • This makes PoC a lot easier, because you can change parser as you like, without concerning about mysqldef / psqldef backward compatibility.
    • You can enjoy using working clickhousedef even at this point.
  2. Once we implement 1, 2, and 3 in your comment and we become confident about what ClickHouse support looks like, let's think about either merging these repositories again, or creating better boundary to share more implementation with sqldef if it turns out to be easier to separately maintain the parser.

I already prepared the environment to start the point 1 at https://github.com/sqldef/clickhousedef. You're invited to the organization, and Travis is already enabled so that we can run integration test with clickhousedef there.

@chhetripradeep
Copy link
Author

Hi @k0kubun thank you for your valuable feedback. I will take the path you suggested. I'll let you know once i come up with something.

@k0kubun
Copy link
Collaborator

k0kubun commented Jun 17, 2020

As we moved the repository to clickhousedef, let's discuss things there and let me close this issue in this repository.

@k0kubun k0kubun closed this as completed Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants