Support complex types in Arrow/Parquet/ORC #24341
I am also going to add Map type support in this PR.
Thanks so much for working on this. I tried this PR because I thought it could help us load some of our Parquet files into ClickHouse.
The table:
CREATE TABLE t ( v Nested(a String, b String) ) ENGINE=Memory()
An array of structs:
» parquet-tools schema f1
message schema {
required group v (LIST) {
repeated group list {
required group v {
optional binary a (STRING);
optional binary b (STRING);
}
}
}
}
» ch --query="INSERT INTO t FORMAT Parquet" < f1
Code: 8. DB::Exception: Column "v.a" is not presented in input data.: data for INSERT was parsed from stdin
A struct of arrays:
» parquet-tools schema f2
message schema {
optional group v {
required group a (LIST) {
repeated group list {
required binary a (STRING);
}
}
required group b (LIST) {
repeated group list {
required binary b (STRING);
}
}
}
}
» ch --query="INSERT INTO t FORMAT Parquet" < f2
Code: 8. DB::Exception: Column "v.a" is not presented in input data.: data for INSERT was parsed from stdin
This could be worked around by transforming the data using
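The error names `v.a` because ClickHouse stores `v Nested(a String, b String)` as two flattened sub-columns, `v.a Array(String)` and `v.b Array(String)`, while both Parquet files expose a single column `v`. The Python sketch below only illustrates how each layout (f1: a list of structs per row; f2: a struct of lists per row) regroups into those two sub-columns. The functions `array_of_structs_to_nested` and `struct_of_arrays_to_nested` are hypothetical, plain lists and dicts stand in for decoded Parquet rows, and non-null values are assumed; this is not ClickHouse's implementation or the approach of any follow-up PR.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-ins for decoded Parquet rows (not a real Parquet API).
Row1 = List[Dict[str, str]]   # f1 layout: a list of {field: value} structs per row
Row2 = Dict[str, List[str]]   # f2 layout: one struct of {field: [values]} per row
Nested = Dict[str, List[List[str]]]


def array_of_structs_to_nested(column: str, fields: Sequence[str], rows: List[Row1]) -> Nested:
    """Regroup an array of structs into one array per field, keyed '<column>.<field>'."""
    out: Nested = {f"{column}.{f}": [] for f in fields}
    for row in rows:
        for f in fields:
            # Pull field f out of every struct in this row; all resulting arrays have len(row).
            out[f"{column}.{f}"].append([struct[f] for struct in row])
    return out


def struct_of_arrays_to_nested(column: str, fields: Sequence[str], rows: List[Row2]) -> Nested:
    """Copy each struct member array into its '<column>.<field>' sub-column."""
    out: Nested = {f"{column}.{f}": [] for f in fields}
    for row in rows:
        # Arrays of one Nested structure must have equal length within a row.
        if len({len(row[f]) for f in fields}) > 1:
            raise ValueError("arrays of one Nested row have different sizes")
        for f in fields:
            out[f"{column}.{f}"].append(list(row[f]))
    return out


f1_rows = [[{"a": "x", "b": "1"}, {"a": "y", "b": "2"}], []]
f2_rows = [{"a": ["x", "y"], "b": ["1", "2"]}, {"a": [], "b": []}]
print(array_of_structs_to_nested("v", ["a", "b"], f1_rows))
print(struct_of_arrays_to_nested("v", ["a", "b"], f2_rows))
# both print {'v.a': [['x', 'y'], []], 'v.b': [['1', '2'], []]}
```

Both layouts carry the same information; the only difference is whether the per-field arrays must be built by transposing (f1) or can be copied directly after a length check (f2).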
@buyology thank you for your comment, I will try to support inserting into Nested from a struct of arrays or an array of structs in the next PR.
AST fuzzer failure: #25293
Internal documentation ticket: DOCSUP-10557.
Continued here: #36832
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support structs and maps in Arrow/Parquet/ORC and dictionaries in Arrow input/output formats. Add new setting output_format_arrow_low_cardinality_as_dictionary.
Detailed description / Documentation draft:
Support Struct and Map types in the column-oriented input/output formats Arrow/Parquet/ORC. You can now input and output ClickHouse Tuples and Maps (experimental type) in these formats. Nested complex types are also supported. If the setting output_format_arrow_low_cardinality_as_dictionary is true, LowCardinality columns are converted into an Arrow dictionary column; if it is false, LowCardinality columns are converted to full columns before output. By default this setting is false.
Closes #17240 and #21866
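To make the type correspondence above concrete, here is a small Python sketch that turns a ClickHouse type string into a loose description of the Arrow type it would be written as: Tuple → struct, Map → map, Array → list, and LowCardinality → dictionary or plain values depending on output_format_arrow_low_cardinality_as_dictionary. It is not ClickHouse's converter. `to_arrow`, `split_args`, `split_name`, the tiny primitive table (including String → binary), positional names for unnamed Tuple elements, and the `dictionary<values=…>` notation are all assumptions of this sketch.

```python
from typing import List, Optional, Tuple

# Illustrative subset only; these primitive mappings are assumptions of this sketch.
PRIMITIVES = {"String": "binary", "UInt8": "uint8", "Int32": "int32", "Int64": "int64", "Float64": "double"}


def split_args(s: str) -> List[str]:
    """Split 'A, B(C, D), E' at commas that are not inside parentheses."""
    parts: List[str] = []
    depth, start = 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts


def split_name(arg: str) -> Tuple[Optional[str], str]:
    """'a String' -> ('a', 'String'); 'Map(String, UInt8)' -> (None, 'Map(String, UInt8)')."""
    space, paren = arg.find(" "), arg.find("(")
    if space != -1 and (paren == -1 or space < paren):
        return arg[:space], arg[space + 1:].strip()
    return None, arg


def to_arrow(ch_type: str, low_cardinality_as_dictionary: bool = False) -> str:
    """Describe the Arrow type a ClickHouse type would map to (simplified notation)."""
    t = ch_type.strip()
    if "(" not in t:
        return PRIMITIVES[t]
    name, inner = t.split("(", 1)
    args = split_args(inner[:-1])  # drop the closing ')'

    def rec(a: str) -> str:
        return to_arrow(a, low_cardinality_as_dictionary)

    if name == "Array":
        return f"list<{rec(args[0])}>"
    if name == "Map":
        return f"map<{rec(args[0])}, {rec(args[1])}>"
    if name == "Tuple":
        fields = []
        for i, arg in enumerate(args):
            field, sub = split_name(arg)
            # Unnamed Tuple elements get positional names "1", "2", ... (assumption).
            fields.append(f"{field or str(i + 1)}: {rec(sub)}")
        return f"struct<{', '.join(fields)}>"
    if name == "Nullable":
        return rec(args[0])  # nullability lives on the Arrow field, the value type is unchanged
    if name == "LowCardinality":
        # Setting on: keep dictionary encoding; setting off: output the full column.
        value = rec(args[0])
        return f"dictionary<values={value}>" if low_cardinality_as_dictionary else value
    raise ValueError(f"type not covered by this sketch: {t}")


print(to_arrow("Tuple(a String, b Map(String, Array(UInt8)))"))
# prints struct<a: binary, b: map<binary, list<uint8>>>
print(to_arrow("LowCardinality(String)", low_cardinality_as_dictionary=True))
# prints dictionary<values=binary>
```

The recursion mirrors the point that nested complex types are supported: each wrapper type maps its arguments first, so a Tuple containing a Map of Arrays becomes a struct containing a map of lists.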