-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature](serde) support presto compatible output format #37039
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
@@ -397,7 +397,8 @@ void DataTypeMapSerDe::read_column_from_arrow(IColumn& column, const arrow::Arra | |||
template <bool is_binary_format> | |||
Status DataTypeMapSerDe::_write_column_to_mysql(const IColumn& column, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function '_write_column_to_mysql' has cognitive complexity of 84 (threshold 50) [readability-function-cognitive-complexity]
Status DataTypeMapSerDe::_write_column_to_mysql(const IColumn& column,
^
Additional context
be/src/vec/data_types/serde/data_type_map_serde.cpp:409: +1, including nesting penalty of 0, nesting level increased to 1
if (0 != result.push_string("{", 1)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:413: +1, including nesting penalty of 0, nesting level increased to 1
for (auto j = offsets[col_index - 1]; j < offsets[col_index]; ++j) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:414: +2, including nesting penalty of 1, nesting level increased to 2
if (j != offsets[col_index - 1]) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:415: +3, including nesting penalty of 2, nesting level increased to 3
if (0 != result.push_string(", ", 2)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:419: +2, including nesting penalty of 1, nesting level increased to 2
if (nested_keys_column.is_null_at(j)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:420: +3, including nesting penalty of 2, nesting level increased to 3
if (0 != result.push_string(serde_info.null_format, serde_info.null_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:423: +1, nesting level increased to 2
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:424: +3, including nesting penalty of 2, nesting level increased to 3
if (is_key_string && serde_info.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:424: +1
if (is_key_string && serde_info.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:425: +4, including nesting penalty of 3, nesting level increased to 4
if (0 !=
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:429: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:429: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:431: +4, including nesting penalty of 3, nesting level increased to 4
if (0 !=
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:435: +1, nesting level increased to 3
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:436: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:436: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:440: +2, including nesting penalty of 1, nesting level increased to 2
if (0 != result.push_string(serde_info.mapkey_delim, serde_info.delim_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:443: +2, including nesting penalty of 1, nesting level increased to 2
if (nested_values_column.is_null_at(j)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:444: +3, including nesting penalty of 2, nesting level increased to 3
if (0 != result.push_string(serde_info.null_format, serde_info.null_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:447: +1, nesting level increased to 2
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:448: +3, including nesting penalty of 2, nesting level increased to 3
if (is_val_string && serde_info.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:448: +1
if (is_val_string && serde_info.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:449: +4, including nesting penalty of 3, nesting level increased to 4
if (0 !=
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:453: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:453: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:455: +4, including nesting penalty of 3, nesting level increased to 4
if (0 !=
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:459: +1, nesting level increased to 3
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:460: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:460: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:465: +1, including nesting penalty of 0, nesting level increased to 1
if (0 != result.push_string("}", 1)) {
^
default: | ||
return Statsu::InternalError("unknown serde dialect: {}", serde_dialect); | ||
} | ||
return Status::OK(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'write' exceeds recommended size/complexity thresholds [readability-function-size]
Status VMysqlResultWriter<is_binary_format>::write(RuntimeState* state, Block& input_block) {
^
Additional context
be/src/vec/sink/vmysql_result_writer.cpp:135: 107 lines including whitespace and comments (threshold 80)
Status VMysqlResultWriter<is_binary_format>::write(RuntimeState* state, Block& input_block) {
^
run buildall |
TPC-H: Total hot run time: 40121 ms
|
run buildall |
TPC-H: Total hot run time: 40295 ms
|
TPC-DS: Total hot run time: 173447 ms
|
ClickBench: Total hot run time: 31.03 s
|
run buildall |
TPC-H: Total hot run time: 39645 ms
|
TPC-DS: Total hot run time: 174143 ms
|
ClickBench: Total hot run time: 30.45 s
|
run buildall |
TPC-H: Total hot run time: 39970 ms
|
TPC-DS: Total hot run time: 172893 ms
|
ClickBench: Total hot run time: 30.17 s
|
be/src/common/serde_info.h
Outdated
|
||
// The description of serde format. | ||
// Used for the string format of result return to MySQL client. | ||
struct SerdeInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this struct can put into DataTypeSerDe.FormatOptions.mysql.serdeInfo ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we have some option definition can be reuse ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
@@ -397,7 +397,8 @@ void DataTypeMapSerDe::read_column_from_arrow(IColumn& column, const arrow::Arra | |||
template <bool is_binary_format> | |||
Status DataTypeMapSerDe::_write_column_to_mysql(const IColumn& column, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function '_write_column_to_mysql' has cognitive complexity of 84 (threshold 50) [readability-function-cognitive-complexity]
Status DataTypeMapSerDe::_write_column_to_mysql(const IColumn& column,
^
Additional context
be/src/vec/data_types/serde/data_type_map_serde.cpp:409: +1, including nesting penalty of 0, nesting level increased to 1
if (0 != result.push_string("{", 1)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:413: +1, including nesting penalty of 0, nesting level increased to 1
for (auto j = offsets[col_index - 1]; j < offsets[col_index]; ++j) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:414: +2, including nesting penalty of 1, nesting level increased to 2
if (j != offsets[col_index - 1]) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:415: +3, including nesting penalty of 2, nesting level increased to 3
if (0 != result.push_string(", ", 2)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:419: +2, including nesting penalty of 1, nesting level increased to 2
if (nested_keys_column.is_null_at(j)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:420: +3, including nesting penalty of 2, nesting level increased to 3
if (0 != result.push_string(options.null_format, options.null_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:423: +1, nesting level increased to 2
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:424: +3, including nesting penalty of 2, nesting level increased to 3
if (is_key_string && options.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:424: +1
if (is_key_string && options.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:425: +4, including nesting penalty of 3, nesting level increased to 4
if (0 != result.push_string(options.nested_string_wrapper, options.wrapper_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:428: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:428: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:430: +4, including nesting penalty of 3, nesting level increased to 4
if (0 != result.push_string(options.nested_string_wrapper, options.wrapper_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:433: +1, nesting level increased to 3
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:434: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:434: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(key_serde->write_column_to_mysql(nested_keys_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:438: +2, including nesting penalty of 1, nesting level increased to 2
if (0 != result.push_string(&options.map_key_delim, 1)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:441: +2, including nesting penalty of 1, nesting level increased to 2
if (nested_values_column.is_null_at(j)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:442: +3, including nesting penalty of 2, nesting level increased to 3
if (0 != result.push_string(options.null_format, options.null_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:445: +1, nesting level increased to 2
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:446: +3, including nesting penalty of 2, nesting level increased to 3
if (is_val_string && options.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:446: +1
if (is_val_string && options.wrapper_len > 0) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:447: +4, including nesting penalty of 3, nesting level increased to 4
if (0 != result.push_string(options.nested_string_wrapper, options.wrapper_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:450: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:450: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:452: +4, including nesting penalty of 3, nesting level increased to 4
if (0 != result.push_string(options.nested_string_wrapper, options.wrapper_len)) {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:455: +1, nesting level increased to 3
} else {
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:456: +4, including nesting penalty of 3, nesting level increased to 4
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:626: expanded from macro 'RETURN_IF_ERROR'
do { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:456: +5, including nesting penalty of 4, nesting level increased to 5
RETURN_IF_ERROR(value_serde->write_column_to_mysql(nested_values_column, result, j,
^
be/src/common/status.h:628: expanded from macro 'RETURN_IF_ERROR'
if (UNLIKELY(!_status_.ok())) { \
^
be/src/vec/data_types/serde/data_type_map_serde.cpp:461: +1, including nesting penalty of 0, nesting level increased to 1
if (0 != result.push_string("}", 1)) {
^
run buildall |
TPC-H: Total hot run time: 40633 ms
|
TPC-DS: Total hot run time: 172927 ms
|
ClickBench: Total hot run time: 31.07 s
|
run buildall |
TPC-H: Total hot run time: 41043 ms
|
TPC-DS: Total hot run time: 173350 ms
|
ClickBench: Total hot run time: 29.96 s
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
the output format of some data types are different between Presto/Trino and Doris, especially for complex type such as array, map and struct. When user migrate from Presto to Doris, they expect the same format so that they don't need to modify their business code. This PR mainly changes: 1. Add a new session variable `serde_dialect` Default is `doris`, options are `presto` or `trino`. If set to presto or trino, the output format returned to MySQL client of some datatypes will be changed: - Array Doris: `["abc", "def", "", null]` Presto: `[abc, def, , NULL]` - Map Doris: `{"k1":null, "k2":"v3"}` Presto: `{k1=NULL, k2=v3}` - Struct Doris: `{"s_id":100, "s_name":"abc , "", "s_address":null}` Presto: `{s_id=100, s_name=abc , ", s_address=NULL}` 2. Change the output format of struct type Remove the space after `:` - Before: `{"s_id": 100, "s_name": "abc , "", "s_address": null}` - After: ``{"s_id":100, "s_name":"abc , "", "s_address":null}``
the output format of some data types are different between Presto/Trino and Doris, especially for complex type such as array, map and struct. When user migrate from Presto to Doris, they expect the same format so that they don't need to modify their business code. This PR mainly changes: 1. Add a new session variable `serde_dialect` Default is `doris`, options are `presto` or `trino`. If set to presto or trino, the output format returned to MySQL client of some datatypes will be changed: - Array Doris: `["abc", "def", "", null]` Presto: `[abc, def, , NULL]` - Map Doris: `{"k1":null, "k2":"v3"}` Presto: `{k1=NULL, k2=v3}` - Struct Doris: `{"s_id":100, "s_name":"abc , "", "s_address":null}` Presto: `{s_id=100, s_name=abc , ", s_address=NULL}` 2. Change the output format of struct type Remove the space after `:` - Before: `{"s_id": 100, "s_name": "abc , "", "s_address": null}` - After: ``{"s_id":100, "s_name":"abc , "", "s_address":null}``
the output format of some data types are different between Presto/Trino and Doris,
especially for complex type such as array, map and struct.
When user migrate from Presto to Doris, they expect the same format so that they
don't need to modify their business code.
This PR mainly changes:
Add a new session variable
serde_dialect
Default is
doris
, options arepresto
ortrino
. If set to presto or trino,the output format returned to MySQL client of some datatypes will be changed:
Array
Doris:
["abc", "def", "", null]
Presto:
[abc, def, , NULL]
Map
Doris:
{"k1":null, "k2":"v3"}
Presto:
{k1=NULL, k2=v3}
Struct
Doris:
{"s_id":100, "s_name":"abc , "", "s_address":null}
Presto:
{s_id=100, s_name=abc , ", s_address=NULL}
Change the output format of struct type
Remove the space after
:
{"s_id": 100, "s_name": "abc , "", "s_address": null}
{"s_id":100, "s_name":"abc , "", "s_address":null}