optimize(fuse): record scalar column in meta file (or parquet meta)?
Summary
For a row in a large, wide table, many (even most) columns may be null or set to their default values. Such a table might be loaded with a SQL command like COPY INTO wide_table(c1, c100) FROM ..., while wide_table itself may contain 1000 columns.
In memory, the unused columns are represented as Value::Scalar in the DataBlock, which speeds up computation significantly. However, when we translate the DataBlock into an Arrow RecordBatch, each scalar column is flattened into a fully materialized array. This results in:
- Slower load progress.
- When we read the data back, those columns are represented as Value::Column.
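To make the cost concrete, here is a minimal, self-contained Rust sketch of the flattening step. The Value, DataBlock, and flatten names below are simplified stand-ins for illustration, not Databend's actual types:

```rust
// Hypothetical, simplified stand-ins (not Databend's real API).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Scalar(i64),       // one value shared by every row: O(1) memory
    Column(Vec<i64>),  // fully materialized: O(num_rows) memory
}

struct DataBlock {
    num_rows: usize,
    columns: Vec<Value>,
}

/// Flattening analogous to the DataBlock -> Arrow RecordBatch conversion:
/// every Scalar is repeated num_rows times, losing the compact representation.
fn flatten(block: &DataBlock) -> Vec<Vec<i64>> {
    block
        .columns
        .iter()
        .map(|v| match v {
            Value::Scalar(s) => vec![*s; block.num_rows], // materialization cost paid here
            Value::Column(c) => c.clone(),
        })
        .collect()
}

fn main() {
    // A wide block: 1 real column plus 999 default-valued scalar columns.
    let mut columns = vec![Value::Column((0..1_000_000).collect())];
    columns.extend(std::iter::repeat(Value::Scalar(0)).take(999));
    let block = DataBlock { num_rows: 1_000_000, columns };

    // Before flattening, the 999 scalars cost O(1) each;
    // afterwards, each occupies 1M slots.
    let flat = flatten(&block);
    assert_eq!(flat[1].len(), 1_000_000);
}
```

Recording which columns were scalar in the block meta (or the Parquet metadata) would let both the writer and the reader skip this expansion.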
Impact
- The flattening during conversion to an Arrow RecordBatch introduces performance overhead, causing slower load times.
- Reading the unused columns back as Value::Column instead of Value::Scalar wastes memory and slows down subsequent computation.
> For a row in a large, wide table, many (even most) columns may be null or set to their default values. Such a table might be loaded with a SQL command like COPY INTO wide_table(c1, c100) FROM ...
This looks like the 'alter table t add column c int' or 'alter table t add column c int default 1' case; maybe we don't need to "materialize" those columns at all?
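One possible shape for that idea, in the same simplified model as the sketch above (BlockMeta, write_block, and read_block are hypothetical names, not Databend's actual API): record the constant value in the block's metadata instead of writing num_rows copies, and rebuild Value::Scalar directly on read:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Scalar(i64),
    Column(Vec<i64>),
}

/// Hypothetical per-block metadata: column index -> constant value.
/// Columns listed here are not written to the data file at all.
#[derive(Default)]
struct BlockMeta {
    scalar_columns: HashMap<usize, i64>,
}

/// Writing: divert scalar columns into metadata; only real columns hit storage.
fn write_block(columns: &[Value]) -> (BlockMeta, Vec<Vec<i64>>) {
    let mut meta = BlockMeta::default();
    let mut stored = Vec::new();
    for (idx, v) in columns.iter().enumerate() {
        match v {
            Value::Scalar(s) => { meta.scalar_columns.insert(idx, *s); }
            Value::Column(c) => stored.push(c.clone()),
        }
    }
    (meta, stored)
}

/// Reading: rebuild Value::Scalar from metadata; the constant is never materialized.
fn read_block(meta: &BlockMeta, stored: &[Vec<i64>], num_cols: usize) -> Vec<Value> {
    let mut stored_iter = stored.iter();
    (0..num_cols)
        .map(|idx| match meta.scalar_columns.get(&idx) {
            Some(s) => Value::Scalar(*s),
            None => Value::Column(stored_iter.next().expect("stored column").clone()),
        })
        .collect()
}

fn main() {
    let cols = vec![Value::Column(vec![1, 2, 3]), Value::Scalar(0), Value::Scalar(7)];
    let (meta, stored) = write_block(&cols);
    assert_eq!(stored.len(), 1);                      // only one column written
    assert_eq!(read_block(&meta, &stored, 3), cols);  // scalars round-trip compactly
}
```

The trade-off between the two options in the title: if the constant is recorded only in the fuse meta file, external Parquet tools cannot reconstruct the full schema on their own; recording it in the Parquet metadata keeps the files self-describing but ties the format to Parquet.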