optimize(fuse): record scalar column in meta file(or parquet meta)?. #15915

Open
youngsofun opened this issue Jun 27, 2024 · 3 comments


youngsofun (Member) commented Jun 27, 2024

Summary

For a row in a large wide table, many (even most) columns may be null or set to their default values. This table might be loaded using SQL commands like COPY INTO wide_table(c1, c100) FROM ..., while wide_table itself may contain 1000 columns.
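Here is a minimal sketch of the block such a load could produce. It assumes a simplified in-memory model: `Value`, `Block`, and `block_for_partial_load` are hypothetical names, not Databend's actual `DataBlock` API, and every column is modeled as a nullable i64. Columns named in the COPY INTO target list hold one value per row. Every other column is a single default scalar.

```rust
/// Hypothetical, simplified stand-in for a column value in a block.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    /// One value shared by every row (e.g. NULL or the column default).
    Scalar(Option<i64>),
    /// One value per row.
    Column(Vec<Option<i64>>),
}

/// Hypothetical block: `num_rows` rows, one `Value` per table column.
#[derive(Debug)]
pub struct Block {
    pub num_rows: usize,
    pub columns: Vec<Value>,
}

/// Build a block for a load like `COPY INTO wide_table(c1, c100) FROM ...`.
///
/// `defaults[i]` is the default of column i (None = NULL), and `loaded`
/// maps a column index to the values read from the source. Columns that
/// are absent from `loaded` become a single scalar.
pub fn block_for_partial_load(
    num_columns: usize,
    defaults: &[Option<i64>],
    loaded: Vec<(usize, Vec<Option<i64>>)>,
    num_rows: usize,
) -> Block {
    assert_eq!(defaults.len(), num_columns, "one default per column");
    // Start with every column as its default scalar.
    let mut columns: Vec<Value> = (0..num_columns)
        .map(|i| Value::Scalar(defaults[i]))
        .collect();
    // Overwrite only the columns that were actually supplied by the load.
    for (idx, values) in loaded {
        assert_eq!(values.len(), num_rows, "column {idx} has the wrong length");
        columns[idx] = Value::Column(values);
    }
    Block { num_rows, columns }
}
```

With the 1000-column table from the example, only c1 and c100 (indices 0 and 99 in this model) hold per-row values. The other 998 columns are one scalar each.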

In memory, the unused columns are represented as Value::Scalar in a DataBlock, which speeds up computation significantly. However, when the DataBlock is converted into an Arrow RecordBatch, each scalar is flattened into a full column that repeats the same value for every row. This results in:

  1. Slower loading.
  2. When the data is read back, these columns are represented as Value::Column rather than Value::Scalar.

Impact

  • Flattening scalars during the conversion to an Arrow RecordBatch adds overhead (the same value is copied once per row), which slows down loading.
  • Because the unused columns are read back as Value::Column instead of Value::Scalar, queries lose the scalar fast path and use more CPU and memory than necessary.
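A minimal sketch of the cost described above, under a simplified model. `Value`, `flatten`, `slots_before_and_after`, and `round_trip` are hypothetical helpers, not Databend's real types. The written file is modeled as plain arrays with one slot per row, so a scalar must be expanded before writing and can only come back as a column.

```rust
/// Hypothetical, simplified column value (not Databend's real `Value` type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    /// One value shared by every row.
    Scalar(Option<i64>),
    /// One value per row.
    Column(Vec<Option<i64>>),
}

impl Value {
    /// Number of values actually held in memory.
    pub fn stored_slots(&self) -> usize {
        match self {
            Value::Scalar(_) => 1,
            Value::Column(c) => c.len(),
        }
    }

    /// Expand into one value per row. In this model the written arrays
    /// hold one slot per row, so a scalar is copied `num_rows` times.
    pub fn flatten(self, num_rows: usize) -> Vec<Option<i64>> {
        match self {
            Value::Scalar(v) => vec![v; num_rows],
            Value::Column(c) => c,
        }
    }
}

/// Values held in memory before and after flattening a whole block.
pub fn slots_before_and_after(columns: &[Value], num_rows: usize) -> (usize, usize) {
    let before: usize = columns.iter().map(Value::stored_slots).sum();
    let after = columns.len() * num_rows;
    (before, after)
}

/// Model of write + read-back: the file only stores arrays, so every
/// column comes back as `Value::Column`, including former scalars.
pub fn round_trip(columns: Vec<Value>, num_rows: usize) -> Vec<Value> {
    columns
        .into_iter()
        .map(|v| Value::Column(v.flatten(num_rows)))
        .collect()
}
```

In this model, a 1000-column block of 1,000,000 rows with only 2 loaded columns holds 2,000,998 values before flattening and 1,000,000,000 after.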
youngsofun changed the title from "optimize(fuse): record scalar column in meta file." to "optimize(fuse): record scalar column in meta file or parquet meta?." on Jun 28, 2024
youngsofun changed the title from "optimize(fuse): record scalar column in meta file or parquet meta?." to "optimize(fuse): record scalar column in meta file(or parquet meta)?." on Jun 28, 2024
youngsofun (Member, Author) commented:

cc @dantengsky @zhyass

dantengsky (Member) commented Jun 28, 2024

"For a row in a large wide table, many (even most) columns may be null or set to their default values. This table might be loaded using SQL commands like COPY INTO wide_table(c1, c100) FROM ..."

This looks similar to the case of 'alter table t add column c int' or 'alter table t add column c int default 1'.

Maybe we don't need to "materialize" those columns at all?
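One way to avoid materializing them, sketched under loud assumptions: `Value`, `BlockMeta`, `split_for_write`, and `read_columns` are hypothetical names, not Databend APIs. On write, scalar columns are recorded once in a per-block meta instead of being written as arrays. On read, they are rebuilt as scalars. Columns the block never stored at all fall back to the schema default, much like a column added later with ALTER TABLE ... ADD COLUMN.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified column value (not Databend's real `Value` type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Scalar(Option<i64>),
    Column(Vec<Option<i64>>),
}

/// Hypothetical per-block metadata: each scalar column is recorded once,
/// by column index, instead of being written to the data file.
#[derive(Debug, Default)]
pub struct BlockMeta {
    pub num_rows: usize,
    pub scalar_columns: BTreeMap<usize, Option<i64>>,
}

/// Write side: keep only real columns as arrays and record scalars in the meta.
pub fn split_for_write(
    columns: Vec<Value>,
    num_rows: usize,
) -> (HashMap<usize, Vec<Option<i64>>>, BlockMeta) {
    let mut arrays = HashMap::new();
    let mut meta = BlockMeta { num_rows, ..Default::default() };
    for (idx, value) in columns.into_iter().enumerate() {
        match value {
            // Constant column: one value in the meta, nothing in the file.
            Value::Scalar(v) => {
                meta.scalar_columns.insert(idx, v);
            }
            // Regular column: written as an array.
            Value::Column(c) => {
                arrays.insert(idx, c);
            }
        }
    }
    (arrays, meta)
}

/// Read side: rebuild the projected columns without materializing scalars.
/// Columns this block never stored fall back to the schema default (NULL if none).
pub fn read_columns(
    projection: &[usize],
    arrays: &HashMap<usize, Vec<Option<i64>>>,
    meta: &BlockMeta,
    table_defaults: &HashMap<usize, Option<i64>>,
) -> Vec<Value> {
    projection
        .iter()
        .map(|idx| {
            if let Some(col) = arrays.get(idx) {
                Value::Column(col.clone())
            } else if let Some(v) = meta.scalar_columns.get(idx) {
                Value::Scalar(*v)
            } else {
                Value::Scalar(table_defaults.get(idx).copied().flatten())
            }
        })
        .collect()
}
```

The issue title leaves open whether such scalars belong in the block meta file or in the Parquet file's own key-value metadata. This sketch only models the block-meta option.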

youngsofun (Member, Author) commented:

Yes.
