[Go][Parquet] Inconsistency in RowGroup 'total byte size' field values between implementations #44205

Closed
abbit opened this issue Sep 23, 2024 · 0 comments

abbit commented Sep 23, 2024

Describe the usage question you have. Please include as many useful details as possible.

The parquet.thrift file in the parquet-format repo describes the meaning of the RowGroup total_byte_size field as:

Total byte size of all the uncompressed column data in this row group

This is also the case for the C++ implementation of Parquet in this repo.

But in the Go implementation, total_byte_size is described as:

TotalByteSize is the total size of this rowgroup on disk

The difference between these values can be large when compression is applied to column chunks.
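
To make the gap concrete, here is a minimal sketch in Go. It does not use the Arrow Go API: the chunkSizes type and both helper functions are hypothetical, and the byte counts are made-up illustrative values. It assumes that "uncompressed column data" corresponds to summing each column chunk's total_uncompressed_size, and that "size on disk" corresponds to summing each chunk's total_compressed_size (both are fields of ColumnMetaData in parquet.thrift).

```go
package main

import "fmt"

// chunkSizes is a hypothetical stand-in for the two size fields that
// ColumnMetaData in parquet.thrift carries for each column chunk.
type chunkSizes struct {
	totalUncompressedSize int64
	totalCompressedSize   int64
}

// uncompressedTotal sums the uncompressed sizes of all column chunks,
// which is the reading of total_byte_size given in parquet.thrift.
func uncompressedTotal(chunks []chunkSizes) int64 {
	var sum int64
	for _, c := range chunks {
		sum += c.totalUncompressedSize
	}
	return sum
}

// compressedTotal sums the compressed sizes of all column chunks,
// which is assumed here to match the "size on disk" reading.
func compressedTotal(chunks []chunkSizes) int64 {
	var sum int64
	for _, c := range chunks {
		sum += c.totalCompressedSize
	}
	return sum
}

func main() {
	// Illustrative values only: two column chunks in one row group.
	chunks := []chunkSizes{
		{totalUncompressedSize: 1000, totalCompressedSize: 250},
		{totalUncompressedSize: 500, totalCompressedSize: 200},
	}
	fmt.Println("uncompressed:", uncompressedTotal(chunks))
	fmt.Println("compressed:", compressedTotal(chunks))
}
```

With these sample numbers the two readings give 1500 and 450 bytes for the same row group, so a reader that expects one and receives the other would be off by more than a factor of three.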

My question is: is this inconsistency with the format definition and the other implementations intentional? And if so, why was this distinction made?

Component(s)

Go, Parquet
